Things I have learned about XHTML

An entry published by James Bennett on June 21, 2008, part of the categories Pedantics and Web standards. 28 comments posted.

The following are gleaned from the comments to my recent explanation of why I chose to use HTML 4.01 Strict for my redesign, rather than a flavor of XHTML, an explanation in which I mostly boiled the debate — for my needs, here on this site — down to “XHTML doesn’t offer me any compelling advantage, and it’s more complex to do right than most people know/admit”.

Advance warning: yes, this is snarky and is going to make fun of uninformed comments. Yes, I do think it’s necessary to call people out on this kind of thing. Yes, if you don’t like it you should go read something else. So let’s get started.

Craig says that

I don’t really agree that XHTML is any more complex than HTML. If anything, there are fewer tags/attributes.

Since XHTML 1.0 was a tag-for-tag and attribute-for-attribute identical reformulation of HTML 4.01 into XML, I have a tough time understanding this one.

T. Bille chimed in and contributed the fact that

Something left out of the picture imho is the compliance with disability standards which usualy imply stricter checking / xhtml compliance.

For reference, here are the relevant accessibility specs: WCAG 1.0 and WCAG 2.0. Try out your browser’s in-page search, and you’ll find that a certain sequence of characters — to wit, “XHTML” — is conspicuously absent from the contents of both documents. Irony of ironies: WCAG 1.0 is an HTML 4 document.

Next up is Tedel, who mentions

However, I have also found good advantages of using XHTML 1.0 strict over HTML 4.1 strict, especially in search engine optimization techniques.

Since XHTML 1.0 Strict and HTML 4.01 Strict are, again, identical in terms of tags and attributes, I’m somewhat amazed by this.

Then there’s Timbo with an urgent plea:

Please use XHTML. It’s so much easier to scrape your data with.

Sorry, Timbo, I already use the most advanced markup-scraping tool on the planet (BeautifulSoup), and so should you.

Don Ulrich contributed several gems to the conversation, but this one was my favorite:

If ppl only read the w3c spec they could understand how robust (X)HTML is. Most only use a fraction of its resources. We have created many a false social meme about markup.

Indeed. XHTML is so robust that, for example, a document that’s invalid XHTML will be rendered correctly by your web browser even when served as application/xhtml+xml. However, it’s also so fragile that a document’s well-formedness status can change based only on the details of the transport protocol used to get it in front of your eyeballs.
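
The first half of that is easy to see for yourself. Here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which, like the XML parsers in browsers, checks well-formedness but does not validate against the XHTML DTD. Both sample documents and the `is_well_formed` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Well-formed XML, but invalid XHTML 1.0 Strict: a block-level <div>
# may not appear inside an inline <span>. (Hypothetical sample document.)
INVALID_BUT_WELL_FORMED = (
    f'<html xmlns="{XHTML_NS}"><head><title>t</title></head>'
    "<body><p><span><div>oops</div></span></p></body></html>"
)

# Not well-formed: the <br> element is never closed.
NOT_WELL_FORMED = (
    f'<html xmlns="{XHTML_NS}"><head><title>t</title></head>'
    "<body><p>line one<br>line two</p></body></html>"
)


def is_well_formed(markup: str) -> bool:
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(INVALID_BUT_WELL_FORMED))  # True
print(is_well_formed(NOT_WELL_FORMED))          # False
```

The parser only complains about the second document; the invalid nesting in the first one goes through without a word.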

Although I do have to give an honorable mention to his most recent contribution:

And lastly XHTML is an application. Where HTML is markup.

I could go on like this for a while, and I probably should have expected that my article would bring some uninformed kooks out of the woodwork, but seriously? People? It’s 2008 here and the necessary information to clear up all of the above confusions is publicly available and has been for years. If you’re a professional web designer or a professional web developer and you can’t spot the problems with these comments, then I weep for the future of our industry.

On June 21, 2008, Jonathan Snook said:

I think Timbo has a minor point in that there aren’t (I assume) as many HTML parsers as XHTML parsers. And it’s not about scraping your own data, it’s about letting others scrape it. With that said, your site offers up a fine XML-based API via….RSS! I can’t imagine what Timbo is looking to scrape that isn’t neatly covered by an RSS feed.

I think you’ve unfairly slighted Don Ulrich, though. By use of parentheses around the X, he’s indicating that his statement applies to HTML and XHTML. It seems more a call for people to pay more attention to the spec than the hype. (my 2c anyhoo)

On June 21, 2008, Mark said:

Must… resist… fist of death.

On June 21, 2008, Marcus Cavanaugh said:

I think you’re right, James, that for all practical purposes HTML is just as good as (if not better than) XHTML.

HTML’s nature is much more human-oriented; our minds aren’t robotic calculators, they’re a stream of consciousness. We are imperfect beings, and XHTML’s (intended) strictness places the burden of content perfection on ourselves; HTML, in effect, allows the computer to try to “do what we mean, not what we say.”

The only mental inconvenience I have with HTML now is a syntactic one: When I see an HTML tag without a closing “/>”, my mind doesn’t yet recognize that it’s unclosed. I’m so used to XHTML that I need to acclimate myself to the different syntax.

I could see benefit in choosing one or the other in different use cases, but in most cases HTML could be a perfectly reasonable choice (i.e. when the site is intended for human consumption). As long as HTML handles imperfect syntax uniformly across browsers, that is.

On June 21, 2008, Austin Govella said:

I just recalled something from when xhtml came out and everyone was urged to switch.

Everybody understood the limitations you mention, but took the pill because we were all waiting for this brilliant xml future. And the minute that future arrived, you would add your xml declaration at the top and change the mime type on your server, and voila!

The xhtml movement was future-proofing, the last time you would need to touch the markup for your content. All changes after that would be redesigns where you only touched the css! It would be a paradise!!!

On June 22, 2008, Ben Henick said:

First off, I’m one of those fools who knowingly misuses XHTML… because he can. Even so, I’m having an awfully hard time finding anything from the earlier post with which I can disagree. I think that the bozos know you’re right, so they’re whining.

As for the rest I’m with Austin, though I’m probably more sanguine about the extent to which CSS can be leveraged even in the environment we have.

What mystifies me is how anyone finds this argument worth having. HTML? XHTML? Whatever. For typical production, the difference matters exactly how? Pick one, be done with it, and don’t consider yourself all that special either way… which sounds an awful lot like one of the things you said in the first place.

As for true professionalism, well, we need best practices above and beyond what the W3C has to offer. For that matter, there are still a ton of shops out there that will see your Recommendation and raise you an STFU: the correlation between standards-friendliness and the bottom line is still awfully weak. The correlation between trust in vendors and their bottom line is as strong as ever in the meantime, regardless of the means by which that trust is earned (or wheedled).

On June 22, 2008, markus said:

I think a few out there may really like XHTML and thus try to argue for its merits.

Personally I gave up not only on XHTML but on XML also. I will refuse to use anything that has an XML config at all. (Haha, that means I can’t use Java…)

I don’t really mind, btw, because I autogenerate HTML or XML or XHTML anyway (and if I autogenerate, I don’t care what the end result is, because I work with an intermediate format), and chose to use human-readable text files. But for others who don’t autogenerate it, it is really a shame.

On June 22, 2008, manuelg said:

And lastly XHTML is an application. Where HTML is markup.

This is outstanding. I will carve this on my gravestone. Right after I dig my own grave and bury myself alive.

On June 22, 2008, Nitroadict said:

XHTML is a redundant solution to a problem we don’t have, developed by a committee (keyword: COMMITTEE, as in, also, BUREAUCRACY, and POLITICS) that also gave us the pile of crap that is CSS 2. Thanks?

I fully agree that there is nothing wrong with HTML 4.01, and using something to enhance it can’t be bad. Although, it might not be standards, so let’s all yell & scream about that, LoL!!!

I look forward to any future entries you make on this subject, as sanity is hard to come by nowadays.

On June 22, 2008, Goodrone said:

Don’t you find BeautifulSoup to be a slow tool? (I did.) Anyway, dealing with XML is much faster.

On June 22, 2008, Austin Govella said:

You can scrape well-formed html 4 pretty well, too.

In fact, I think the only difference between html 4 and xhtml is the closing of the image and break tags. (And who uses break tags anymore? Please!)
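
As a rough illustration of scraping HTML 4 without an XML parser, here is a sketch built on Python's standard-library `html.parser`. The sample markup and the `LinkCollector` class are made up for this example; the unclosed `<img>` and `<br>` tags are the point.

```python
from html.parser import HTMLParser

# Hypothetical HTML 4.01 fragment: <img> and <br> are left unclosed,
# which is legal in HTML 4 but would not be well-formed XML.
SAMPLE = """
<p>First line<br>Second line</p>
<p><img src="logo.png" alt="Logo"> <a href="/archive/">Archive</a>
<a href="/about/">About</a></p>
"""


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])


collector = LinkCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.links)   # ['/archive/', '/about/']
print(collector.images)  # ['logo.png']
```

The parser is event-based and never needs the tags to balance, so the missing closers do not get in the way.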

I still use xhtml, though. Have for years, which is why I stay there. I know exactly how it works with what css in all the browsers.

P.S. Xhtml 1.1 has fewer tags than html 4. Xhtml 1.0 continued the html 4 deprecation of a bunch of tags. Xhtml 1.1 went ahead with killing them. Also, semantic markup (often conflated with the xhtml/css movement) lends itself to fewer tags, which may be where the misperception comes from.

I can’t believe I’m remembering all this crap from 8 years ago.

Does anyone out there remember Netscape and IE 4 hacks? I swear there’s an entire ghetto in my brain filled with that crap.

On June 22, 2008, Joeboy said:

I think I agree with you, but out of interest I did some (possibly flawed, definitely unscientific) tests to try and work out if browsers (well, Firefox 3) rendered XHTML any faster than HTML, and concluded that XHTML was rendered ~5% faster. I’m not suggesting that’s a particularly compelling argument, but I thought it might be interesting.

My current position on the matter is that the web framework I use outputs XHTML, and I don’t see any great disadvantage in that so I’m sticking to XHTML for now.

On June 22, 2008, arien said:

The details of the transfer protocol always matter. Try serving HTML as text/plain and see how well that works. (Or CSS as image/png. You get the idea.)
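
A toy sketch of that point: the same bytes get handled differently depending only on the Content-Type header. The header parsing uses Python's standard-library `email.message.Message`; the `describe_handling` function and its handler labels are hypothetical stand-ins, not a description of what any particular browser does.

```python
from email.message import Message


def describe_handling(content_type_header: str) -> str:
    """Pick a (hypothetical) handling strategy from the Content-Type header alone."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()  # lowercased "type/subtype"

    if media_type == "text/html":
        return "tolerant HTML parser"
    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "strict XML parser"
    if media_type == "text/plain":
        return "show source as plain text"
    return "no markup handling"


for header in (
    "text/html; charset=utf-8",
    "application/xhtml+xml",
    "text/plain",
    "image/png",
):
    print(header, "->", describe_handling(header))
```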

On June 22, 2008, jinzo said:

Not my business, but since when did you become a monk of anti-XHTML and pro-HTML campaigns? I’ll be happy if people at least output VALID stuff, doesn’t matter what markup it is (but for validity it needs some specs, so not any stuff, but spec’d stuff).

So these debates are pointless, when there’s 3/4 of the web still outputting tag soup.

P.S: I would love OpenID support here (:

On June 22, 2008, Ryan said:

@jinzo

That is true, but people who work with XHTML served as text/html should at least be fully aware it’s not really XHTML they’re using, but invalid HTML pretending to be XHTML.

Only when they’re aware can they make an educated decision about what to use.

On June 22, 2008, James Bennett said:

@Austin: yes, XHTML 1.1 cut some stuff. But A) it also introduced some things (see: Ruby markup), and B) since you’re not supposed to pretend it’s compatible with HTML 4, hardly anybody bothers with it.

@arien: thing is, you can take someone else’s well-formed XHTML document, serve it with the correct media type, and still create a situation where it must be considered non-well-formed. Not only that, but falling back to the source of metadata that’s most likely to be correct — e.g., the metadata in the XHTML document itself — is expressly forbidden by the relevant standards.
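
One way this can play out, assuming the metadata in question is the character encoding: under RFC 3023, a charset supplied by the transport layer takes precedence over the encoding declaration inside the XML document. The sketch below imitates that rule with Python's standard library; the sample document and the `well_formed_under` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the bytes are UTF-8 and the XML declaration says so.
BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<p xmlns="http://www.w3.org/1999/xhtml">caf\u00e9</p>'
).encode("utf-8")


def well_formed_under(body: bytes, transport_charset: str) -> bool:
    """Decode with the charset from the transport layer, ignoring the
    document's own declaration, then check well-formedness."""
    try:
        text = body.decode(transport_charset)
        ET.fromstring(text)
    except (UnicodeDecodeError, ET.ParseError):
        return False
    return True


print(well_formed_under(BODY, "utf-8"))     # True
print(well_formed_under(BODY, "us-ascii"))  # False
```

Same bytes, same media type; only the charset claimed by the transport layer differs, and the document goes from fine to unreadable.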

On June 22, 2008, whativelearned said:

What I’ve learned about xhtml is there are a lot of people who take standards too seriously.

On June 22, 2008, jg said:

I can’t believe I’m wasting my time reading about this. Isn’t there a war going on?

On June 22, 2008, Lou Quillio said:

I’ll go along with XHTML having been an imperfect argument against tag soup, that HTML never had to be malformed, etc., but I think the argument had to be made. Remember, we were battling proprietary hegemony that depended on confusion and slop for traction.

So maybe XHTML has done its work. And I wonder if Postel isn’t vindicated in the exercise.

LQ

On June 23, 2008, Matt said:

Firefox only displays MathML when it appears in XHTML pages, I believe. Not sure of the source of this, whether it’s by design or a quirk of their implementation.

On June 23, 2008, John Handelaar said:

“Nitroadict”, if nothing else, has rather dented your inferred assumption that the idiots in this dispute are all on one side.

On June 23, 2008, Nono said:

James, thank you… I made the decision 2 years ago to go to HTML and not XHTML as I didn’t find any pro for XHTML… People said I was wrong but you prove me right.

I wonder, would using HTML also be a + for HTML 5?

On June 23, 2008, Reader said:

Dude, looks like your blog is hacked with all kinds of spam keywords that are not visible (hidden style). I suggest you fix it.

On June 23, 2008, Don Ulrich said:

In response to the post/comments above:

WOW

To begin with XHTML is an application of HTML which is a variant of SGML. This application of HTML is achieved through a namespace. Served as XML it can be parsed natively by XSL. A good example of XSL would be Doc Book.

“dra·co·ni·an error handling”: This is bullshit. A note to design types: unit testing is all the rage. It allows you to MAKE SURE you don’t deliver errors to clients. If there are fatal errors in your XHTML docs, you have failed an interoperability test. You want your client’s view, webpage, or semantic document to be free of errors, don’t you?

All of the arguments here seem to be based on XHTML being served as text/html. I deliver XHTML as XML and serve it as application/xhtml+xml. One word: context, people.

@Jonathan Snook THX someone has to. The lobby is full of design types. The programmers must be on the lower floor.

On June 23, 2008, James Bennett said:

Hey Don, you might want to check your references, because any markup language using SGML’s syntax and rules is an “SGML application” and any markup language using XML’s syntax and rules is an “XML application”. However, these are not “applications” in the equivocating sense you seem to be promoting.

On June 23, 2008, creo said:

Don’t look any further than http://haml.hamptoncatlin.com !

On June 23, 2008, Don Ulrich said:

@James love the site design.

But: HTML does not programmatically reference SGML. However, XML is used as a substrate and the XHTML namespace is applied to XML. It is an application of XHTML over XML. The pointer is the namespace which carries the XHTML spec. So literally it is an application of XHTML over XML. HTML is a variant of SGML the same way there are variants of Linux.

On June 24, 2008, Jackson said:

In fact, I think the only difference between html 4 and xhtml is the closing of the image and break tags. (And who uses break tags anymore? Please!)

Pardon my ignorance, but what is the alternative to using break tags?

On June 24, 2008, Don Ulrich said:

@Jackson check out w3schools.com; the CSS section should be helpful.

@James My partner at work with the PhD said that if you really get down to brass tacks the XHTML namespace synthesizes XHTML. (damn engineers)

When I said XHTML is an application I was referring to its complex characteristics, versus the variant characteristics of HTML relative to SGML. It is too general to say they are merely applications of their parent technologies. Think spatial.

I have something for release ~mid July. Ever been busting to release a project but you can’t? Damn…
