Roger Johansson, 456 Berea Street
Warning! Old content ahead. The latest update was 2008-11-01.
XHTML 1.0 is a reformulation of HTML 4 in XML 1.0 and was developed to replace HTML. However, there is nothing preventing you from using HTML 4.01 to build modern, standards compliant, and accessible websites. Whether you use HTML 4.01 or XHTML 1.0 doesn’t really matter all that much.
What is more important is to properly separate structure from presentation. Strict doctypes allow less presentational markup and enforce separation of structure from presentation, so I recommend using HTML 4.01 Strict or XHTML 1.0 Strict.
XHTML 1.1, which is the latest version of XHTML, is technically a bit more complicated to use, since the specification states that XHTML 1.1 documents should have the MIME type application/xhtml+xml, and should not be served as text/html. It isn’t strictly forbidden to use text/html, but it is not recommended.
XHTML 1.0 on the other hand, which should use application/xhtml+xml, may also use the MIME type text/html, if it is HTML compatible. The W3C Note XHTML Media Types contains an overview of MIME types that are recommended by the W3C.
Unfortunately, some older web browsers, and Internet Explorer, do not recognize the MIME type application/xhtml+xml, and can end up displaying the source code or even refusing to display the document.
If you want to use application/xhtml+xml, you should let the server check whether the browser requesting a document can handle that MIME type. If it can, use it; otherwise use text/html.
This is called “content negotiation”. Instead of going into details about it here, I refer you to the writeups on MIME types and content negotiation listed at the end of this article.
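As a rough illustration of the server-side check described above, here is a minimal Python sketch. The function name choose_mime_type is hypothetical, and the sketch assumes your server-side code can read the Accept header of the request. It deliberately ignores wildcards such as */*, since a browser can send those without actually handling application/xhtml+xml.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def choose_mime_type(accept_header):
    """Return the MIME type to serve, based on the request's Accept header.

    Hypothetical helper: XHTML is chosen only when the browser lists
    application/xhtml+xml explicitly with a quality value above zero.
    """
    for item in (accept_header or "").split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # a media type without a q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as not acceptable
        if quality > 0:
            return XHTML_TYPE
    # Wildcards such as */* are ignored on purpose: fall back to text/html.
    return HTML_TYPE
```

If you vary the response this way, it is also worth sending a Vary: Accept header, so that caches do not hand the application/xhtml+xml version to a browser that cannot handle it.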
Note that when the MIME type is application/xhtml+xml, some browsers, for example Firefox, will not display documents that aren’t well-formed. This can be a good thing during development since it immediately makes you aware of some markup errors. However, it may cause problems on a live site that gets updated by people who are not XHTML experts, unless you can ensure that all code stays well-formed. If you cannot guarantee well-formedness you should probably avoid content negotiation and use HTML 4.01 or “HTML compatible” XHTML 1.0 instead.
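One way to catch such errors before they reach a live site is to run the markup through an XML parser as part of your publishing process. The following is only a sketch using Python's standard library; the function name is_well_formed is hypothetical. It checks well-formedness only, not validity against a doctype, and this particular parser also rejects named entities such as &nbsp; unless they are defined.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The closed elements parse; the unclosed ones do not.
print(is_well_formed("<ul><li>Item 1</li></ul>"))
print(is_well_formed("<p>Line one<br>Line two</p>"))
```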
Here is a list of the things that are most important to consider when using XHTML 1.0 Strict instead of HTML 4.01 Transitional (or no-name, plain old invalid HTML):
Always use lower case, and quote all attributes: All element and attribute names must be in lower case. All attribute values must be quoted.
Incorrect: <A HREF="index.html" CLASS=internal>
Correct: <a href="index.html" class="internal">
Close all elements: In HTML, some elements don’t have to be closed. Such elements are automatically closed when the next element starts. XHTML does not allow that. All elements must be closed, even those that have no content, like img.
Incorrect: <li>Item 1
Correct: <li>Item 1</li>
Incorrect: <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Correct: <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
Incorrect: <br>
Correct: <br />
Incorrect: <img src="image.jpg" alt="">
Correct: <img src="image.jpg" alt="" />
Attributes cannot be minimized: In HTML, certain attributes can be minimized, meaning that only the attribute name is written. XHTML does not allow this.
Incorrect: <input type="checkbox" id="checkbox1" name="checkbox1" checked>
Correct: <input type="checkbox" id="checkbox1" name="checkbox1" checked="checked" />
Don’t use deprecated elements: Some elements and attributes that are allowed in HTML 4.01 Transitional and XHTML 1.0 Transitional are deprecated, and are not allowed in XHTML 1.0 Strict (or in HTML 4.01 Strict). A few examples are font, center, alink, align, width, height (for some elements), and background.
I recommend sticking to most of these rules even if you are writing HTML 4.01. Doing so makes the markup much easier to read and maintain, and has become a widely used convention. So when writing HTML 4.01:
Close all elements (except those that cannot have a closing tag, like img, link, and input)
Don’t omit elements whose tags are optional (html, head, body, and tbody)
The doctype, or document type declaration, used to be more decorative than functional, but for quite a few years now the presence of a doctype has greatly affected how a web browser renders a document.
All HTML and XHTML documents must have a doctype declaration to be valid. The doctype states what version of HTML or XHTML is being used in the document. The doctype is used by the W3C markup validator when checking your document and by web browsers to determine which rendering mode to use.
If a correct and full doctype is present in a document, most web browsers will switch to standards mode, which among other things means that they will follow the CSS specification more closely. This will reduce the differences in rendering between browsers.
The following doctypes will make the web browsers that have “doctype switching” use their standards mode:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
For more detailed information about Doctypes, see my Opera Web Standards Curriculum article Choosing the right doctype for your HTML documents.
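To illustrate how a tool can read the version from the doctype, here is a small Python sketch that extracts the public identifier from the start of a document. The function name public_identifier and the regular expression are assumptions made for this example. It only handles doctypes written in the form shown above, and it expects the doctype to come first in the document.

```python
import re

# Matches doctypes of the form <!DOCTYPE html PUBLIC "..." ...> (case-insensitive).
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def public_identifier(markup):
    """Return the doctype's public identifier, or None if no doctype comes first."""
    match = DOCTYPE_RE.match(markup.lstrip())
    return match.group(1) if match else None


html_strict = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)
print(public_identifier(html_strict))
```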
All XHTML documents should specify their character encoding. If you don’t, the browser will have to guess which character encoding to use. If it guesses wrong your visitors may have a hard time reading the text on your website.
The best way of specifying the character encoding is to configure the web server to send an HTTP content-type header with the character encoding. For detailed information on how to do this, check the documentation for the web server software you are using.
If you’re using Apache, you can specify the character encoding by adding one or more rules to your .htaccess file. For example, if all your files use utf-8, add this:
AddDefaultCharset utf-8
To specify a character encoding for files with a certain filename extension, use this:
AddCharset utf-8 .html
If your server lets you run PHP scripts, you can use the following to specify the character encoding:
<?php header("Content-Type: text/html; charset=utf-8"); ?>
To serve your pages as XHTML, change text/html to application/xhtml+xml.
If you, for whatever reason, are unable to configure your web server to properly specify the character encoding you are using, use a meta element as the first child of the document’s head element. It’s a good idea to specify the character encoding this way even if your server is configured correctly.
For example, the following meta element tells the browser that a document uses the ISO-8859-1 character encoding:
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
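If your pages are generated by a script rather than served as static files, the HTTP header approach described above works the same way in any language. As a sketch only, here is a minimal Python WSGI application that declares utf-8 in the Content-Type header and encodes the body to match. The names PAGE and app are made up for this example.

```python
# Hypothetical page content; the non-ASCII characters show why the charset matters.
PAGE = "<p>Smörgåsbord</p>"


def app(environ, start_response):
    """Minimal WSGI application that declares its character encoding."""
    body = PAGE.encode("utf-8")  # the bytes sent must match the declared charset
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the application can be passed to make_server from the standard library's wsgiref.simple_server module. The important detail is that the declared charset and the encoding actually used for the bytes are the same.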
The Web Standards Project asks the W3C if you should use HTML or XHTML, and why.
An A List Apart article on the transition from HTML to XHTML.
A good explanation of how to use XHTML and CSS.
The W3C explains the difference between XHTML 1.0 and HTML 4.
The Web Standards Project asks the W3C which MIME type should be used for HTML and XHTML, and why.
A summary of which media types should be used for serving XHTML documents.
HTML Dog’s guide to the elements and attributes you should not use in XHTML.
A document on MIME types and how to do content negotiation with different server side scripting languages.
A W3C document on MIME types and XHTML.
An A List Apart article on how to use doctype, and why.
A summary of how different doctype declarations affect browsers that have doctype switching.
The W3C’s official list of correct doctype declarations.
The Web Standards Project asks the W3C how authors should specify character encoding.
An article on different character encodings.
An explanation of how to use national and special characters in HTML documents.
A tutorial on how to choose and declare a character encoding.
© Copyright Roger Johansson