Convert Your HTML 4 and XHTML Web Pages to HTML 5

Moving to HTML 5

The long and sloppy evolution of HTML included some "rules" that were never strictly enforced, and handling browsers' inconsistent and erratic treatment of invalid syntax added a lot of complexity.

The W3C dropped development of HTML in 1998 in favor of the XML-based XHTML. There was XHTML 1.0, with a standard that insisted on XML-style closure and proper nesting of tags. But no browsers enforced these rules or even provided warnings. Well, Internet Explorer wouldn't display an invalid XHTML page, but that's because it didn't know how to display a valid XHTML page! It would ask you where to save the mysterious data file it couldn't handle.

Then XHTML 2 was supposed to solve those problems by forcing browsers to reject invalid XHTML 2 pages, while throwing out the sloppy quirks and conventions inherited from old versions of HTML and providing the benefits promised but not delivered by XHTML 1.

Development dragged on, slower and slower, with the XHTML 2 design looking good in theory but with the practical details as awkward as ever. Meanwhile, Internet Explorer remained mystified by XHTML.

People from Opera, Mozilla and Apple formed a working group, the WHATWG, to extend HTML. The W3C disbanded its XHTML group and started working on formalizing an HTML5 standard instead. Now...

HTML 5 is the modern markup language.

The current full HTML 5 specification is available at w3.org, but I think that what you most likely need is guidance converting your existing web pages from HTML 4 and/or XHTML to the modern HTML 5. It's tedious but not too difficult to do the conversion.

I created this page as a reminder to myself as to what you must change to convert a page from XHTML to valid HTML 5.

DOCTYPE

This becomes much simpler. Delete the PUBLIC identifier and the DTD URL, leaving only the bare declaration:

XHTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 

HTML 5:

<!DOCTYPE html>

Wow, that was easy...

<HTML>

This also becomes simpler.

You no longer have to explain to the browser where it might find the xmlns specification. Think about it: if the browser didn't know this already, there's pretty much no way it's going to suddenly go look it up. That was a pointless part of the old specification.

Drop the xmlns attribute and add a lang attribute saying which human language you're using.

XHTML:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

HTML 5:

<html lang="en" xml:lang="en">

CSS is the only style language, and Javascript is the only scripting language

In the <head> block, you simply tell the browser where to find the style sheet. It's a style sheet, it's CSS, there is no longer any need to explain that. Here's an example where I pull in a standard style sheet from a file, and then specify three page-specific styles I could use with <h1 class="bordered">.

XHTML:

<link rel="stylesheet" type="text/css" href="css/style.css" media="screen" />
<style type="text/css">
	h1.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h2.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h3.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
</style> 

HTML 5:

<link rel="stylesheet" href="css/style.css" media="screen" />
<style>
	h1.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h2.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h3.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
</style> 

Similarly, if you have Javascript, leave out the type="text/javascript" specification. Browser-side scripting is done in Javascript.
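
As a sketch of that change, here is a script element before and after. The file name js/site.js is a placeholder of mine for illustration, not something from my pages.

```html
<!-- XHTML: the type attribute had to be spelled out -->
<script type="text/javascript" src="js/site.js"></script>

<!-- HTML 5: Javascript is the default, so the type attribute can go -->
<script src="js/site.js"></script>
```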

Characters

Encoding

This is nothing new, but make certain that you specify an appropriate character set in the <head> section! Unless you are doing something rather unusual (at least from an English speaker's viewpoint), UTF-8 is appropriate. It is backward compatible with ASCII and can represent every one of the 1,112,064 code points in the Unicode character set. RFC 3629 specifies UTF-8 as a standard Internet protocol encoding.

<meta charset="UTF-8" /> 

Also make sure that you are using the correct characters! Use Unicode code points to specify what you can't do with HTML's named character references.
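
A small sketch of the options, using characters I picked only for illustration. A numeric character reference gives the Unicode code point in decimal or hexadecimal, which works where no named form exists.

```html
<!-- Named form, where HTML defines one -->
<p>Caf&eacute;</p>

<!-- Numeric forms by Unicode code point U+2603: decimal, then hexadecimal -->
<p>Snowman: &#9731; or &#x2603;</p>

<!-- With UTF-8 declared in the head, the character can also be typed directly -->
<p>Snowman: ☃</p>
```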

Constant-Width Font

Earlier versions of HTML supported:

<tt>Typewriter/teletype font here!</tt> 

Do it this way instead:

<code>Typewriter/teletype font here!</code> 

I have a line in the main /css/style.css file reading as follows to try to force browsers to display literal text the way I want:

code { font-family: Courier, monospace; }

Tables

A number of things need to change in tables, at least in the way I have been using table markup.

cellpadding and cellspacing

Neither of these is a valid attribute of the "table" element in HTML 5. Use CSS instead: the spacing between cells is a property of the table itself (border-spacing), while the padding is a property of the individual table cells (padding).

Old:

<table cellpadding="5" cellspacing="7">
	<tr>
		<td> First row, first column </td>
		<td> First row, second column </td>
	</tr>
	<tr>
		<td> Second row, first column </td>
		<td> Second row, second column </td>
	</tr>
</table> 

HTML 5:

<table style="border-collapse: separate; border-spacing: 7px;">
	<tr>
		<td style="padding: 5px;"> First row, first column </td>
		<td style="padding: 5px;"> First row, second column </td>
	</tr>
	<tr>
		<td style="padding: 5px;"> Second row, first column </td>
		<td style="padding: 5px;"> Second row, second column </td>
	</tr>
</table> 

If you want the spacing to be 0, use collapse instead of separate:

<table style="border-collapse: collapse; border-spacing: 0;"> 

I have placed this in my main CSS file:

table { border-collapse: collapse; border-spacing: 0; text-align: left; margin-bottom: 5px; }
td { vertical-align: top; padding: 5px; }

align and valign

Guess what, these are also invalid now!

The horizontal alignment can be specified for the entire table, or it could be done for individual <td> elements. You can center things table-wide with just text-align most of the time. That will work for text and images. But if you have something more complex, like a table inside a table, you instead need the margin trick.

Tables inside of tables? Yes, see the footer at the bottom of my pages. There is a centered table with five rows: an image of text reading "To inquire about ...", a table of areas of this site, a table of logos, a line of text, and an Amazon ad. The second and third of those appear against the left margin if I use just text-align to specify that table. The standard footer included into my pages also specifies margin to center everything.

Vertical alignment is done for individual <td> elements.

Old:

<table>
	<tr>
		<td align="center" valign="middle"> First row, first column </td>
		<td align="center" valign="middle"> First row, second column </td>
	</tr>
	<tr>
		<td align="center" valign="middle"> Second row, first column </td>
		<td align="center" valign="middle"> Second row, second column </td>
	</tr>
</table> 

HTML 5:

<table style="text-align: center; margin: 0 auto;">
	<tr>
		<td style="vertical-align: middle;"> First row, first column </td>
		<td style="vertical-align: middle;"> First row, second column </td>
	</tr>
	<tr>
		<td style="vertical-align: middle;"> Second row, first column </td>
		<td style="vertical-align: middle;"> Second row, second column </td>
	</tr>
</table> 
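
The nested case described above, a table inside a centered table, can be sketched as follows. The cell contents are placeholders rather than my actual footer. The text-align on the outer table centers inline text and images, but the inner table is a box of its own and only moves to the center when it gets its own margin rule.

```html
<table style="text-align: center; margin: 0 auto;">
	<tr>
		<td> A line of text, centered by text-align </td>
	</tr>
	<tr>
		<td>
			<!-- Without margin: 0 auto this inner table sits against the left edge of the cell -->
			<table style="margin: 0 auto;">
				<tr>
					<td> Inner cell one </td>
					<td> Inner cell two </td>
				</tr>
			</table>
		</td>
	</tr>
</table>
```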

Buy HTML and CSS: Design and Build Websites from Amazon.

Amazon Associates Ads in HTML 5

The Amazon Associates program lets you put ads in your pages and receive a commission when someone buys any item after clicking on the link on your page.

You can see three examples at right. The display ad on the left of that set is one that I selected for the book HTML and CSS: Design and Build Websites by Jon Duckett. The text below that (and within that last sentence) contains a text link for the same book. The display ad on the right is the result of telling Amazon to search the category "Books" for the string "html5 css". I don't know what it will pick; Amazon sells a wide range of books. Reload this page and the item appearing at right will probably change.

There is one minor inconvenience: the code that Amazon gives you isn't valid!

Let's see what we get and what we have to do to fix it. I am artificially breaking the enormously long single lines into multiple lines for convenient display here.

Display ad for a specific product

You must remove some invalid attributes and replace each bare & character with the &amp; character reference. I also added the border definition into the style, as otherwise this gets a wide border but the search one does not. Amazon's invalidity is inconsistent. The code they provide varies too. Sometimes you get an iframe snippet, sometimes it's Javascript:

Iframe version as provided by Amazon:

<iframe style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="http://ws-na.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&Operation=GetAdHtml&ID=OneJS&OneJS=1&source=ac&ref=qf_sp_asin_til&ad_type=product_link&tracking_id=cromwelintern-20&marketplace=amazon&region=US&placement=1118008189&asins=1118008189&show_border=true&link_opens_in_new_window=true&MarketPlace=US"></iframe> 

Valid HTML 5 for the iframe version:

<iframe style="width:120px;height:240px;" src="http://ws-na.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;Operation=GetAdHtml&amp;ID=OneJS&amp;OneJS=1&amp;source=ac&amp;ref=qf_sp_asin_til&amp;ad_type=product_link&amp;tracking_id=cromwelintern-20&amp;marketplace=amazon&amp;region=US&amp;placement=1118008189&amp;asins=1118008189&amp;show_border=true&amp;link_opens_in_new_window=true&amp;MarketPlace=US"></iframe> 

Javascript version as provided by Amazon:

<script>
  amzn_assoc_ad_type = "product_link";
  amzn_assoc_tracking_id = "cromwelintern-20";
  amzn_assoc_marketplace = "amazon";
  amzn_assoc_region = "US";
  amzn_assoc_placement = "B001KKRCFA";
  amzn_assoc_asins = "B001KKRCFA";
  amzn_assoc_show_border = true;
  amzn_assoc_link_opens_in_new_window = true;
</script>
<script src="http://ws-na.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&Operation=GetScript&ID=OneJS&WS=1&l=as1&source=ac&ref=tf_til">
</script> 

Valid Javascript version:

<script>
  amzn_assoc_ad_type = "product_link";
  amzn_assoc_tracking_id = "cromwelintern-20";
  amzn_assoc_marketplace = "amazon";
  amzn_assoc_region = "US";
  amzn_assoc_placement = "B001KKRCFA";
  amzn_assoc_asins = "B001KKRCFA";
  amzn_assoc_show_border = true;
  amzn_assoc_link_opens_in_new_window = true;
</script>
<script src="http://ws-na.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;Operation=GetScript&amp;ID=OneJS&amp;WS=1&amp;l=as1&amp;source=ac&amp;ref=tf_til">
</script> 

Text ad for a specific product

Replace each bare & character with the &amp; character reference.

Provided by Amazon:

<a href="http://www.amazon.com/gp/product/1118008189/ref=as_li_tf_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1118008189&linkCode=as2&tag=cromwelintern-20" rel="nofollow">HTML and CSS: Design and Build Websites</a>

Valid HTML5:

<a href="http://www.amazon.com/gp/product/1118008189/ref=as_li_tf_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1118008189&amp;linkCode=as2&amp;tag=cromwelintern-20" rel="nofollow">HTML and CSS: Design and Build Websites</a>

Search ad for keywords in a category

You must remove some invalid attributes, replace each bare & character with &amp;, and replace any blank spaces in the search string with %20, the percent-encoded form of the space character.

Provided by Amazon:

<iframe src="http://rcm-na.amazon-adsystem.com/e/cm?t=cromwelintern-20&o=1&p=8&l=st1&mode=books&search=html5 css&fc1=000000&lt1=_blank&lc1=3366FF&bg1=FFFFFF&f=ifr" marginwidth="0" marginheight="0" width="120" height="240" border="0" frameborder="0" style="border:none;" scrolling="no"></iframe> 

Valid HTML 5:

<iframe src="http://rcm-na.amazon-adsystem.com/e/cm?t=cromwelintern-20&amp;o=1&amp;p=8&amp;l=st1&amp;mode=books&amp;search=html5%20css&amp;fc1=000000&amp;lt1=_blank&amp;lc1=3366FF&amp;bg1=FFFFFF&amp;f=ifr" width="120" height="240" style="border:none;"></iframe> 

Validation

So you have made many changes. Are they correct?

Validate your pages to make sure that they are correct HTML. If you used some web authoring tool, it very likely did not produce correct HTML. Also check the Cascading Style Sheet markup.
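
For a first validator run, a minimal page along these lines may help. It is only a sketch combining the changes described on this page: the title and body text are placeholders, and css/style.css is the stylesheet path from the earlier examples.

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8" />
	<!-- A title element is required for the page to validate -->
	<title>Conversion test page</title>
	<link rel="stylesheet" href="css/style.css" media="screen" />
</head>
<body>
	<h1 class="bordered">Test heading</h1>
	<p>Literal text goes in <code>code</code> rather than tt.</p>
</body>
</html>
```

If this skeleton validates but your converted page does not, the remaining errors come from the body markup rather than from the head.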

While you are improving your page, verify the links to make sure they work:
Link Valet Broken Link Checker