Convert Your HTML 4 and XHTML Web Pages to HTML 5

Moving to HTML 5

Your web pages should be in HTML 5, using CSS (Cascading Style Sheets) for the responsive design that makes them equally useful on smartphones, tablets, laptops, and desktops. The following is based on my notes for web site conversion, with red marking what to remove or change, and green marking what to add or verify.

What a Messy History!

The long and sloppy evolution of HTML included some "rules" that were never strictly enforced, which added a lot of complexity to handle browsers' inconsistent and erratic treatment of invalid syntax.

The W3C stopped developing HTML in 1998 in favor of the XML-based XHTML. XHTML 1.0 came with a standard that insisted on XML-style closure and proper nesting of tags, but no browsers enforced these rules or even provided warnings. Internet Explorer wouldn't display an invalid XHTML page, but that's because it didn't know how to display a valid XHTML page! It would ask you where to save the mysterious data file it couldn't handle.

Then XHTML 2 was supposed to solve those problems by forcing browsers to reject invalid XHTML 2 pages, while throwing out the sloppy quirks and conventions inherited from old versions of HTML and providing the benefits promised but not delivered by XHTML 1.

Development dragged on, slower and slower, with the XHTML 2 design looking good in theory but with the practical details as awkward as ever. Meanwhile, Internet Explorer remained mystified by XHTML.

People from Opera, Mozilla, and Apple formed their own working group, the WHATWG, to extend HTML. The W3C eventually disbanded its XHTML 2 group and started formalizing an HTML 5 standard instead. Now...

HTML 5 is the modern markup language.

The current full HTML 5 specification is available at w3.org, but I think that what you most likely need is guidance converting your existing web pages from HTML 4 and/or XHTML to the modern HTML 5. It's tedious but not too difficult to do the conversion.

I created this page as a reminder to myself as to what you must change to convert a page from XHTML to valid HTML 5.

DOCTYPE

This becomes much simpler. Red for what you delete:

XHTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 

HTML 5:

<!DOCTYPE html>

Wow, that was easy...

<html>

This also becomes simpler.

You no longer have to explain to the browser where it might find the xmlns specification. Think about it. If the browser didn't know this already, it isn't going to know how to go look it up. That was an especially pointless part of the old specification.

Drop the xmlns specification (red), and add a specification of which human language you're using (green). Leave the existing XML language specification in place.

XHTML:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

HTML 5:

<html lang="en" xml:lang="en">

CSS is the only style language, and JavaScript is the only scripting language

In the <head> block, you simply tell the browser where to find the style sheet. It's a style sheet, so it's obviously CSS. There is no longer any need to explain that. Here's an example where I pull in a standard style sheet from a file, and then specify three page-specific styles I could use with something like:
<h1 class="bordered">CSS For Style</h1>

XHTML:

<link rel="stylesheet" type="text/css" href="/css/style.css" />
<style type="text/css">
	h1.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h2.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h3.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
</style> 

HTML 5:

<link rel="stylesheet" href="/css/style.css" />
<style>
	h1.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h2.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
	h3.bordered {
		background: #303030;
		color: #c8d0b4;
		border: 1px solid #808080;
		padding: 3px;
	}
</style> 

Similarly, if you have JavaScript, leave out the type="text/javascript" specification. Browser-side scripting is done in JavaScript. For example, a Google AdSense ad definition:

HTML 4 or XHTML:

<script type="text/javascript" async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<ins class="adsbygoogle responsive"
	style="display:inline-block;"
	data-ad-client="ca-pub-5845932372655417"
	data-ad-slot="1756148203"></ins>
<script type="text/javascript">
	(adsbygoogle = window.adsbygoogle || []).push({});
</script>

HTML 5:

<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<ins class="adsbygoogle responsive"
	style="display:inline-block;"
	data-ad-client="ca-pub-5845932372655417"
	data-ad-slot="1756148203"></ins>
<script>
	(adsbygoogle = window.adsbygoogle || []).push({});
</script>

Character Encoding

This is nothing new, but make certain that you specify an appropriate character set in the <head> section! Unless you are doing something rather unusual (at least from an English speaker's viewpoint), UTF-8 is appropriate. It is backward compatible with ASCII and can represent every one of the 1,112,064 code points in the Unicode character set. RFC 3629 specifies UTF-8 as a standard Internet protocol encoding.

<meta charset="UTF-8" />

Also make sure that you are using the correct characters! Use Unicode, either numeric character references or the literal UTF-8 characters, for anything that HTML's named character references can't express.
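
Putting the pieces so far together, a minimal HTML 5 page might look like the following sketch. The title and the body text are placeholders chosen for illustration, and the style sheet path is simply reused from the example above. The first body paragraph writes the same character three ways: as a named reference, as a numeric reference, and as a literal UTF-8 character. The second uses a character that has no named reference at all.

```html
<!DOCTYPE html>
<html lang="en" xml:lang="en">
<head>
	<!-- Declare the encoding early, before any text that might contain non-ASCII characters -->
	<meta charset="UTF-8" />
	<title>Example page (placeholder title)</title>
	<link rel="stylesheet" href="/css/style.css" />
</head>
<body>
	<h1 class="bordered">CSS For Style</h1>
	<!-- U+00E9 as a named reference, a numeric reference, and a literal character -->
	<p>caf&eacute; caf&#xE9; café</p>
	<!-- U+4E2D has no named reference, so use a numeric reference or the literal character -->
	<p>&#x4E2D; 中</p>
</body>
</html>
```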

Constant-Width Font

Earlier versions of HTML supported:

<tt>Typewriter/teletype font here!</tt> 

Do it this way instead:

<code>Typewriter/teletype font here!</code> 

I have two lines in the main /css/style.css file reading as follows to try to force browsers to display literal text the way I want:

pre,code { font-family: Courier, monospace; text-align: left; }
pre { overflow: auto; word-wrap: normal; white-space: pre; }
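
As a usage sketch for those two rules, short literal text inside a sentence goes in <code>, and multi-line literal text goes in <pre>. The command and the markup shown here are made up for illustration. Note that < and & inside either element still have to be written as character references.

```html
<p>Run <code>ls -l /var/www</code> to list the files.</p>

<pre>&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
	...
&lt;/html&gt;</pre>
```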

Tables

A number of things need to change in tables, at least in the way I have been using table markup.

cellpadding and cellspacing

Neither of these is a valid attribute of the "table" element in HTML 5. In CSS, the spacing between cells is a property of the table itself, while the padding is a property of the individual table cells.

Old:

<table cellpadding="5" cellspacing="7">
	<tr>
		<td> First row, first column </td>
		<td> First row, second column </td>
	</tr>
	<tr>
		<td> Second row, first column </td>
		<td> Second row, second column </td>
	</tr>
</table> 

HTML 5:

<table style="border-collapse: separate; border-spacing: 7px;">
	<tr>
		<td style="padding: 5px;"> First row, first column </td>
		<td style="padding: 5px;"> First row, second column </td>
	</tr>
	<tr>
		<td style="padding: 5px;"> Second row, first column </td>
		<td style="padding: 5px;"> Second row, second column </td>
	</tr>
</table>

If you wanted the spacing to be 0, use collapse instead of separate:

<table style="border-collapse: collapse; border-spacing: 0;"> 

I have placed this in my main CSS file:

table { border-collapse: collapse; border-spacing: 0; text-align: left; margin-bottom: 5px; }
td { vertical-align: top; padding: 5px; }
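
If only some tables need extra spacing, a class keeps the markup free of repeated inline styles. The following is a sketch that assumes a hypothetical class name, spaced, which is not part of the style sheet lines above. It reproduces the cellspacing="7" and cellpadding="5" of the old example.

```html
<style>
	/* "spaced" is a hypothetical class name used only for this example */
	table.spaced { border-collapse: separate; border-spacing: 7px; }
	table.spaced td { padding: 5px; }
</style>

<table class="spaced">
	<tr>
		<td> First row, first column </td>
		<td> First row, second column </td>
	</tr>
	<tr>
		<td> Second row, first column </td>
		<td> Second row, second column </td>
	</tr>
</table>
```

Because table.spaced is more specific than the bare table and td selectors, these rules win over the site-wide defaults without any inline style attributes.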

align and valign

Guess what, these are also invalid now!

The horizontal alignment can be specified for the entire table, or it could be done for individual <td> elements. You can center things table-wide with just text-align most of the time. That will work for text and images. But if you have something more complex, like a table inside a table, you instead need the margin trick: set margin: 0 auto on the element to be centered.

Tables inside of tables? Yes, see the footer at the bottom of my pages. There is a centered table with five rows: an image of text reading "To inquire about ...", a table of areas of this site, a table of logos, a line of text, and an Amazon ad. The second and third of those appear against the left margin if I use just text-align to specify that table. The standard footer included into my pages also specifies margin to center everything.

Vertical alignment is done for individual <td> elements.

Old:

<table>
	<tr>
		<td align="center" valign="middle"> First row, first column </td>
		<td align="center" valign="middle"> First row, second column </td>
	</tr>
	<tr>
		<td align="center" valign="middle"> Second row, first column </td>
		<td align="center" valign="middle"> Second row, second column </td>
	</tr>
</table> 

HTML 5:

<table class="centered" style="margin: 0 auto;">
	<tr>
		<td style="vertical-align: middle;"> First row, first column </td>
		<td style="vertical-align: middle;"> First row, second column </td>
	</tr>
	<tr>
		<td style="vertical-align: middle;"> Second row, first column </td>
		<td style="vertical-align: middle;"> Second row, second column </td>
	</tr>
</table> 
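
The class="centered" in the example above is not defined anywhere on this page. Here is a minimal sketch of what such a rule could look like, assuming it is meant to take over from the old align="center", together with the margin trick applied to a table nested inside another table.

```html
<style>
	/* Assumed definition: this page does not show the real "centered" rule */
	table.centered { margin: 0 auto; }
	table.centered td { text-align: center; vertical-align: middle; }
</style>

<table class="centered">
	<tr>
		<td> A line of centered text </td>
	</tr>
	<tr>
		<td>
			<!-- text-align does not center a nested table, but auto side margins do -->
			<table class="centered">
				<tr>
					<td> Inner table, first column </td>
					<td> Inner table, second column </td>
				</tr>
			</table>
		</td>
	</tr>
</table>
```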

Amazon Associates Ads in HTML 5

The Amazon Associates program lets you put ads in your pages and receive a commission when someone buys any item after clicking on the link on your page. You can see two examples below.


[Amazon display ad, ASIN: 1118008189]

[Amazon display ad, ASIN: 144936294X]

The first display ad above is one that I selected for the book HTML and CSS: Design and Build Websites by Jon Duckett. Notice that the text of that last sentence contains a text link to the Amazon page for the item.

There is one minor inconvenience: Amazon gives you invalid HTML code!

Let's see what we get and what we have to do to fix it. I am artificially breaking the enormously long single lines into multiple lines for convenient display here.

Display ad for a specific product

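You must remove the obsolete marginwidth, marginheight, scrolling, and frameborder attributes, moving the margin and border settings into the style attribute, and replace each bare & character with the &amp; character reference.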

Invalid version as provided by Amazon:

<iframe style="width:120px;height:240px;" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" src="//ws-na.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&OneJS=1&Operation=GetAdHtml&MarketPlace=US&source=ac&ref=tf_til&ad_type=product_link&tracking_id=cromwelintern-20&marketplace=amazon&region=US&placement=1118008189&asins=1118008189&linkId=c17c45c10b3852c86d02c69bf3368ccd&show_border=false&link_opens_in_new_window=false&price_color=333333&title_color=0066c0&bg_color=ffffff">
    </iframe>

Valid HTML 5:

<iframe style="width:120px;height:240px;margin:0;border:none;" src="//ws-na.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=US&amp;source=ac&amp;ref=tf_til&amp;ad_type=product_link&amp;tracking_id=cromwelintern-20&amp;marketplace=amazon&amp;region=US&amp;placement=1118008189&amp;asins=1118008189&amp;linkId=c17c45c10b3852c86d02c69bf3368ccd&amp;show_border=false&amp;link_opens_in_new_window=false&amp;price_color=333333&amp;title_color=0066c0&amp;bg_color=ffffff"></iframe>

Text ad for a specific product

Replace each bare & character with the &amp; character reference.

Amazon's invalid version:

<a target="_blank" rel="noopener" href="https://www.amazon.com/gp/product/1118008189/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1118008189&linkCode=as2&tag=cromwelintern-20&linkId=5f76999fc11141b5f9c09a78872ef36b">HTML and CSS: Design and Build Websites</a><img src="//ir-na.amazon-adsystem.com/e/ir?t=cromwelintern-20&l=am2&o=1&a=1118008189" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />

Valid HTML5:

<a target="_blank" rel="noopener" href="https://www.amazon.com/gp/product/1118008189/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1118008189&amp;linkCode=as2&amp;tag=cromwelintern-20&amp;linkId=5f76999fc11141b5f9c09a78872ef36b">HTML and CSS: Design and Build Websites</a><img src="//ir-na.amazon-adsystem.com/e/ir?t=cromwelintern-20&amp;l=am2&amp;o=1&amp;a=1118008189" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />

The above ends with an <img> element for a 1-by-1 pixel image that Amazon's code loads alongside the link. I shaded it grey above, and I usually remove it from the code provided by Amazon:

<a target="_blank" rel="noopener" href="https://www.amazon.com/gp/product/1118008189/ref=as_li_tl?ie=UTF8&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1118008189&amp;linkCode=as2&amp;tag=cromwelintern-20&amp;linkId=5f76999fc11141b5f9c09a78872ef36b">HTML and CSS: Design and Build Websites</a>

Next Step: Automate with PHP

Those Amazon examples above could be much simpler, along with the microdata, schema markup, and other HTML details needed for search engine optimization and integration with social media feeds.

See the next page, Simplifying HTML with PHP and CSS, for details on how to make your web site easier to maintain.

Validation

So now you have made many changes. Are they correct?

Validate your pages to make sure that they are correct HTML. If you used some web authoring tool, they very likely will not be correct HTML. Also check the Cascading Style Sheet markup.

Google has a Structured Data Testing Tool to see if your microdata was properly generated.

While you are improving your page, verify the links to make sure they work:
Link Valet Broken Link Checker