How To Use Google AdSense Within XML/XHTML
Before starting, a better plan is to convert your old HTML4/XHTML to modern HTML.
First, How Does AdSense Work?
You set up an account with Google and specify the ads you want. I said "Images if you have them but text if not" and specified some medium and large banners and rectangles in certain color schemes. Google then generated some small JavaScript blocks for me to place as I want within my pages.
The JavaScript I place within my page is short and simple so that I and my server have little to do.
When you view my page, you direct your browser to retrieve data from a specified URL. My server sends over an HTML document. If you have JavaScript enabled in your browser, then your computer executes each JavaScript block within the page.
The simple JavaScript blocks I added to my pages just set four variables and then retrieve and execute a JavaScript program from pagead2.googlesyndication.com. That automatically retrieved JavaScript code does the real work of getting the ad itself.
The ad retrieved is the result of doing something like the reverse of the typical Google search. Instead of answering the question, "What pages are related to this search string?" it's more like "To which search strings or ad descriptions would this page relate?" Put another way, it tries to automatically select ads on topics similar to the topic of the page. One of those four original variables specifies my Google account, so if you happen to retrieve an ad that interests you and you click on it, Google knows whom to credit.
A side effect that I didn't anticipate is that it shows me what Google thinks my pages are about. It does a good job selecting relevant ads on many of my pages, like TCP/IP, Linux/Unix, some of my information security pages, my attempts to understand Turkish grammar, my travel suggestions, and the Toilets of the World.
Google gets a little confused on many of my information security pages, obsessing on the term "security" appearing in the URL and throughout the page, and frequently offering ads for the vast and largely non-technical physical security industry in the U.S.
It's harder for it to automatically figure out what some pages are about, like one explaining how to create Cyrillic text in Unicode, the LaTeX markup language, and PostScript. If it cannot decide what the page might be about, it might offer "public service ads", generally promotions for charities. However, if you have enough content on your site and at least some pages clearly on some topic, Google will reasonably assume that a mystery page should get ads from what it sees as the general theme of the site.
OK, that's what AdSense is and how it works. Why is this page here?
The Problem and an Attempted Solution
Google AdSense ads are based on JavaScript using document.write() calls. However, that doesn't work within an XML/XHTML document. Here is a workaround! However, as discussed below, we will also have to solve far worse problems caused by Microsoft Internet Explorer's inability to handle XHTML.
In more detail, a Google AdSense ad looks like the following within a web page. The first block sets values for four variables, and the second effectively says "Using those variables, retrieve a JavaScript program from the following location and execute it."
As for those variables, google_ad_client identifies me so I get the credit for ad views and clicks, while google_ad_slot refers to one of my specific ad definitions: specific dimensions, the color scheme to use if this page load yields a text-only ad instead of a graphical one, and a specification of the types of pages and usual location within the page where this ad appears.
Google suggests that you simply insert this JavaScript into an HTML file, but that doesn't necessarily work, hence the reason for this page!
<script type="text/javascript">
<!--
google_ad_client = "pub-5845932372655417";
google_ad_slot = "1979399418";
google_ad_width = 728;
google_ad_height = 90;
//-->
</script>
<script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
The problem is that the retrieved program show_ads.js contains calls to JavaScript's document.write() function, and that function is disabled within an XHTML or XML page. Depending on your browser and how strict it is, you might get the ad, or maybe an error message, or maybe nothing at all.
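As a rough illustration of that restriction, here is a minimal sketch of a guard that only calls document.write() when the page was not parsed as XML. It assumes a browser that exposes document.contentType; the helper names isXmlDocument and writeIfAllowed are hypothetical and are not part of Google's code. Under the current HTML specification, calling document.write() on an XML document throws an InvalidStateError, which is what the guard avoids.

```javascript
// Hypothetical helper: was this document parsed as XML?
// In browsers, document.contentType reports the MIME type used to parse the page.
function isXmlDocument(doc) {
  const type = String(doc.contentType || '').toLowerCase();
  return type === 'text/xml' || type === 'application/xml' || type.endsWith('+xml');
}

// Hypothetical helper: call document.write() only where it is permitted.
// Returns true if the markup was written, false if it was skipped.
function writeIfAllowed(doc, markup) {
  if (isXmlDocument(doc)) {
    return false; // document.write() would throw in an XML document
  }
  doc.write(markup);
  return true;
}
```

In a page served as application/xhtml+xml, writeIfAllowed(document, "...") would return false rather than raise an error. This only detects the problem; it does not make Google's script work.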
How do I know what's in the JavaScript program? I wondered why it didn't work and so I retrieved a copy with wget:
$ wget http://pagead2.googlesyndication.com/pagead/show_ads.js
$ vim show_ads.js
Sure enough, document.write() plays a crucial role.
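Instead of reading the file by eye, the same check can be sketched in a few lines of JavaScript. This is only an illustration: countDocumentWrites is a hypothetical helper, and it only finds literal document.write( and document.writeln( calls, so a minified script that aliases document to a shorter name would be undercounted.

```javascript
// Count literal calls to document.write( or document.writeln( in a script's source text.
function countDocumentWrites(source) {
  const matches = source.match(/\bdocument\s*\.\s*write(?:ln)?\s*\(/g);
  return matches ? matches.length : 0;
}

// Example with an inline string standing in for the downloaded file:
const sample = 'document.write("<b>x</b>"); var a = 1; document.writeln("y");';
console.log(countDocumentWrites(sample));
```

With Node.js, the downloaded file could be passed in by reading it with fs.readFileSync("show_ads.js", "utf8").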
Some XML/XHTML solution is needed....
Here is a workaround, step by step:
Create a Proper HTML Document Containing the JavaScript
Note that this page assumes you are using Apache and using the default locations for things:
/var/www/conf
Configuration files, including httpd.conf and mime.types as discussed below.
/var/www/logs
Log files, including access_log and error_log as discussed below.
/var/www/htdocs
The web site itself is located here. The file /var/www/htdocs/Index.html is the default page, what the server provides when asked for simply http://server-name-here/.
I created the below HTML file as:
/var/www/htdocs/ads/content-banner-technical.html
so it could be retrieved as:
https://cromwell-intl.com/ads/content-banner-technical.html
It's just the JavaScript from above as the body of a small HTML file. Notice the CSS style elements, width: 100% and overflow: visible. Those are critical to get the ad to appear as it should, without scroll bars or other visual oddities.
<?php header("Content-Type: text/html;charset=utf-8"); ?>
<html>
  <head>
    <title>Sponsorship</title>
    <style type="text/css">
      body { margin: 0; padding: 0; width: 100%; overflow: visible;}
    </style>
  </head>
  <body>
    <div style="padding: 0; width: 100%; overflow: visible;">
      <script type="text/javascript">
      <!--
      google_ad_client = "pub-5845932372655417";
      /* Top Banner for technical content pages */
      google_ad_slot = "1979399418";
      google_ad_width = 728;
      google_ad_height = 90;
      //-->
      </script>
      <script type="text/javascript"
        src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
      </script>
    </div>
  </body>
</html>
Create a "Wrapper" to Minimize Code Maintenance
I next created the below HTML file as:
/var/www/htdocs/ads/banner-technical.html
Notice that it sets the MIME type of just this encapsulated object as text/html so that document.write() will work within what will be this embedded HTML document. Also notice that it specifies the height in pixels in both the outer DIV wrapper and the object itself, and the width of the included HTML object in pixels. This wrapper's width is defined as 100% of the page, so the enclosed banner will be centered. The other wrappers for rectangles and squares will have their width defined in pixels so they can be floated to the left and right and the text can then flow around them.
If you do not specify the height and width as I have done here, the browser will not know in advance how to lay out the page and the result may look very strange.
<div style="width: 100%; height: 90px;">
  <table style="width: 100%; background: #f0a0f0; height: 90px; padding: 0;">
    <tr>
      <td class="centered">
        <object data="/ads/content-banner-technical.html"
          type="text/html"
          style="width: 728px; height: 90px; padding: 0;">
        </object>
      </td>
    </tr>
  </table>
</div>
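Since the banner, rectangle, and square wrappers differ mainly in their pixel dimensions, the pattern can be sketched as a small generator. This is an illustration only, not how my wrappers were produced: buildAdWrapper is a hypothetical helper, it leaves out the centering table for brevity, and it assumes the src path is trusted and needs no escaping.

```javascript
// Hypothetical helper: build the <object> wrapper markup for one ad size.
// width and height are pixel dimensions of the ad (for example 728 and 90).
function buildAdWrapper(src, width, height) {
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    throw new RangeError('width and height must be positive integers (pixels)');
  }
  return [
    `<div style="width: 100%; height: ${height}px;">`,
    `  <object data="${src}" type="text/html"`,
    `    style="width: ${width}px; height: ${height}px; padding: 0;">`,
    `  </object>`,
    `</div>`,
  ].join('\n');
}

console.log(buildAdWrapper('/ads/content-banner-technical.html', 728, 90));
```

The explicit type="text/html" and the pixel dimensions are the two details the surrounding text calls out, so the sketch refuses to emit a wrapper without valid dimensions.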
I could have put the above "wrapper" block directly in each HTML file, but then I would have to maintain twelve-line blocks scattered across hundreds of HTML files! The next step makes it much easier.
Include The Ad by Pulling in the Wrapper with PHP
All I have to do now is add one PHP line to a file to include the wrapper and let it pull in the code. This page, for example, literally begins as shown here. See how it pulls in two ads at the very beginning, a 728x90 banner across the top before the large header "How To Use Google AdSense Within XML/XHTML", and then a 300x250 box that floats to the right, followed by the rest of the page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>How To Use Google AdSense Within XML/XHTML</title>
<meta name="description" content="How to use Google AdSense within XML/XHTML pages. Google AdSense uses JavaScript document.write();, which is not allowed within XML/XHTML. Here is a simple solution to the problem." />
<meta http-equiv="content-type" content="application/xhtml+xml; charset=iso-8859-1" />
<link rel="stylesheet" type="text/css" href="../css/style.css" media="screen" />
</head>
<body style="background: #f4e4b0">
<?php include($_SERVER['DOCUMENT_ROOT'].'/ads/responsive-infosec.html'); ?>
<h1>How To Use Google AdSense Within XML/XHTML</h1>
<?php include($_SERVER['DOCUMENT_ROOT'].'/ads/responsive-rctngl-technical.html'); ?>
<div class="bordered" style="font-size: 9pt;">
<p style="margin-top: 1px; margin-bottom: 1px;"> <b> Table of Contents / Summary: </b> </p>
<ul style="margin-top: 1px; margin-bottom: 1px;">
<li> <a href="#first"> First, How Does AdSense Work? </a> </li>
[ .... and so on with the rest of the page .... ]
The overall result making up the entire page is a valid XML/XHTML document encapsulating a short HTML block of specified size containing nothing but Google's JavaScript program.
Keep Logging Under Control
The Apache log file /var/www/logs/access_log will be just exploding with unwanted details by now: every request for images to complete the page, plus the style sheet, and now plus these ad pages. So, let's tell Apache to just log the pages themselves and not their "prerequisites", the things they require. Below is a section of my /var/www/conf/httpd.conf. Lines highlighted in yellow are lines that I added, lines highlighted in magenta are original lines that I commented out, and the green %h is something I added to capture the client IP for each logged referral:
[ ... over 500 preceding lines not shown ...]
#
# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
#
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h -- %{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
#
#
# The location and format of the access logfile (Common Logfile Format).
# If you do not define any access logfiles within a <VirtualHost>
# container, they will be logged here. Contrariwise, if you *do*
# define per-<VirtualHost> access logfiles, transactions will be
# logged therein and *not* in this file.
#
### Stop requests for images, style sheets, etc, as described here:
### http://www.vbulletin.com/forum/showthread.php?t=25287
SetEnvIf Request_URI \.gif not-logged
SetEnvIf Request_URI \.png not-logged
SetEnvIf Request_URI \.jpg not-logged
SetEnvIf Request_URI \.jpeg not-logged
SetEnvIf Request_URI \.ico not-logged
SetEnvIf Request_URI style\.css not-logged
SetEnvIf Request_URI ads/content not-logged
CustomLog /dev/null combined env=not-logged
CustomLog logs/access_log common env=!not-logged
# CustomLog logs/access_log common
#
# If you would like to have agent and referer logfiles, uncomment the
# following directives.
#
CustomLog logs/referer_log referer env=!not-logged
#CustomLog logs/referer_log referer
#CustomLog logs/agent_log agent
[ ... over 500 more lines not shown ...]
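To see which requests those SetEnvIf rules keep out of the access log, the matching can be mimicked in JavaScript. This is a sketch of the logic only, not of Apache itself: isLogged is a hypothetical helper, and the patterns are copied from the SetEnvIf lines above. Like SetEnvIf, the patterns are unanchored, so they match anywhere in the request URI.

```javascript
// Patterns copied from the SetEnvIf lines in the configuration above.
// They are unanchored regular expressions tested against the request URI.
const NOT_LOGGED_PATTERNS = [
  /\.gif/, /\.png/, /\.jpg/, /\.jpeg/, /\.ico/, /style\.css/, /ads\/content/,
];

// Hypothetical helper: true if a request for this URI would reach access_log.
function isLogged(requestUri) {
  return !NOT_LOGGED_PATTERNS.some((pattern) => pattern.test(requestUri));
}

console.log(isLogged('/technical/google-adsense-and-xhtml.html')); // an ordinary page
console.log(isLogged('/ads/content-banner-technical.html'));       // an embedded ad page
```

One consequence of the unanchored patterns is that a page whose path merely contains ".gif" somewhere, such as a hypothetical /notes/about.gif.html, would also go unlogged. Anchoring the patterns with a trailing $ would be one way to tighten that.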
Dealing with Bigger Problems Caused by Explorer's Inability to Handle XHTML
Microsoft's Internet Explorer cannot handle XHTML documents. See this description for the details, which include IE8. If you serve an XHTML document to Explorer, it doesn't know how to handle it and asks if you want to save it to a file. All other browsers can handle XHTML. See the W3C's answer to the question of which browsers accept the media type application/xhtml+xml, where I have emphasized a significant comment they made on the page they wrote in 2004:
Browsers known to us include all Mozilla-based browsers, such as Mozilla, Netscape 5 and higher, Galeon and Firefox, as well as Opera, Amaya, Camino, Chimera, DocZilla, iCab, Safari, and all browsers on mobile phones that accept WAP2. In fact, any modern browser. Most accept XHTML documents as application/xml as well. See the XHTML Media-type test for details.
I would happily have an entire web site that required you to use any browser other than Explorer. However, that would mean fewer page views, fewer ad clicks, and less ad income for me. So, I need to find a way to support the lame Microsoft Explorer, the world's most dangerous software, especially when coupled with ActiveX.
There are at least four possible solutions:
1 — Re-write all my pages to use plain old HTML instead of XHTML
This is the preferred solution!
2 — Modify the MIME type and document content using PHP as the page is served up, based on what the browser says it can handle
There are PHP tricks to use the browser's advertised Accept string to figure out what the user agent, be it a browser or crawler or whatever, can handle, and give it precisely that. For example, Neil Crosby's detailed solution.
As an explanation of the Accept string, see the below Wireshark capture of Firefox viewing this page. The first block, starting "GET", is from my client; the second, starting "HTTP/1.1", is from the server:
GET /technical/google-adsense-and-xhtml.html HTTP/1.1
Host: cromwell-intl.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090914 \
    Mageia Linux/1.9.1.3-2mdv2010.0 (2010.0) Firefox/3.5.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: https://cromwell-intl.com/technical/
Cache-Control: max-age=0

HTTP/1.1 200 OK
Date: Thu, 15 Oct 2009 23:46:01 GMT
Server: Apache
X-Powered-By: PHP/5.2.8
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html

[... page content appears here...]
I am impressed by the careful attention to detail in solutions such as those by Neil Crosby, but I just don't want to add too much bulk and overhead to every single page. And as you can see on Neil's page, the server's PHP engine needs to modify all the page content, for example, changing every instance of XHTML <br /> to the HTML <br>.
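The core of this kind of content negotiation can be sketched briefly. The following is not Neil Crosby's code and is written in JavaScript rather than PHP; parseAccept and chooseContentType are hypothetical helpers, and the rule used here (serve application/xhtml+xml only when it is explicitly listed with a q value at least as high as that of text/html) is one reasonable assumption, not the only possible policy.

```javascript
// Parse an Accept header into a Map of media type -> q value (default q is 1).
function parseAccept(header) {
  const prefs = new Map();
  for (const part of String(header || '').split(',')) {
    const [type, ...params] = part.trim().split(';').map((s) => s.trim());
    if (!type) continue;
    let q = 1;
    for (const p of params) {
      const m = /^q=([0-9.]+)$/i.exec(p);
      if (m) q = parseFloat(m[1]);
    }
    prefs.set(type.toLowerCase(), q);
  }
  return prefs;
}

// Assumed policy: serve XHTML only if the client lists it explicitly and
// ranks it at least as high as text/html; wildcards like */* do not count.
function chooseContentType(acceptHeader) {
  const prefs = parseAccept(acceptHeader);
  const xhtmlQ = prefs.get('application/xhtml+xml') || 0;
  const htmlQ = prefs.get('text/html') || 0;
  return xhtmlQ > 0 && xhtmlQ >= htmlQ ? 'application/xhtml+xml' : 'text/html';
}

// The Accept header from the Firefox capture above:
console.log(chooseContentType('text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'));
```

A client whose Accept header never mentions application/xhtml+xml, for example one that sends only */*, falls back to text/html under this policy. The page content would still need the markup changes described above before it is valid HTML.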
3 — Use the W3C XML trick
They suggest inserting these two bold lines at the beginning of each XHTML document:
<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/xsl" href="/copy.xsl"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>How To Use Google AdSense Within XML/XHTML</title>
[.... and so on ....]
Then you would create that /copy.xsl file:
<stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/XSL/Transform">
  <template match="/">
    <copy-of select="."/>
  </template>
</stylesheet>
4 — Write the pages in XHTML, but serve them with a MIME type of text/html
Careful readers of the above packet capture will have already seen that I took the easy and common way out. The W3C says:
- XHTML should be served with the application/xhtml+xml mime type.
- HTML compatible XHTML 1.0 may be served with the text/html mime type. That's 1.0, not 1.1 or following.
Furthermore see "Sending XHTML as text/html Considered Harmful".
Some day I will really need to put in the work to add the PHP modification to every web page on my site. And that will definitely need a Unix shell script instead of a marathon editing session.
For now, though, I'm taking the lazy way out.
I modified the mime.types file on my server to serve out files named *.html as MIME type text/html.
And in conclusion — It just can't be done....
I really thought I had this solved, and all that remained was for me to implement Neil Crosby's solution for modifying the MIME type and document content field with PHP, based on what the browser says it can handle.
But then I watched someone looking at my site, with Explorer, and saw what happened when they clicked on an ad. With Explorer, and only with Explorer, those embedded HTML objects within the page remain precisely that — embedded objects. A clicked ad on Explorer does not open the ad in the main browser window, it opens within a small window on the main page. You don't see the entire advertisement, you see just a small rectangle of it viewed through a small window with slider bars below and to its right.
So, I went back to an earlier plan.
No "wrapper", my PHP include now just pulls this in:
<table style="width: 100%; height: 90px; padding: 0;">
  <tr>
    <td class="centered">
      <script type="text/javascript">
      <!--
      google_ad_client = "pub-5845932372655417";
      /* Top Banner 728x90 */
      google_ad_slot = "5257000457";
      google_ad_width = 728;
      google_ad_height = 90;
      //-->
      </script>
      <script type="text/javascript"
        src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
      </script>
    </td>
  </tr>
</table>
What's Left?
The only remaining annoyance, and it's fairly minor, is PHP's inability to easily deal with absolute paths. The include is relative to that page's location, so I need the ../ shown above. The "easy" fix is to instead use all this:
<?php include($_SERVER['DOCUMENT_ROOT'].'/ads/responsive-infosec-wrapper.html'); ?>
Ugh.
I'll just keep track of how many instances of "../" are needed....
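Keeping track of the "../" count is simple arithmetic on the page's path: one "../" for each directory between the page and the document root. Here is a small sketch of that calculation; relativePrefixToRoot is a hypothetical helper, and it assumes the path is given relative to the document root with a leading slash.

```javascript
// Hypothetical helper: the "../" prefix a page needs to reach the document root.
// pagePath is the page's path below the document root, e.g. "/technical/page.html".
function relativePrefixToRoot(pagePath) {
  const segments = pagePath.split('/').filter((s) => s.length > 0);
  // A path ending in "/" names a directory, so every segment is a directory.
  // Otherwise the last segment is the file name and does not count.
  const depth = pagePath.endsWith('/') ? segments.length : Math.max(segments.length - 1, 0);
  return '../'.repeat(depth);
}

console.log(relativePrefixToRoot('/technical/google-adsense-and-xhtml.html') + 'ads/');
```

For this page, one directory below the root, the prefix is a single "../", while a page in the root directory itself needs none.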
Now I'm ready to harness the awesome power of a converted Ukrainian tanker full of Click Monkeys! And if you find that amusing, also see the same group's Pets or Food and ZooBQ.
The next step is search engine optimization (SEO), the art of making search engines pay more attention to your pages.