Referrer Madness

Warning: geek post.

Just solved one of the more puzzling web mysteries I’ve had the pleasure of tearing my hair out over through the years. This one was a doozy, but also kind of fascinating, if you swing that way.

Over the past 36 hours, I have been corresponding with a reader of John Battelle’s SearchBlog who was unable to post comments to that site. Every time he clicked Submit, his browser was redirected to a PHP Freaks page describing the REMOTE_ADDR environment variable. WTF? I was not able to duplicate the behavior in any browser, and SearchBlog gets dozens of successful comments per day. The reader’s IP address was not in any block or filter in use, and we simply didn’t have any plugin or configuration in place that would redirect commenters to an external site. What in the world could cause this user to be redirected anywhere, let alone to a site completely unrelated to anything on SearchBlog? And why couldn’t I reproduce the behavior?

Late this afternoon, one of the reader’s colleagues (a programmer) tried it, and got a variation of the error, which included the string “%remote_addr”. This programmer happened to know that if you type garbage into the Firefox URL field, it will automatically return an “I’m Feeling Lucky” search result from Google. Since the PHP Freaks page is the first Google result for the string “%remote_addr,” you get sent automatically to that page. When I saw the %remote_addr reference, a lightbulb flickered, and I at least knew where to begin looking.

One of our favorite tools for fighting weblog comment spam is a simple Apache .htaccess directive block that examines the referring URL before allowing access to the comment script. If the referrer doesn’t include one of the hosted customer’s domains, it uses mod_rewrite to send the spammer back to their own IP, making it harder for remote scripts to abuse comment forms. The block we use is adapted from the WordPress Codex:


RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} .mt-comments\.cgi*
RewriteCond %{HTTP_REFERER} !.*(thedomain.org).* [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule /(.*) http://%{REMOTE_ADDR} [R=301,L]

On closer study, it turns out that the site from which I originally snarfed the Rewrite code had included a typo. Where the RewriteRule should have read:

RewriteRule /(.*) http://%{REMOTE_ADDR} [R=301,L]

it instead read:

RewriteRule /(.*) http://%(REMOTE_ADDR) [R=301,L]

Because the rule used parentheses rather than curly braces, the IP substitution wasn’t made, so the literal string “http://%remote_addr” became the URL to which the user was redirected. And when that URL was submitted via Firefox, the garbage was turned into an “I’m Feeling Lucky” search. Pernicious.
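To make the braces-versus-parentheses point concrete, here is a minimal Python sketch. It is not how Apache is implemented; `expand` and `VAR_PATTERN` are hypothetical names, the IP address is made up, and the only assumption is that a variable reference is recognized solely in the `%{NAME}` form.

```python
import re

# Hypothetical stand-in for mod_rewrite's variable expansion:
# only the %{NAME} form (curly braces) counts as a variable reference.
VAR_PATTERN = re.compile(r"%\{([A-Z_]+)\}")


def expand(substitution, env):
    """Replace each %{NAME} with env[NAME]; leave all other text untouched."""
    return VAR_PATTERN.sub(lambda m: env.get(m.group(1), ""), substitution)


# Made-up client address from a documentation-only IP range.
env = {"REMOTE_ADDR": "203.0.113.7"}

correct = expand("http://%{REMOTE_ADDR}", env)  # braces: substituted
typo = expand("http://%(REMOTE_ADDR)", env)     # parentheses: left as literal text

print(correct)
print(typo)
```

With the parenthesized form nothing matches the pattern, so the string passes through unchanged and ends up as the redirect target.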

Now I was on the right track, but there was more to solve. Since the user was using a browser rather than running a remote commenting bot, why was the directive being triggered at all? The HTTP_REFERER variable should have been populated. On a hunch, I dug through the menus for the amazing Firefox Web Developer Extension. Pulled down Tools | Web Developer | Disable | Referrer Logging, hit John’s site again, and for the first time, was able to reproduce the problem.

So, it seems, both the reader and his colleague had this extension installed with that particular option selected. (This behavior would also be triggered if the user were behind a firewall or proxy that stripped the Referer header sent by the user agent. Fortunately, it’s rare for the referrer to be masked by any means, rare enough that I’m not tempted to stop using this technique.)
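Why a real browser tripped the rule can be sketched in Python as well. This is a rough model of the RewriteCond chain, not Apache itself: `should_redirect` is a hypothetical helper, the URLs and user agent string are invented, and a plain substring test stands in for the referer regex.

```python
import re


def should_redirect(method, uri, referer, user_agent, domain="thedomain.org"):
    """Rough model of the conditions: POST AND comment script AND
    (referer lacks the domain OR user agent is a bare "-")."""
    is_post = method == "POST"
    is_comment_script = re.search(r"mt-comments\.cgi", uri) is not None
    # Assumption: a substring check approximates the referer pattern.
    bad_referer = domain not in (referer or "")
    blank_agent = user_agent == "-"
    return is_post and is_comment_script and (bad_referer or blank_agent)


# Hypothetical requests for illustration only.
normal = should_redirect(
    "POST", "/mt/mt-comments.cgi",
    "http://thedomain.org/archives/post.html", "Mozilla/5.0")
no_referer = should_redirect(
    "POST", "/mt/mt-comments.cgi", "", "Mozilla/5.0")
plain_get = should_redirect(
    "GET", "/mt/mt-comments.cgi", "", "Mozilla/5.0")

print(normal, no_referer, plain_get)
```

A legitimate browser that sends no referrer looks exactly like a remote script to these conditions, which is why the reader was redirected while everyone else commented normally.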

So… a bizarre alchemical reaction between a fairly obscure option in a fairly obscure browser plugin, a pasted typo in an Apache configuration, and the quirky way Firefox handles garbage URLs. Frustrating as hell, but satisfying to have solved.

Tip of the hat to Ethan Stock and Tom Hill for their assistance sleuthing this out.

Music: Sufjan Stevens :: Prairie Fire That Wanders About
