Tech Talk by a Kiwi
How to deal with Bandwidth Bandits
So I was going through my Apache logs for the first time in a very long time. It’s just not been a high priority for me. But what I did notice is that two of my blogs, especially stevesgeekspeak.com, have had a significant amount of their bandwidth stolen by a few Russian forum websites hotlinking to images on them.
Normally I wouldn’t mind too much. It’s not really that big a deal, given how little presence my blogs have. But these particular forums were starting to generate more bandwidth usage than the blogs themselves, just by hotlinking to a few images. It seems several images are being used in people’s forum signatures, so they get displayed on more and more pages as time goes on and those people write more forum posts.
That’s not polite. Where appropriate, I always make the effort to get permission to use images, and I host any images I do use myself. It’s just plain and simple netiquette.
In the past I have seen people swap the stolen images for some nasty replacements, the most famous being the goatse.cx image. And sure, that’s good for a laugh and a good way to reduce the likelihood of it happening again, but it doesn’t actually reduce the bandwidth used; it just swaps one useful image for a nastier one. I have also seen people link to image hosting sites’ “over bandwidth” images, which seems to me no better a solution. All you’re doing in that case is moving the cost of the bandwidth from yourself to someone else. That’s not exactly fair either.
What you really want to do is stop hotlinking from working at all. There are lots of ways to do this, but the simplest is a few lines in your Apache .htaccess file that rely on mod_rewrite.
mod_rewrite is a very useful module that ships with Apache and lets you do lots of fancy things, from simple redirections to completely changing the layout of your site as the user sees it, without changing the underlying file structure. For example, on my blogs every page is listed as if it were in a subdirectory, but in truth all pages are generated from a database and aren’t files at all. In essence, the file and directory names you see in the URL of a page on this site are actually a command to the underlying web software (in the case of my blogs, WordPress) telling it which content to fetch from the database.
The exception is images and other media, which are faster (and cheaper in terms of server load) to store as files rather than in the database. The trade-off is that this opens the door to bandwidth bandits, who link directly to those files instead of hosting copies themselves.
How do we avoid this? Well, the truth is you can do it with four very simple lines added to your .htaccess file.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?stevesgeekspeak\.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule \.(js|css|jpe?g|gif|png|mp4|3gp|m4a|m4r|aac|mp3|ogg)$ - [F]
Let’s break these down so they make sense.
- Line 1
- mod_rewrite needs to be enabled for this to work. This first line tells Apache to switch on the rewriting engine. The module itself has to be loaded by the server, though; if it isn’t, Apache won’t recognise the directive and will return a 500 Internal Server Error. You can guard against that by wrapping the rules in an <IfModule mod_rewrite.c> block, but I’m going to assume you have it already. Virtually all hosting providers include it as a core part of their plans.
- Line 2
- This is the first check. Here we test whether the referring site is my own, and the rule only carries on if it isn’t. The !^ part says “if it doesn’t begin with…”, while (.+\.)? says “any subdomain, or none at all”. Then we have the domain name itself. So (.+\.)?stevesgeekspeak\.com could match www.stevesgeekspeak.com, something.stevesgeekspeak.com, or just plain stevesgeekspeak.com. The /.*$ says any page on that site can be a referrer. Finally, [NC] means “no case”, i.e. ignore case. Someone might have typed WWW.STEVESGEEKSPEAK.COM rather than www.stevesgeekspeak.com; it’s still this site, so we don’t want to penalise them just because they had Caps Lock on.
- Line 3
- This is a very simple one. It basically says “the referrer must not be empty”. Anyone who writes code will probably be familiar with the ! from various languages; in its most basic definition it is a boolean operator that simply means “not”. When the browser doesn’t send a referrer, we can’t tell whether the person is on our site or not, so it is generally good form to just allow those requests. Yes, this means some people will still see images hotlinked from your website, but very few. Browsers send the referrer by default, the setting isn’t trivial to change, and turning it off breaks a lot of sites people find useful (such as banking sites).
- Line 4
- This is the command that tells the server what to do if the conditions above are true. mod_rewrite combines consecutive RewriteCond lines with an implicit AND, so if the referrer is not empty and does not have stevesgeekspeak.com as its domain, this is where the magic happens.
If the request is for a .js, .css, .jpg or .jpeg, .gif, .png, .mp4, .3gp, .m4a, .m4r, .aac, .mp3 or .ogg file, the [F] flag marks it [F]orbidden and the server returns a 403 Forbidden HTTP response. The - in place of a target URL means “don’t rewrite anything”; the flag does all the work. You will notice that there is a jpe?g in there. The ? tells mod_rewrite that the e is optional, so both jpg and jpeg match. If we wanted to make more than one character optional, we’d wrap them in parentheses () to group them.
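If you’d like to try these patterns out before uploading your .htaccess (a mistake there can take the whole site down with a 500 error), here is a rough Python model of what lines 2 to 4 decide. It is a sketch under a few assumptions, not a description of how Apache works internally: Python’s re module stands in for Apache’s regex engine (it treats these particular patterns the same way, as far as I’m aware), re.IGNORECASE stands in for [NC], and the function name is_forbidden and the forum address in the examples are made up for illustration.

```python
import re

# Line 2: the referrer is checked against this site's pages.
# re.IGNORECASE plays the role of the [NC] flag.
OWN_SITE = re.compile(r"^http://(.+\.)?stevesgeekspeak\.com/.*$", re.IGNORECASE)

# Line 4: the file types the rule applies to (jpe?g covers jpg and jpeg).
# The original RewriteRule has no [NC], so this match is case-sensitive.
PROTECTED = re.compile(r"\.(js|css|jpe?g|gif|png|mp4|3gp|m4a|m4r|aac|mp3|ogg)$")


def is_forbidden(referer: str, path: str) -> bool:
    """Hypothetical model: True if the rules would answer 403 Forbidden."""
    not_own_site = not OWN_SITE.search(referer)  # Line 2 (the ! negates the match)
    not_empty = referer != ""                    # Line 3 (!^$)
    # The RewriteCond results are ANDed, then the rule's own pattern must match.
    return not_own_site and not_empty and bool(PROTECTED.search(path))


if __name__ == "__main__":
    # forum.example.ru is a made-up hotlinking referrer for illustration.
    cases = [
        ("http://www.stevesgeekspeak.com/some-post/", "images/photo.jpg"),
        ("http://WWW.STEVESGEEKSPEAK.COM/", "images/photo.jpeg"),
        ("", "images/photo.png"),
        ("http://forum.example.ru/thread/42", "images/photo.jpg"),
        ("http://forum.example.ru/thread/42", "some-post/"),
    ]
    for referer, path in cases:
        verdict = "403 Forbidden" if is_forbidden(referer, path) else "allowed"
        print(f"{referer or '(no referrer)':45} {path:20} -> {verdict}")
```

Running it, only the forum request for photo.jpg gets the 403; everything else is allowed. It also makes one gap easy to spot: the RewriteRule has no [NC] flag, so a file called PHOTO.JPG wouldn’t be protected. If you have uploads named like that, changing the flag to [NC,F] should cover them.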
Now in this case, I have mixed all file types into a single rule regardless of what type of file they are. However, it is also possible to redirect all the images to another image if you choose. Simply replace line 4 with the following.
RewriteRule \.(jpe?g|gif|png|bmp)$ http://stevesgeekspeak.com/images/Attn.gif [L]
In this instance, we are sending all hotlinked images to the file Attn.gif instead. It is important to note that the replacement has to be another image; you can’t send people to a web page, for example, because the forum page is expecting an image and won’t display anything else. The [L] flag stands for “last” and tells mod_rewrite to stop processing any further rules once this one has matched. Because the target is given as a full URL, the replacement can be hosted anywhere on the internet; it doesn’t have to be on your own server. However, I strongly urge you to only use files you host and own on your own website. Don’t steal someone else’s bandwidth. That would make you just as bad as the people we’re trying to stop with these rules.
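The redirect version can be sketched in Python as well, in case you want to check which requests would get the placeholder before changing your .htaccess. This is a rough model of the logic under a few assumptions, not a description of Apache internals: Python’s re module stands in for Apache’s regex engine (it handles these particular patterns the same way, as far as I’m aware), re.IGNORECASE stands in for [NC], and the function name replacement_for and the example forum address are made up for illustration.

```python
import re
from typing import Optional

# Lines 2 and 3 stay as they were: own pages and empty referrers are left alone.
# re.IGNORECASE stands in for the [NC] flag.
OWN_SITE = re.compile(r"^http://(.+\.)?stevesgeekspeak\.com/.*$", re.IGNORECASE)

# The replacement line 4 only matches image types (no [NC], so case-sensitive).
IMAGES = re.compile(r"\.(jpe?g|gif|png|bmp)$")

PLACEHOLDER = "http://stevesgeekspeak.com/images/Attn.gif"


def replacement_for(referer: str, path: str) -> Optional[str]:
    """Hypothetical model: the image served instead, or None for the real file."""
    if referer == "" or OWN_SITE.search(referer):
        return None  # the RewriteCond lines fail, so the rule never fires
    if IMAGES.search(path):
        return PLACEHOLDER  # every hotlinked image is swapped for Attn.gif
    return None  # not an image type covered by this rule


if __name__ == "__main__":
    # forum.example.ru is a made-up hotlinking referrer for illustration.
    print(replacement_for("http://forum.example.ru/thread/42", "uploads/cat.png"))
    print(replacement_for("http://www.stevesgeekspeak.com/post/", "uploads/cat.png"))
    # The placeholder itself ends in .gif, so it matches the pattern too.
    print(replacement_for("http://forum.example.ru/thread/42", "images/Attn.gif"))
```

Note the last case: the placeholder itself ends in .gif, so a hotlinked request for Attn.gif matches the rule as well. Some people add an extra condition such as RewriteCond %{REQUEST_URI} !Attn\.gif$ above the rule so the placeholder is always served untouched. It’s cheap insurance against the rule chasing its own tail.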
There are some amazing things you can do with mod_rewrite; this one rule doesn’t even scratch the surface of what it can do. For example, try going to www.stevesgeekspeak.com and see what mod_rewrite does to the URL. On one of my sites, the .htaccess file is nearly 100 lines long, with rules for mod_rewrite and other Apache modules. It is a very powerful tool for webmasters and site administrators.
I hope this is useful to you. Let me know if you’d like me to clarify or help with anything you see here. I’ll do my best to help you with any queries you might have.
This entry was posted by Steve on 13 September, 2011 at 9:13 pm, and is filed under Code, Tutorials.
