Alan Doherty

Alan Doherty's Webserver Administration Tips

As well as the other stuff I do, I administer this very webserver {as a live test environment} and many other critical ones for clients.

The lessons learned from this, and more importantly the attempted abuses/attacks observed daily {the reason for running a live test environment}, have given me many opportunities for insight into the mindset/tactics of the attackers/abusers, and into methods to counter, thwart and otherwise frustrate their attempts.

Some of these tactics are Apache-specific, Apache being my webserver of choice, but most are generally applicable and useful in any HTTP server.

So I will use this section of the website to share them with you, the public; but as always, for a more comprehensive check and a secure system, please contact me and arrange a professional audit.

General Truths of HTTP

Limit your attack surface

To make any system as secure as possible, load/run only the bare minimum of modules Apache needs to serve your site(s): fewer features means less code, which means fewer bugs {as yet undiscovered} and even fewer potentially exploitable ones.
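As a rough illustration of auditing this, the Python sketch below parses the module list printed by apachectl -M {or httpd -M} and reports anything loaded that is not on an allowlist. The allowlist contents are purely hypothetical; what a given site really needs has to be worked out server by server.

```python
# Hypothetical allowlist: the modules one particular site might actually need.
REQUIRED = {
    "core_module", "so_module", "http_module", "mpm_event_module",
    "authz_core_module", "dir_module", "mime_module", "log_config_module",
    "alias_module", "rewrite_module",
}


def loaded_modules(listing: str) -> set[str]:
    """Parse the text printed by `apachectl -M` into a set of module names."""
    modules = set()
    for line in listing.splitlines():
        line = line.strip()
        # Module lines look like "rewrite_module (shared)"; the
        # "Loaded Modules:" header and blank lines are skipped.
        if line.endswith(("(static)", "(shared)")):
            modules.add(line.split()[0])
    return modules


def unneeded(listing: str, required: set[str] = REQUIRED) -> list[str]:
    """Return loaded modules that are not on the allowlist, sorted by name."""
    return sorted(loaded_modules(listing) - required)


sample = """Loaded Modules:
 core_module (static)
 so_module (static)
 autoindex_module (shared)
 status_module (shared)
 rewrite_module (shared)
"""
print(unneeded(sample))  # ['autoindex_module', 'status_module']
```

Each module it reports is only a candidate for commenting out its LoadModule line; test that the site still works before removing it for good.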

In IIS terms, disable/uninstall all unused features.

To further limit your attack surface, rid your server of DUST {different URLs with similar text}.

This alone will shield you from discovery by 90% of scan attacks, save bandwidth, improve search ranking, and avoid ranking sabotage.

Make use of name-based virtual hosts. For example, suppose your site(s) are supposed to be available to users via http://www.example.com/ and http://example.com/ {running on a server with an IP of 19x.120.12x.7 that also has the hostname webserver.example.com and possibly others}.

You would run a minimum of three virtual hosts; best practice is four.

To avoid scans, ensure people cannot reach your server via http://19x.120.12x.7/, and ensure that URL's 404 page doesn't redirect {or even link} to your real site's name, as that just shows scanners how to bypass your security.
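The routing decision those virtual hosts make can be modelled in a few lines. This is a hedged Python sketch of the behaviour, not Apache configuration: it assumes www.example.com is the canonical name and example.com an alias, and treats every other Host header {the bare IP, or any stray name pointed at the server} as unknown.

```python
CANONICAL = "www.example.com"   # assumed: the one name content is served on
ALIASES = {"example.com"}       # assumed: names that should 301 to the canonical one


def route(host: str, path: str) -> tuple[int, dict[str, str]]:
    """Decide how to answer a request from its Host header and path.

    canonical name -> 200, serve the content
    known alias    -> 301 to the same path on the canonical name
    anything else  -> bare 404 that never mentions the real site name
    """
    name = host.split(":", 1)[0].rstrip(".").lower()  # drop port, trailing dot, case
    if name == CANONICAL:
        return 200, {}
    if name in ALIASES:
        return 301, {"Location": f"http://{CANONICAL}{path}"}
    return 404, {}


print(route("example.com", "/red-car/"))
# (301, {'Location': 'http://www.example.com/red-car/'})
```

The important design choice is the last branch: the unknown-host response carries no Location header and no link, so a scanner probing by IP learns nothing about which names the server really answers to.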

There are many advantages to using name-based virtual servers properly apart from security: removing DUST also stops PR {PageRank} loss due to duplicate content. {Side note: the linked article's solutions are aimed at web developers, not server admins; good server-admin work saves time, hassle and processor power by doing this for them.}

A related point: it also stops the possibility of ranking sabotage. When a competitor finds you haven't restricted your content to a single name, they can maliciously link to {and sometimes even register} alternative names that point to your server's IP, so that search engines see many, many "copies" of your content on many pages, diluting the "originality" of the content in any ranking.

NB. The PR-related issues above are not Google-specific; they apply to any/all search engines.

Catch/Ban malicious bots/harvesters ASAP

Simple truth: they are sucking up your processor cycles/bandwidth while attempting to exploit your servers/content/users, so why would you not want to defend yourself?

Many systems for this are available for many architectures; I have written one in PHP, mounted on Apache. Once a malicious bot is detected {in my case, when it accesses a URL that is explicitly denied in the site's robots.txt and not visibly linked anywhere on the site}, it is "banned" from the server. {All these URLs are known vulnerabilities commonly probed for by bots, and are aliases for my "ban-me" script.} Additionally, to foil address harvesters, webmasters are encouraged to make the first and last links on each page "hidden" and point them at these URLs. Thus no users see or click on them, and no search engines follow them either because of the robots.txt, but malicious bots do see and follow them, and get self-blacklisted as a result.
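The trap-and-ban idea can be sketched as follows. This is not the PHP script described above, just an illustrative Python version: the trap paths and the one-week ban length are hypothetical, and bans expire rather than being permanent, in line with the advice further down this page.

```python
import time

# Hypothetical trap URLs: disallowed in robots.txt, never visibly linked,
# and named after paths that vulnerability scanners commonly probe.
TRAP_PATHS = {"/ban-me/", "/wp-login.php", "/phpmyadmin/"}
BAN_SECONDS = 7 * 24 * 3600  # assumed ban length: one week, not forever


class BanList:
    """In-memory, time-limited bans keyed by client IP address."""

    def __init__(self, clock=time.time):
        self._clock = clock                   # injectable for testing
        self._expires: dict[str, float] = {}  # ip -> moment the ban lapses

    def is_banned(self, ip: str) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:           # ban has lapsed; forget it
            del self._expires[ip]
            return False
        return True

    def check(self, ip: str, path: str) -> int:
        """Return the HTTP status to send: 403 for banned or newly trapped clients."""
        if self.is_banned(ip):
            return 403
        if path in TRAP_PATHS:                # this client ignored robots.txt
            self._expires[ip] = self._clock() + BAN_SECONDS
            return 403
        return 200


bans = BanList()
print(bans.check("198.51.100.9", "/phpmyadmin/"))  # 403, now banned
print(bans.check("198.51.100.9", "/"))             # 403 until the ban lapses
```

A real deployment would presumably keep the list somewhere shared {a file, database or firewall table} so that every server process sees the same bans.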

{That said, there is a longstanding issue whereby the "Google Wireless Transcoder" reveals these hidden links to all its users. This bug has been repeatedly reported and silence is the only answer, so most of us say "let their users ban themselves", and maybe Google will hear/listen to their complaints.}

Ensure non-malicious bots are instructed about your defences

Google and every other search engine are your friends ;), so ensure your robots.txt tells them where not to go on your site. If site content is someone else's department, ensure the robots.txt is generated dynamically from server-wide and the site's own preferences {don't let a botched robots.txt rewrite result in Google getting unwittingly banned from your server}.
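One way to keep the server-wide trap entries from ever being lost is to build robots.txt by merging the two sources, with the server-wide list always included. A minimal Python sketch, assuming a single "User-agent: *" group; the trap paths are hypothetical.

```python
# Hypothetical server-wide trap paths that every site on the server must disallow.
SERVER_DISALLOW = ["/ban-me/", "/wp-login.php", "/phpmyadmin/"]


def build_robots(site_disallow: list[str]) -> str:
    """Merge server-wide and per-site Disallow rules into one robots.txt body.

    Server-wide entries always come first and cannot be removed by a site
    edit, so well-behaved crawlers are always told to avoid the trap URLs.
    Blank entries and duplicates are skipped.
    """
    seen: set[str] = set()
    lines = ["User-agent: *"]
    for path in SERVER_DISALLOW + site_disallow:
        path = path.strip()
        if path and path not in seen:
            seen.add(path)
            lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


print(build_robots(["/private/", "/ban-me/"]), end="")
```

Served from a script mapped to /robots.txt, this means a site owner can add their own exclusions but cannot accidentally drop the ones that keep crawlers out of the ban traps.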

Ensure malicious IPs are not permanently banned

The IP addresses of infected PCs will be re-used by others, so permanent bans do no one any good; given long enough, you will ban the entire internet. It is best to either:

SEO and DUST related optimisations

PHP {and others} consume more resources

How much more?

It differs from script to script and page to page, but on average php == 10*shtml and shtml == 10*html {in resource cost}, so a single static HTML page can have 100 times the simultaneous viewers: 500 rather than 5 on a poor server, or 500000 instead of 5000 on a more decent one. Either way, no small difference.
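The arithmetic behind those figures, using the rule-of-thumb 10x ratios above {averages only; real scripts vary widely}:

```python
# Rule-of-thumb cost ratios from the text; real scripts vary widely.
PHP_PER_SHTML = 10   # a PHP page costs roughly 10x an SSI (.shtml) page
SHTML_PER_HTML = 10  # an SSI page costs roughly 10x a static .html page


def static_capacity(php_viewers: int) -> int:
    """Simultaneous viewers a static page could serve on hardware that
    handles `php_viewers` for the equivalent PHP page."""
    return php_viewers * PHP_PER_SHTML * SHTML_PER_HTML


print(static_capacity(5))     # 500
print(static_capacity(5000))  # 500000
```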

Use the best tool for the job but don't tie yourself down

Design your sitemap and URL structure to be type-independent, i.e. use http://www.example.com/the-page/ not http://www.example.com/the-page.htm or http://www.example.com/the-page/index.htm

The URL http://www.example.com/the-page/ will then keep working whether the page behind it is http://www.example.com/the-page/index.htm, http://site.domain/the-page/index.php, http://www.example.com/the-page/index.asp, http://www.example.com/the-page/index.pl, http://www.example.com/the-page/index.shtm or any other type, on any other server, if your needs/solutions/hosting change.

To achieve this, just ensure all internal links point to the directories your different index pages reside in. If old links point to http://your.site/index.htm, then ensure all your internal links point to http://your.site/ or /, rename index.htm to index.html {for PHP, index.php to index.php5}, and then redirect with status 301 all requests for /index.htm to http://your.site/. That way all old links still work, but users {and search engines} see that the old URL is no longer the right one and adjust.
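A minimal sketch of the redirect decision, in Python, assuming a hypothetical set of index file names. It is meant to run on the path a client actually requested, not on the server's internal mapping of a directory to its index file; applied there, it would redirect / back to / forever.

```python
import posixpath
from typing import Optional

# Hypothetical set of index file names the server may resolve a directory to.
INDEX_NAMES = {"index.htm", "index.html", "index.shtm", "index.shtml",
               "index.php", "index.php5", "index.asp", "index.pl"}


def index_redirect(path: str) -> Optional[str]:
    """Return the directory URL to 301 to if a client asked for an index file.

    "/the-page/index.htm" -> "/the-page/", "/index.php" -> "/".
    Returns None when the request is already directory-style.
    """
    directory, name = posixpath.split(path)
    if name.lower() in INDEX_NAMES:
        return directory.rstrip("/") + "/"
    return None


for requested in ("/index.htm", "/the-page/index.php", "/the-page/"):
    print(requested, "->", index_redirect(requested))
```

The function only looks at the path; any query string would need to be split off beforehand and re-attached to the redirect target.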

Pick the best names for your page URIs

Stick to lower case, as no one wants to change case mid-typing. Regardless of the filesystem {some are case-sensitive, some are not}, mixing case is fairly obnoxious and adds to the difficulty of users relaying URLs non-electronically {via phone/mail/fax/SMS}.
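Mixed-case requests can be folded with a 301 to the lower-case form. A hedged Python sketch: it assumes all your real URLs are already lower case, and it lower-cases only the path, since query-string values may legitimately be case-sensitive.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url: str) -> Optional[str]:
    """Return the URL with a lower-cased path for a 301, or None if already lower case.

    Only the path is folded; the query string is left untouched because
    parameter values may legitimately be case-sensitive.
    """
    parts = urlsplit(url)
    lowered = parts.path.lower()
    if lowered == parts.path:
        return None
    return urlunsplit(parts._replace(path=lowered))


print(lowercase_redirect("http://www.example.com/Red-Car/?colour=Red"))
# http://www.example.com/red-car/?colour=Red
```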

Use meaningful words and punctuation in URLs. The URL http://www.example.com/red-car/ is more useful to everyone than http://www.example.com/redcar/. I recommend that you use hyphens (-) instead of underscores (_) in your URLs {as _ is unsupported by some filesystems and confuses some less technical users}.
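When page names are generated from titles, a small helper can enforce both rules {lower case, hyphens rather than underscores or spaces}. A Python sketch; the name slugify is just a common convention, not anything from this site.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lower-case, hyphen-separated URL segment."""
    # Fold accented letters to plain ASCII where possible ("é" -> "e");
    # characters with no ASCII equivalent are dropped.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Any run of characters other than a-z and 0-9 (spaces, underscores,
    # punctuation) becomes a single hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


print(slugify("Red Car"))   # red-car
print(slugify("red_car"))   # red-car
```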

If any content is dynamic in nature, avoid the possibility of "infinite spaces" by using robots.txt and/or rel=nofollow, to stop robots continually hitting "next" on a calendar to get yet another empty page far into the future.
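As a sketch of the calendar case, the Python below builds the "next" link and adds rel="nofollow" once it points more than a set number of months past today. The /calendar/YYYY-MM/ URL scheme and the 12-month horizon are hypothetical, and nofollow is only a hint to robots, so pairing it with a robots.txt rule is still worthwhile.

```python
from datetime import date

HORIZON_MONTHS = 12  # hypothetical: how far ahead the calendar is worth crawling


def next_month_link(shown: date, today: date) -> str:
    """HTML for the calendar's "next" link; nofollow once it is past the horizon."""
    # Count months from year 0 so month arithmetic is plain integer maths.
    target = shown.year * 12 + (shown.month - 1) + 1
    year, month0 = divmod(target, 12)
    months_ahead = target - (today.year * 12 + (today.month - 1))
    rel = ' rel="nofollow"' if months_ahead > HORIZON_MONTHS else ""
    return f'<a href="/calendar/{year:04d}-{month0 + 1:02d}/"{rel}>next</a>'


today = date(2008, 12, 1)
print(next_month_link(date(2008, 12, 1), today))
# <a href="/calendar/2009-01/">next</a>
print(next_month_link(date(2009, 12, 1), today))
# <a href="/calendar/2010-01/" rel="nofollow">next</a>
```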

Similarly, avoid multiple URLs to the same content, whether from allowing multiple views via URL parameters or from session IDs in the URL {use cookies instead}. Consider using robots.txt and/or rel=nofollow to exclude the refined-view URLs and let search engines see only the full/expanded/default view. Additionally, offer users a [link] button or code that gives them the one 'true' URL to the content, for bookmarking and linking.
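Producing that one 'true' URL usually means dropping the view and session parameters. A minimal Python sketch using the standard urllib.parse module; the parameter names are hypothetical {PHPSESSID is PHP's default session name, the rest are placeholders}.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters that only change the view or carry a session.
NON_CANONICAL = {"sort", "view", "per_page", "phpsessid", "sid"}


def canonical_url(url: str) -> str:
    """Drop view/session parameters and any fragment, keeping the one 'true' URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in NON_CANONICAL]
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


print(canonical_url("http://www.example.com/cars/?sort=price&PHPSESSID=abc123&colour=red"))
# http://www.example.com/cars/?colour=red
```

Parameters that genuinely select different content {colour here} are kept, so the result still identifies the page rather than just its directory.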

Obviously, using static HTML where possible avoids most of these dynamic-content issues.


Last updated Dec. 2008 Alan Doherty