The importance of checking website server logs

I rarely find the time to check the raw access logs for my websites. I have Google Analytics installed, and since it provides pretty much all of the information I need, there is usually little reason to dig through the server logs.

However, Google Analytics does not show you which search engine bots have been crawling your website, because bots do not execute the JavaScript that Analytics uses to track visitors. So when I wanted to check whether a particular page had been crawled, I had to download my server logs. I searched for “Googlebot” and got an audit trail of the pages crawled over the past month or so, but I also noticed a completely different IP address claiming to be Googlebot. The Googlebot addresses I normally see start with 66 (e.g. 66.249.65.49), but this one was 114.152.179.5. So I did a reverse DNS check on that IP and, guess what, the IP that claimed to be Googlebot is not Googlebot at all. It belongs to someone in Japan who has written a program to leech content off my website.
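For anyone who wants to repeat this check, here is a minimal Python sketch rather than the exact steps I followed. It assumes the log is in Apache's Combined Log Format (IP first, user agent as the last quoted field) and that a genuine Googlebot address reverse-resolves to a hostname under googlebot.com or google.com and resolves forward to the same IP. The function names `claimed_googlebot_ips` and `is_real_googlebot` are made up for illustration.

```python
import re
import socket
from collections import Counter

# Assumes Combined Log Format: the IP is the first field and the
# user agent is the last quoted field on the line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"\s*$')


def claimed_googlebot_ips(lines):
    """Count requests per IP whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then confirm it forward.

    The resolver functions are parameters so the check can be exercised
    without touching the network.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no reverse DNS record at all
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # The hostname must resolve back to the same IP address.
        return ip in forward(hostname)[2]
    except OSError:
        return False
```

Feeding the lines of a downloaded access log to `claimed_googlebot_ips` and then calling `is_real_googlebot` on each address it returns should separate genuine crawler visits from impostors like the one above.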

The server logs showed that it had requested a significant number of pages. I thought Google was really loving my website and had decided to do a deep crawl, but in reality somebody was stealing content from my site. You can guess my frustration! I have now blocked the IP address through .htaccess, and I will be checking my raw access logs more often in an attempt to ban future leechers. If you’re a leecher and reading this, beware, because I’m on to you now!

1 Response

  1. bro mak April 17, 2011 / 6:58 pm

    I find at least three general motivations for using website server logs:
    * Security
    * Priorities and procedural issues
    * Promotional

    As it relates to security, any business or organisation that does work over the internet has areas and means of access that are, or should be, protected. For example, if you have form-to-mail scripts on your website, spammers may try to hijack the server-side mail script and use it to send their vile commercials, advertisements, solicitations and immoral communiqués to their spam lists. That can get YOU in trouble with the law, lock up your mail() script, and perhaps let them crack your own mail server database via an SQL attack and send filth to your friends and clients. Then there is the risk of them reaching intellectual property locked away on pages that are not available to the public, or cracking your access scheme to get into your database server and all your marketing, clientele, inner workings, communications and business plans. It is feasible, and it has been done: attacks on banks, government agencies and troves of personal data such as credit card and bank account details have been launched remotely from the database and server ports of a business’s website, with the thieves getting away with it because they used a web server whose logs were being ignored.

    As it relates to priorities and procedural issues, everyone who has a website has pages that need attention and pages that are actually productive in accomplishing a goal of the website. MUCH can be discerned by looking at which pages and what content are being read, for how long, and what the visitor does next. On a website that shares data about a series of medical conditions, one might discern that the big page is putting readers to sleep in about 35 seconds, causing them to click away without taking in the full gravity of the data. Another page, shorter in text but with a few clear illustrations, may keep visitors scrolling and reading and leave them with the full import of the data. A simple review of what gains and keeps attention, or spurs further action, reveals WHAT needs more attention and WHAT needs less, and may give the webmaster or development team an idea of the “next priority” in developing the site into a more useful tool for reaching their goals.

    This leads to the third area I think is helped by analysis of the server logs. You find out real quick what people are looking for and where YOU can develop further information, offer a product or provide a service that will make their visit a greater benefit to them, and promote the very reason you created the website to begin with. The search terms they enter show up in the query string of the referrer URL sent in the HTTP request headers, regardless of whether they came from a search engine, a web directory or a link on a friendly website. You find out how people are querying YOUR site: what gets them there, what they are looking for, and whether or not they found any success. If they are searching for where to buy a widget-1 item, spend 3 seconds on your widget-1 page without ordering, and disappear never to return… maybe we need to work on our widget-1 data and conversion strategy. We want them to get what they want, come back for more, and send their friends, family and co-workers to “oursite.com” because we made “it” happen for them.
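As a rough illustration of pulling search terms out of the referrer field, here is a minimal Python sketch. The parameter names in `SEARCH_PARAMS` are assumptions, since each search engine names its query parameter differently and some send no search term at all, and `search_terms` is a made-up helper name.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed query parameter names; adjust for the engines that appear in your logs.
SEARCH_PARAMS = ("q", "p", "query")


def search_terms(referrer):
    """Return the search phrase carried in a referrer URL, or None."""
    query = parse_qs(urlsplit(referrer).query)
    for name in SEARCH_PARAMS:
        if name in query:
            return query[name][0]
    return None
```

For example, `search_terms("http://www.google.com/search?q=buy+widget-1&hl=en")` gives `"buy widget-1"`, while a referrer with no recognised parameter (or the `-` placeholder a log uses when no referrer was sent) gives `None`.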

    So there you have a quick summary of only THREE of the many ways web server logs are useful and important to any website owner, webmaster or website content developer.
