Parsing Apache Logs

Below are some simple command-line scripts, using awk, sed and similar tools, to parse Apache logs and get quick statistics.

Unique visitors per day

Here access.log is your Apache log file in the combined format, with typical entries as below:

access.log
69.175.xxx.yyy - - [13/Jul/2013:06:28:31 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.0" 404 4212 "http://somesubdomain.example.org/some/web/folder/some_web_page1" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"
69.175.xxx.yyy - - [13/Jul/2013:06:28:35 -0500] "GET /some/web/folder/?do=register HTTP/1.0" 302 599 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"
69.175.xxx.yyy - - [13/Jul/2013:06:28:36 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.0" 404 4212 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"
69.175.xxx.yyy - - [13/Jul/2013:06:28:41 -0500] "POST /some/web/folder/somewebpage2 HTTP/1.0" 200 2439 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1"

94.123.xxx.yyy - - [13/Jul/2013:06:32:49 -0500] "GET /some/web/folder/some_web_page1 HTTP/1.1" 200 5121 "http://somesubdomain.example.org/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:49 -0500] "GET /some/web/folder/?do=login HTTP/1.1" 302 599 "http://somesubdomain.example.org/some/web/folder/some_web_page1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:50 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.1" 404 4214 "http://somesubdomain.example.org/some/web/folder/some_web_page1" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:50 -0500] "GET /some/web/folder/?do=register HTTP/1.1" 302 599 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:50 -0500] "GET /some/web/folder/somewebpage2 HTTP/1.1" 404 4214 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
94.123.xxx.yyy - - [13/Jul/2013:06:32:51 -0500] "POST /some/web/folder/somewebpage2 HTTP/1.1" 200 2530 "http://somesubdomain.example.org/some/web/folder/somewebpage2" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0"
Below is the command-line script, which gets the unique hits per day. The grep at the very end is a final filter for the month and year you are looking for.

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}' | \
sort | uniq | awk '{print $1}' | uniq -c | grep "Feb/2013"
The output is as below

52 01/Feb/2013
63 02/Feb/2013
47 03/Feb/2013
62 04/Feb/2013
59 05/Feb/2013
63 06/Feb/2013

etc.
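
The same per-day count can also be produced in a single awk pass, without the intermediate sed and cut stages. The following is only a minimal sketch, not the article's own method: the function name unique_visitors_per_day, its optional month argument and the shortened sample entries are all assumptions made for illustration.

```shell
# Hypothetical helper (not from the original article): print "<count> <date>"
# for each day, counting every client IP at most once per day.
# $1 = log file, $2 = optional month filter such as "Jul/2013".
unique_visitors_per_day() {
  awk -v month="$2" '
    {
      split($4, stamp, ":")        # stamp[1] is e.g. "[13/Jul/2013"
      day = substr(stamp[1], 2)    # drop the leading "["
      if (month != "" && index(day, month) == 0) next
      if (!seen[day SUBSEP $1]++) count[day]++   # first time this IP is seen on this day
    }
    END { for (day in count) print count[day], day }
  ' "$1" | sort -k2
}

# Shortened sample entries (assumed data) with the same leading fields as access.log.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
69.175.xxx.yyy - - [13/Jul/2013:06:28:31 -0500] "GET /a HTTP/1.0" 404 4212
69.175.xxx.yyy - - [13/Jul/2013:06:28:35 -0500] "GET /b HTTP/1.0" 302 599
94.123.xxx.yyy - - [13/Jul/2013:06:32:49 -0500] "GET /a HTTP/1.1" 200 5121
94.123.xxx.yyy - - [14/Jul/2013:06:32:50 -0500] "GET /b HTTP/1.1" 404 4214
EOF

unique_visitors_per_day "$sample_log" "Jul/2013"
# prints:
# 2 13/Jul/2013
# 1 14/Jul/2013
```

The final sort is there because awk does not guarantee the order in which array keys are visited. It orders the lines as text on the date field, which is good enough within one month.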
Explanation of the command

This may be useful if you want to tweak it.

Break down 1

Get IP and date (unformatted at this stage)

cat access.log | awk '{print $1 " " $4}'
Output of the above

69.175.xxx.yyy [13/Jul/2013:06:28:31
69.175.xxx.yyy [13/Jul/2013:06:28:35
69.175.xxx.yyy [13/Jul/2013:06:28:36
69.175.xxx.yyy [13/Jul/2013:06:28:41
94.123.xxx.yyy [13/Jul/2013:06:32:49
94.123.xxx.yyy [13/Jul/2013:06:32:49
94.123.xxx.yyy [13/Jul/2013:06:32:50
94.123.xxx.yyy [13/Jul/2013:06:32:50
94.123.xxx.yyy [13/Jul/2013:06:32:50
94.123.xxx.yyy [13/Jul/2013:06:32:51
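
To see why $1 and $4 are the fields to pick, it can help to print the first few whitespace-separated fields of one entry next to their awk field numbers. This is a small sketch; the helper name show_fields and the shortened sample entry are assumptions for illustration.

```shell
# Shortened sample entry (assumed data) in the combined log layout.
line='69.175.xxx.yyy - - [13/Jul/2013:06:28:31 -0500] "GET /a HTTP/1.0" 404 4212'

# Hypothetical helper: print the first five fields, one per line,
# each prefixed with its awk field number.
show_fields() {
  printf '%s\n' "$1" | awk '{ for (i = 1; i <= 5; i++) print "$" i " = " $i }'
}

show_fields "$line"
# prints:
# $1 = 69.175.xxx.yyy
# $2 = -
# $3 = -
# $4 = [13/Jul/2013:06:28:31
# $5 = -0500]
```

Because awk splits on whitespace by default, the timestamp is divided between $4 and $5, which is why $4 still carries the opening bracket and the time.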
Break down 2

Remove the [ bracket with sed. Remove the time portion of the output with cut.

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1
Output of the above

69.175.xxx.yyy 13/Jul/2013
69.175.xxx.yyy 13/Jul/2013
69.175.xxx.yyy 13/Jul/2013
69.175.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
94.123.xxx.yyy 13/Jul/2013
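
If you prefer fewer stages, the sed and cut steps could be folded into one sed call with two substitutions. This is an alternative sketch rather than the article's command, and the function name strip_bracket_and_time is hypothetical.

```shell
# Hypothetical helper: remove the first "[" and everything from the first ":" onward.
strip_bracket_and_time() {
  sed -e 's/\[//' -e 's/:.*//'
}

printf '%s\n' '69.175.xxx.yyy [13/Jul/2013:06:28:31' | strip_bracket_and_time
# prints: 69.175.xxx.yyy 13/Jul/2013
```

Like cut -d":" -f1, this keeps only the text before the first colon on each line.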
Break down 3

Swap IP and Date such that Date is 1st and IP is 2nd

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}'
Output of the above

13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
Break down 4

Remove duplicate IPs – to get unique IP hits per day

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}' | \
sort | uniq
Output of the above

13/Jul/2013 69.175.xxx.yyy
13/Jul/2013 94.123.xxx.yyy
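
The sort | uniq pair can usually be shortened to sort -u, which sorts and drops duplicate lines in one step. Here is a brief sketch on date/IP pairs like those from the previous step; the wrapper name dedupe_pairs is hypothetical.

```shell
# Hypothetical wrapper: sort the lines and keep one copy of each.
dedupe_pairs() {
  sort -u
}

printf '%s\n' \
  '13/Jul/2013 94.123.xxx.yyy' \
  '13/Jul/2013 69.175.xxx.yyy' \
  '13/Jul/2013 94.123.xxx.yyy' \
  '13/Jul/2013 69.175.xxx.yyy' | dedupe_pairs
# prints:
# 13/Jul/2013 69.175.xxx.yyy
# 13/Jul/2013 94.123.xxx.yyy
```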
Break down 5

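Now the IPs themselves are no longer interesting, as we only need their count. So remove the IP by printing only the date, then run uniq -c on the output to count the lines for each date. Because the input is already sorted, identical dates sit next to each other, which is what uniq -c needs.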

cat access.log | awk '{print $1 " " $4}' | sed 's/\[//' | cut -d":" -f1 | awk '{print $2 " " $1}' | \
sort | uniq | awk '{print $1}' | uniq -c
Output of the above. So there are only two unique hits in our example for the one date.

2 13/Jul/2013

Source: http://tech.snathan.org/tech/apache/log_parsing
