Converting Server Logs to GeoIP data (kml) – (1)
This is part one of a multi-part post on generating and using KML data from server logs. I started down this path after taking a look at Snoge (a Snort & Google Earth mashup) and wrote Bash code structured similarly to Snoge's Perl code.
There are multiple reasons to review the region/country/city (GeoIP data) from which your servers and systems are being accessed: you could just be curious, you could have a real need (e.g. security), or it could be part of your company’s marketing research. It seems like this would already have been done as a system-level solution (i.e. without Perl modules or some other non-native shell solution). It probably has been; I just did not find it… So, what do we need?
- a reference GeoIP data set (hopefully a recent one; the geolocation for each IP is looked up in this data)
- server logs: web servers, firewalls, system logs – any log with unique IP data
- tools to convert from your selected log type into a mappable format (in this case ‘.kml’, the format used by Google Maps and Google Earth)
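Before going further, it may help to confirm these ingredients are in place. This is a minimal sketch, not part of the original tooling: `check_prereqs` and the `GEOIP_DB` variable are hypothetical names, and the default database path simply mirrors the one used with ‘geoiplookup’ later in this post. Source the file, then run `check_prereqs && echo ready`.

```shell
# Hypothetical prerequisite check (not part of the original tooling).
# GEOIP_DB is an assumed variable; its default mirrors the path used later in the post.
check_prereqs() {
    local db=${GEOIP_DB:-/usr/local/share/GeoIP/GeoLiteCity.dat}
    local tool missing=0

    # Tools: the GeoIP lookup CLI plus the text utilities used for extraction.
    for tool in geoiplookup grep awk sort uniq; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done

    # Reference data set: the GeoIP database file must be readable.
    if [ ! -r "$db" ]; then
        echo "GeoIP data set not readable: $db" >&2
        missing=1
    fi

    return "$missing"
}
```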
A little background
There are numerous tools and data sets (free, Open Source, commercial, SaaS, etc.) for converting IP addresses into GeoIP data (e.g. data and APIs from MaxMind.com). Continuing recent posts on this topic, I will keep using the tools and data from MaxMind.com.
KML is the XML file format used by Google Maps and Google Earth. If you use Google Maps, you will need both a Google account and a Maps API key; in addition, your ‘.kml’ files/data may need to reside on a public web server. So don’t use Google Maps for ‘private’ data; use Google Earth instead (at least I think that’s private…).
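To make the format concrete, here is a minimal sketch (not the KML step from part two) of a Bash function that wraps points in a KML 2.2 document. The name `points_to_kml` and its tab-separated input (name, latitude, longitude) are assumptions for illustration. One detail worth knowing: KML writes coordinates as longitude,latitude (with optional altitude), the reverse of the order geoiplookup prints them.

```shell
# Hypothetical helper: reads "name<TAB>latitude<TAB>longitude" lines on stdin
# and prints a minimal KML 2.2 document with one Placemark per line.
points_to_kml() {
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' \
        '<kml xmlns="http://www.opengis.net/kml/2.2">' \
        '<Document>'
    awk -F '\t' 'NF >= 3 {
        name = $1
        # Escape XML special characters in the placemark name ("&" first).
        gsub(/&/, "\\&amp;", name)
        gsub(/</, "\\&lt;", name)
        gsub(/>/, "\\&gt;", name)
        # KML coordinate order is longitude,latitude.
        printf "  <Placemark><name>%s</name><Point><coordinates>%s,%s</coordinates></Point></Placemark>\n", name, $3, $2
    }'
    printf '%s\n' '</Document>' '</kml>'
}
```

Usage: `printf '24.94.0.0\t21.313900\t-157.824493\n' | points_to_kml > points.kml`.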
Server Log Data
Naturally every log type will have its own unique format, so we need to extract the useful information based on the log type. In this case we will fetch:
- IP address
- date & time (in this case the date/time of log extraction will be used)
- a category: something simple, or something more specific like resource types (perhaps URLs for web logs, event types for other logs)
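A minimal sketch of such a record, assuming a tab-separated layout of IP, extraction time (UTC, ISO 8601) and category; the function name `tag_records` and the layout are hypothetical. The timestamp is taken once per run, since the extraction time stands in for the log entry's own time.

```shell
# Hypothetical helper: reads one IP per line on stdin and prints
# "IP<TAB>EXTRACTED_AT<TAB>CATEGORY". The timestamp is the extraction time
# (taken once per run, in UTC), not the time recorded in the log entry.
tag_records() {
    local category=${1:-uncategorized}
    local extracted_at
    extracted_at=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
    # Skip blank lines (NF == 0); keep only the first field as the IP.
    awk -v ts="$extracted_at" -v category="$category" \
        'NF { print $1 "\t" ts "\t" category }'
}
```

Usage: pipe a list of unique IPs into `tag_records blocked-tcp`.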
Extract & Generate Tools needed
- a tool for each extraction type (or a module/function for each extraction type called within one tool)
- a tool to generate the KML data points or KML line data or both (or something else)
The three-step process (very simple):
- extract log data,
- determine GeoIP points for each IP, and
- generate desired KML file(s)
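The three steps map naturally onto a shell pipeline. In this sketch (the name `geoip_pipeline`, its arguments and the `ips.txt`/`geo.txt` file names are hypothetical), each step is the name of a function or command that reads stdin and writes stdout, so one tool can host a separate extractor per log type. Intermediate results are kept in a work directory so step three can be re-run without repeating the lookups.

```shell
# Hypothetical driver for the three-step process. Each step is the name of a
# function or command that reads stdin and writes stdout:
#   $1 extract   (log lines -> unique IPs)
#   $2 lookup    (IPs -> GeoIP lines)
#   $3 generate  (GeoIP lines -> KML or other output, written to stdout)
#   $4 workdir   (optional; keeps intermediate files, default ./geoip_work)
geoip_pipeline() {
    local extract=$1 lookup=$2 generate=$3 workdir=${4:-./geoip_work}
    mkdir -p "$workdir" || return 1

    "$extract" > "$workdir/ips.txt" || return 1                      # step 1
    "$lookup" < "$workdir/ips.txt" > "$workdir/geo.txt" || return 1  # step 2
    "$generate" < "$workdir/geo.txt"                                 # step 3
}
```

Usage: `geoip_pipeline my_extract my_lookup my_generate ./work < /var/log/messages > points.kml`, where the three `my_*` functions are your own.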
Ok – looking at some data
Note – for this example a ‘refined’ or ‘fine-grained’ extraction will not be used; we simply build a list of unique IP addresses to work with… Firewall log entries look like this:
Apr 20 23:36:05 Tue Apr 20 23:36:09 2010 FIREWALL System Log: Blocked incoming TCP connection request from 222.208.YYY.ZZZ:12200 to AAA.BBB.CC.DD:8000
Ok, the line above has been edited to somewhat obscure the real IPs… One way to extract this type of information is with tools like ‘grep’ and ‘awk’, e.g.:
grep "Blocked incoming" /var/log/messages | tail | awk '{print $18}' | awk -F ":" '/^[0-9]/ {print $1}' | sort | uniq
The above says, “generate a unique list of IP addresses from which a connection was blocked”:
- look for the pattern “Blocked incoming” in the file /var/log/messages
- keep only the last ten matching lines (‘tail’ shows ten lines by default)
- send them to ‘awk’ and print just the 18th field (the source ‘IP:port’)
- send that to ‘awk’ again, split the line at ‘:’, and print the first part (the IP), keeping only lines that start with a digit
- sort the list and pass it through ‘uniq’ so there are no repeated entries
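The fixed field number (`$18`) depends on the exact log layout; a device that adds or drops a field (a hostname, for example) shifts it. A slightly more defensive variant, sketched here under the assumption that the source address always follows the word ‘from’ as in the firewall entry above, searches for that word instead. `extract_blocked_ips` is a hypothetical name, and unlike the original command it processes every match rather than just the last ten (pipe the log through ‘tail’ first if you want that back).

```shell
# Hypothetical step-1 extractor for firewall entries like the one above.
# Assumes the source address always follows the word "from" as "IP:port".
# Reads the files given as arguments, or stdin if none.
extract_blocked_ips() {
    awk '/Blocked incoming/ {
        for (i = 1; i < NF; i++) {
            if ($i == "from") {
                split($(i + 1), hostport, ":")   # "IP:port" -> hostport[1] = IP
                if (hostport[1] ~ /^[0-9]/) print hostport[1]
                break
            }
        }
    }' "$@" | sort -u
}
```

Usage: `extract_blocked_ips /var/log/messages`.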
Web server logs are simpler: just extract the IP address field from the log files. In the example below the first field happens to be what we need (see the ‘awk’ example with GeoIP data below).
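A sketch of that extraction for Common Log Format files, where the client IP is the first field; `extract_clf_ips` is a hypothetical name. It keeps only dotted-quad IPv4 values and sorts octet by octet. With a plain `sort -n` (in the C locale), 24.171.0.0 is read as the decimal 24.171 and lands before 24.94.0.0.

```shell
# Hypothetical step-1 extractor for Common Log Format lines (client IP in field 1).
# Keeps only dotted-quad IPv4 values, then sorts numerically octet by octet
# and drops duplicates. Reads the files given as arguments, or stdin if none.
extract_clf_ips() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1 }' "$@" |
        sort -t . -k1,1n -k2,2n -k3,3n -k4,4n -u
}
```

Usage: `extract_clf_ips some_web_log_file | xargs -n 1 geoiplookup -f /usr/local/share/GeoIP/GeoLiteCity.dat`.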
==== Below, an excerpt (modified and reduced) from web server logs
24.171.0.0 - - [21/Apr/2010:14:03:42 -0400] "GET /w
24.94.0.0 - - [21/Apr/2010:03:01:58 -0400] "GET /200
35.10.0.0 - - [21/Apr/2010:10:57:58 -0400] "GET /20
65.220.0.0 - - [21/Apr/2010:14:51:44 -0400] "GET /2009
66.163.0.0 - - [21/Apr/2010:10:10:14 -0400] "HEAD /2
66.163.0.0 - - [21/Apr/2010:10:10:15 -0400] "GET /20
75.111.0.0 - - [21/Apr/2010:16:05:19 -0400] "GET /200
79.110.0.0 - - [21/Apr/2010:12:13:47 -0400] "GET /201
80.96.0.0 - - [21/Apr/2010:08:38:35 -0400] "GET /20
83.149.0.0 - - [21/Apr/2010:10:18:39 -0400] "GET /201
87.189.0.0 - - [21/Apr/2010:04:07:18 -0400] "GET /2
87.65.0.0 - - [21/Apr/2010:08:33:29 -0400] "GET /2008
88.81.0.0 - - [21/Apr/2010:08:21:23 -0400] "GET /20
91.100.0.0 - - [21/Apr/2010:11:43:03 -0400] "GET /r
91.100.0.0 - - [21/Apr/2010:11:43:04 -0400] "GET /2
96.18.0.0 - - [21/Apr/2010:02:15:15 -0400] "GET /wta
99.68.0.0 - - [21/Apr/2010:07:56:44 -0400] "GET /20
119.113.0.0 - - [21/Apr/2010:03:23:55 -0400] "GET /
121.126.0.0 - - [21/Apr/2010:01:50:05 -0400] "GET /2
121.243.0.0 - - [21/Apr/2010:05:03:15 -0400] "GET /2
123.11.0.0 - - [21/Apr/2010:11:49:07 -0400] "GET /20
129.198.0.0 - - [21/Apr/2010:10:37:13 -0400] "GET /w
130.113.0.0 - - [21/Apr/2010:03:34:44 -0400] "GET /
173.12.0.0 - - [21/Apr/2010:13:30:07 -0400] "GET /20
174.52.0.0 - - [21/Apr/2010:09:48:58 -0400] "GET /2
200.41.0.0 - - [21/Apr/2010:10:32:37 -0400] "GET /20
212.118.0.0 - - [21/Apr/2010:16:56:35 -0400] "GET /
212.235.0.0 - - [21/Apr/2010:01:33:44 -0400] "GET /20
214.27.0.0 - - [21/Apr/2010:01:53:43 -0400] "GET /200
=== GeoIP data for the IP addresses above (modified IPs…)
awk '{print $1}' some_web_log_file |sort -n | uniq | xargs -n 1 geoiplookup -f /usr/local/share/GeoIP/GeoLiteCity.dat
(The output below was generated to include the IP address...)
24.171.0.0 | US, N/A, N/A, N/A, 38.000000, -97.000000, 0, 0
24.94.0.0 | US, HI, Honolulu, N/A, 21.313900, -157.824493, 744, 808
35.10.0.0 | US, MI, East Lansing, 48824, 42.728298, -84.488297, 551, 517
65.220.0.0 | US, NC, Salisbury, 28147, 35.687199, -80.566902, 517, 704
66.163.0.0 | US, CA, Sunnyvale, 94089, 37.424900, -122.007401, 807, 408
75.111.0.0 | US, NC, Greenville, 27858, 35.523499, -77.300797, 545, 252
79.110.0.0 | PL, 72, Wroclaw, N/A, 51.099998, 17.033300, 0, 0
80.96.0.0 | RO, 34, Suceava, N/A, 47.633301, 26.250000, 0, 0
83.149.0.0 | NL, 07, Amsterdam, N/A, 52.349998, 4.916700, 0, 0
87.189.0.0 | DE, 07, Ratingen, N/A, 51.299999, 6.850000, 0, 0
87.65.0.0 | BE, 11, Brussels, N/A, 50.833302, 4.333300, 0, 0
88.81.0.0 | UA, 13, Kiev, N/A, 50.433300, 30.516701, 0, 0
91.100.0.0 | DK, 06, Copenhagen, N/A, 55.666698, 12.583300, 0, 0
96.18.0.0 | US, OR, Ontario, 97914, 44.086102, -117.018799, 757, 541
99.68.0.0 | US, TX, Mckinney, 75070, 33.176899, -96.698502, 623, 972
119.113.0.0 | CN, 19, Dalian, N/A, 38.912201, 121.602203, 0, 0
121.126.0.0 | KR, 11, Seoul, N/A, 37.566399, 126.999702, 0, 0
121.243.0.0 | IN, 02, Hyderabad, N/A, 17.375299, 78.474403, 0, 0
123.11.0.0 | CN, 09, Zhengzhou, N/A, 34.683601, 113.532501, 0, 0
129.198.0.0 | US, CA, Edwards, N/A, 34.961601, -117.875900, 803, 661
130.113.0.0 | CA, ON, Hamilton, l8s4m1, 43.250000, -79.833298, 0, 0
173.12.0.0 | US, OR, Beaverton, N/A, 45.480400, -122.835602, 820, 503
174.52.0.0 | US, UT, Roy, 84067, 41.178699, -112.049301, 770, 801
200.41.0.0 | AR, 07, Buenos Aires, N/A, -34.587502, -58.672501, 0, 0
212.118.0.0 | SA, 10, Riyadh, N/A, 24.640800, 46.772800, 0, 0
212.235.0.0 | IL, 04, Qiryat Atta, N/A, 32.800598, 35.106400, 0, 0
214.27.0.0 | US, AE, Apo, N/A, 0.000000, 0.000000, 0, 323
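The output above includes the IP address, which geoiplookup alone does not print. One way to produce it, sketched here with hypothetical function names and an assumed `GEOIP_DB` variable, is to loop over the IPs and strip the edition label that, as I understand it, the legacy geoiplookup prints before the data (e.g. ‘GeoIP City Edition, Rev 1: ’). A second function turns each line into tab-separated IP, latitude, longitude, country and city, assuming the legacy City field order: country code, region, city, postal code, latitude, longitude, metro code, area code. Entries at 0,0 (like the last one above) are dropped since they carry no usable location.

```shell
# Hypothetical step-2 helper: reads one IP per line and prints
# "IP | <geoiplookup data>", similar to the output shown above.
# GEOIP_DB is an assumed variable; the default is the path used in the post.
geoip_with_ip() {
    local db=${GEOIP_DB:-/usr/local/share/GeoIP/GeoLiteCity.dat}
    local ip data
    while read -r ip; do
        [ -n "$ip" ] || continue
        # Drop the leading edition label (everything up to the first ": ").
        data=$(geoiplookup -f "$db" "$ip" </dev/null | head -n 1 | sed 's/^[^:]*: //')
        printf '%s | %s\n' "$ip" "$data"
    done
}

# Hypothetical parser: turns "IP | CC, region, city, postal, lat, lon, metro, area"
# into "IP<TAB>lat<TAB>lon<TAB>CC<TAB>city", skipping entries without a location.
parse_geoip_lines() {
    awk -F '|' 'NF >= 2 {
        ip = $1
        gsub(/[ \t]/, "", ip)
        n = split($2, f, ",")
        if (n < 8) next                         # e.g. "IP Address not found"
        for (i = 1; i <= n; i++) gsub(/^ +| +$/, "", f[i])
        lat = f[n - 3]; lon = f[n - 2]          # counted from the end of the record
        if (lat + 0 == 0 && lon + 0 == 0) next  # 0,0 carries no usable location
        print ip "\t" lat "\t" lon "\t" f[1] "\t" f[3]
    }'
}
```

Usage: `extract_ips_somehow | geoip_with_ip | parse_geoip_lines`, where the first command is whichever step-1 extraction fits your log type.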
=== ** NOTE – don’t use any of the GeoIP data in this post for any form of verification/validation! Your results will vary from these examples.
Part two of this post will include the KML portion of this process.