Getting web visit data for analytics with JavaScript, Python and GeoIP

data-processing geoip javascript python web-analytics

Requirements

% apt-get install -y geoip-database
% pip install python-dateutil pygeoip

Browser data

We are going to read the navigator data to obtain the current page path, the language, the browser, the operating system, the screen resolution, the pixel density (the device pixel ratio) and a boolean that tells whether the user is on a touch device. This data is POSTed to the server to be analyzed and stored (in this example, at the path '/analytics').
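Once the payload reaches the server, screen_resolution is a single "WIDTHxHEIGHT" string, and screen_dpi is window.devicePixelRatio. That value is a ratio of physical to CSS pixels, not a true DPI, and it can be fractional. Below is a minimal sketch of splitting these fields for grouping in reports. It assumes you want integer width and height columns. The function parse_screen and the screen_width/screen_height keys are hypothetical names, not part of the original code:

```python
def parse_screen(data):
    """Return a copy of the payload with the screen fields normalized.

    Hypothetical helper: adds 'screen_width' and 'screen_height' only when
    'screen_resolution' looks like 'WIDTHxHEIGHT'.
    """
    result = dict(data)
    width, sep, height = str(data.get('screen_resolution', '')).partition('x')
    if sep and width.isdigit() and height.isdigit():
        result['screen_width'] = int(width)
        result['screen_height'] = int(height)
    # devicePixelRatio may be fractional (e.g. 1.5), so store it as a float;
    # fall back to 1.0 when it is missing or not a number
    try:
        result['screen_dpi'] = float(data.get('screen_dpi', 1))
    except (TypeError, ValueError):
        result['screen_dpi'] = 1.0
    return result


print(parse_screen({'screen_resolution': '1920x1080', 'screen_dpi': 2}))
```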

The JavaScript code (it uses jQuery's $.ajax to send the request):

var data, ua, platform, browser, os;

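data = {
    page: window.location.pathname.replace(/\/$/, ''),
    // Older versions of IE expose the language as navigator.userLanguage
    language: (navigator.language || navigator.userLanguage || '').slice(0, 2).toUpperCase(),
    screen_resolution: window.screen.width + 'x' + window.screen.height,
    screen_dpi: window.devicePixelRatio || 1,
    is_touch_device: 'ontouchstart' in window || 'onmsgesturechange' in window,
    // The server parses data['date'], so the timestamp must be sent as well
    date: new Date().toISOString()
};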
ua = navigator.userAgent.toLowerCase();
platform = navigator.platform.toLowerCase();

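// Use the first known browser token in the user agent. Chrome's user agent
// also contains "Safari", but its "Chrome/" token appears first, so it wins.
browser = ua.match(/(opera|chrome|safari|firefox|msie|trident)\/?\s*([\d\.]+)/) || [];
if (! browser.length) browser = 'other';
// IE 10 and older report "MSIE"; IE 11 only reports "Trident"
else if (browser[1] == 'msie' || browser[1] == 'trident') browser = 'ie';
else browser = browser[1];
data.browser = browser;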

// Mobile systems are recognized from the user agent, desktop ones from navigator.platform
if (ua.match(/android/)) os = 'android';
else if (ua.match(/(iphone|ipod|ipad)/)) os = 'ios';
else if (ua.match(/windows phone/)) os = 'windows-phone';
else if (ua.match(/blackberry/)) os = 'blackberry';
else if (platform.match(/mac/)) os = 'mac';
else if (platform.match(/win/)) os = 'windows';
else if (platform.match(/linux/)) os = 'linux';
else os = 'other';
data.os = os;

// Send the visit as a JSON request body
$.ajax({
    url: '/analytics',
    type: 'POST',
    contentType: 'application/json',
    data: JSON.stringify(data)
});
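The detection rules above are easier to check against sample user-agent strings outside the browser. Below is a sketch that ports the same regular expression and the same ordered OS rules to Python, for example for unit tests. The function names are hypothetical. HTTP headers do not carry navigator.platform, so the platform value has to be passed in explicitly:

```python
import re

# Same pattern as the JavaScript version. re.search, like String.match
# without the g flag, returns the leftmost match.
BROWSER_RE = re.compile(r'(opera|chrome|safari|firefox|msie|trident)/?\s*([\d.]+)')

# Mobile systems are checked against the user agent, in this order
UA_OS_RULES = [
    (re.compile(r'android'), 'android'),
    (re.compile(r'iphone|ipod|ipad'), 'ios'),
    (re.compile(r'windows phone'), 'windows-phone'),
    (re.compile(r'blackberry'), 'blackberry'),
]
# Desktop systems are checked against navigator.platform
PLATFORM_OS_RULES = [
    (re.compile(r'mac'), 'mac'),
    (re.compile(r'win'), 'windows'),
    (re.compile(r'linux'), 'linux'),
]


def classify_browser(user_agent):
    """Map a user-agent string to the browser names used in the post."""
    match = BROWSER_RE.search(user_agent.lower())
    if not match:
        return 'other'
    name = match.group(1)
    return 'ie' if name in ('msie', 'trident') else name


def classify_os(user_agent, platform=''):
    """First matching rule wins, mirroring the if/else chain in JavaScript."""
    ua = user_agent.lower()
    for pattern, name in UA_OS_RULES:
        if pattern.search(ua):
            return name
    for pattern, name in PLATFORM_OS_RULES:
        if pattern.search(platform.lower()):
            return name
    return 'other'
```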

Server-side geo data

I'm going to use Tornado and Cassandra. The code is straightforward, so you can easily adapt it to your own stack (Django and PostgreSQL, for example).
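The code is easiest to port if the enrichment step does not depend on the web framework or the database. Below is a minimal sketch of that core as a plain function. It assumes the geo lookup is passed in as a callable that returns a dict, or None for unknown addresses. The name enrich_visit is hypothetical, and the lookup table in the usage lines is a stand-in for the GeoIP database:

```python
def enrich_visit(data, ip, lookup):
    """Return a copy of `data` with IP and geo fields, or None to skip the visit.

    `lookup` is any callable mapping an IP string to a dict (or None), such as
    a bound pygeoip GeoIP.record_by_addr. Passing it in keeps this function
    free of Tornado, Django or database imports.
    """
    if not ip:
        return None
    try:
        record = lookup(ip)
    except Exception:
        return None
    if not record:  # e.g. private or loopback addresses have no record
        return None
    visit = dict(data, ip=ip)
    for key in ('city', 'country_code', 'country_name', 'continent'):
        visit[key] = record.get(key)
    return visit


# Hypothetical lookup table standing in for the GeoIP database
fake_db = {'203.0.113.7': {'city': 'Example City', 'country_code': 'XX',
                            'country_name': 'Example', 'continent': 'EU'}}
print(enrich_visit({'page': '/blog'}, '203.0.113.7', fake_db.get))
```

In Tornado, ip would be self.request.remote_ip. Behind a reverse proxy, that holds the visitor's address only if the server is started with xheaders=True. In Django, it would be request.META['REMOTE_ADDR']. Only the final call that stores the result (Visit.create or a Django model's save) changes with the framework and database.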

You will need to download the GeoLite City database (GeoLiteCity.dat) into your application's data root (defined as settings.DATA_ROOT).

The Python code:

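import os
import json
import logging
from datetime import datetime, timezone

from tornado.web import RequestHandler

from pygeoip import GeoIP
import dateutil.parser

from app.models import Visit
from app import settings

# Open the GeoLite City database once at import time, not on every request
geo_db = GeoIP(os.path.join(settings.DATA_ROOT, 'GeoLiteCity.dat'))


class VisitController(RequestHandler):

    def post(self):
        # Always answer 204 No Content: the client does not need a body
        self.set_status(204)
        try:
            data = json.loads(self.request.body)
        except ValueError:
            return self.finish()
        if not isinstance(data, dict):
            return self.finish()

        ip = self.request.remote_ip
        if not ip:
            return self.finish()
        data['ip'] = ip

        # record_by_addr() returns None for addresses it cannot locate
        # (e.g. private or loopback IPs), so check the result too
        try:
            city = geo_db.record_by_addr(ip)
        except Exception:
            return self.finish()
        if not city:
            return self.finish()
        data.update({'city': city.get('city'),
                     'country_code': city.get('country_code'),
                     'country_name': city.get('country_name'),
                     'continent': city.get('continent')})

        # Prefer the client-supplied date; fall back to the server's clock
        try:
            data['date'] = dateutil.parser.parse(data['date'])
        except (KeyError, TypeError, ValueError, OverflowError):
            data['date'] = datetime.now(timezone.utc)

        try:
            Visit.create(**data)
        except Exception:
            # Don't fail the request, but leave a trace of the failed insert
            logging.exception('Could not store visit')

        return self.finish()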
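Visit.create(**data) receives whatever keys the client sends. An unexpected field will either make the insert fail or write a column you did not intend. Below is a minimal sketch of a whitelist applied before the insert. It assumes the Visit model's columns are exactly the fields used in this post. VISIT_FIELDS and clean_visit are hypothetical names:

```python
# Hypothetical: should mirror the columns defined on the Visit model
VISIT_FIELDS = frozenset({
    'page', 'language', 'screen_resolution', 'screen_dpi', 'is_touch_device',
    'browser', 'os', 'date', 'ip', 'city', 'country_code', 'country_name',
    'continent',
})


def clean_visit(data):
    """Drop every key that is not a known Visit column."""
    return {key: value for key, value in data.items() if key in VISIT_FIELDS}


print(clean_visit({'page': '/blog', 'os': 'linux', 'is_admin': True}))
```

In the handler, the insert would then read Visit.create(**clean_visit(data)).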