Simple web analytics with Python and Pandas

data-analysis pandas python web-analytics

We are going to do some analytics with our web visits data. As a simple report we are going to obtain the unique and total visits respect the date and many other paramenters like browser, page wisited, language, operative system...

Requirements

% pip install python-dateutil pandas

Getting and filtering the data

Let's assume the structure of our data like:

uuid ip city country_code country_name language browser os page date ...
ea2d3169-2b71-4beb-9665-108d302c3a67 78.146.232.107 London UK United Kingdom EN Firefox Linux /foo 2015-02-12 09:25:17.770175 ...
bdb18e99-fc80-4d4b-b4a1-286e67ba374f 95.142.167.120 Paris FR France FR Safari Mac /bar 2015-02-09 21:11:02.134322 ...
... ... ... ... ... ... ... ... ... ... ...

The python code:

from datetime import datetime, timedelta
import pandas as pd


source = pd.read_csv('data.csv', index_col='uuid', parse_dates=['date'])

date_1 = datetime.utcnow()
date_0 = date_1 - timedelta(days=30)

data = source[(source['date'] > date_0) & (source['date'] < date_1)]

Aggregating the data

We'll obtain data structured as:

unique_visits total_visits
2015-01-13 90 140
2015-01-14 104 170
2015-01-15 80 193
... ... ...
unique_visits total_visits
Linux 76 111
Mac 101 180
Windows 40 73

The Python code:

def get_visits(groupby):
    ip_visits = data.groupby(groupby)['ip']
    return pd.DataFrame(
        {'unique_visits': ip_visits.apply(lambda x: len(set(x))),
         'total_visits': ip_visits.apply(len)})

data['date_day'] = data['date'].apply(datetime.date)
visits_by_date = get_visits('date_day')
# Redefine with date index to avoid lack of dates.
visits_by_date = pd.DataFrame(visits_by_date,
                              index=pd.date_range(date_0, date_1)).fillna(0)
# Rest of filtered visits.
visits_by_os = get_visits('os')
visits_by_city = get_visits('city')