Collect pageviews with Flask and Cassandra
Here is a simple example of collecting pageviews using Flask and Cassandra. The proper way for a client to send a cross-site request that records a pageview is CORS, but since old browsers don't support it, each page will instead request a tiny image and pass the browser details as query arguments.
Python requirements:
Flask==0.11
cassandra-driver==3.5.0
pytz==2016.4
Create a Cassandra keyspace:
CREATE KEYSPACE wa
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE wa;
To support multiple applications, create the table apps:
CREATE TABLE apps (
id uuid PRIMARY KEY,
name text,
url text
);
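Each site needs a row in apps before its pageviews are accepted. A minimal sketch of registering one, assuming the same `%s` placeholder style that the Flask code in this post passes to the driver; the helper name new_app_statement and the example values are hypothetical:

```python
from uuid import uuid4


def new_app_statement(name, url):
    """Build the CQL statement and parameters that register a new app."""
    app_id = uuid4()  # random id; the tracking script sends it back as `app`
    query = 'INSERT INTO apps (id, name, url) VALUES (%s, %s, %s)'
    return app_id, query, (app_id, name, url)


app_id, query, params = new_app_statement(
    'Example blog', 'https://blog.example.com')
print(app_id)  # the value to paste into the tracking script
# With a connected session this would be run as: session.execute(query, params)
```

The statement is only built here, not executed, so the snippet runs without a Cassandra node.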
Pageviews are saved in a composite-keyed table with app as the partition key, so pageviews for the same app are stored physically together. Create the composite-keyed table pageviews:
CREATE TABLE pageviews (
app uuid,
date timestamp,
utma uuid,
utmb uuid,
path text,
title text,
ip text,
referrer text,
useragent text,
platform text,
language text,
screensize text,
pixelratio float,
PRIMARY KEY (app, date)
);
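To make the key layout concrete, here is a toy in-memory model of the table (plain Python, not the Cassandra driver): rows are grouped by the partition key app and kept unique by the clustering column date. The class and method names are hypothetical. One consequence it illustrates is that two pageviews of the same app with an identical timestamp share a primary key, so the second write replaces the first.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from uuid import uuid4


class ToyPageviews:
    """In-memory stand-in for a table with PRIMARY KEY (app, date)."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def insert(self, app, date, **columns):
        # Same (app, date) means same row: the write behaves as an upsert.
        self.partitions[app][date] = dict(columns, app=app, date=date)

    def select(self, app):
        # All rows of one app live in one partition, ordered by date.
        rows = self.partitions.get(app, {})
        return [rows[d] for d in sorted(rows)]


table = ToyPageviews()
app_id = uuid4()
t0 = datetime(2016, 6, 1, 12, 0, 0)
table.insert(app_id, t0 + timedelta(seconds=5), path='/about')
table.insert(app_id, t0, path='/')
table.insert(app_id, t0, path='/contact')  # same key as the previous row

paths = [row['path'] for row in table.select(app_id)]
print(paths)  # ['/contact', '/about']
```

If losing such colliding pageviews matters, one option is to add another column (for example utmb) to the clustering key.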
To collect pageviews, add the following script inside the <head> of each page, replacing {{APP_UUID}} with the app's id from the apps table and {{ANALYTICS_URL}} with the URL where the Flask app is running (it is also a good idea to minify the script):
<script type="text/javascript">
var d,i,q;
d = {
app: '{{APP_UUID}}',
path: location.pathname,
title: document.title,
platform: navigator.platform,
language: navigator.language,
screensize: screen.width+'x'+screen.height,
pixelratio: devicePixelRatio,
referrer: document.referrer
};
q = [];
for (i in d) q.push([i,encodeURIComponent(d[i])].join('='));
new Image().src = '{{ANALYTICS_URL}}?'+q.join('&');
</script>
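For testing the endpoint without a browser, the URL that this script requests can be reproduced in Python. This is a sketch using only the standard library; the host analytics.example.com and the sample values are made up, and quote with safe='' is only a close approximation of JavaScript's encodeURIComponent (which leaves a few more characters, such as ! and ', unescaped).

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit

ANALYTICS_URL = 'https://analytics.example.com/'  # hypothetical endpoint

pageview = {
    'app': '123e4567-e89b-12d3-a456-426614174000',  # sample UUID
    'path': '/blog/hello world',
    'title': 'Hello & welcome',
    'platform': 'Linux x86_64',
    'language': 'en-GB',
    'screensize': '1920x1080',
    'pixelratio': '2',
    'referrer': 'https://example.org/?q=flask',
}

# quote_via=quote encodes spaces as %20, as the browser does, instead of '+'.
url = ANALYTICS_URL + '?' + urlencode(pageview, quote_via=quote, safe='')
print(url)

# The server sees the decoded values again (request.args in Flask).
received = dict(parse_qsl(urlsplit(url).query))
```

Requesting the printed URL with any HTTP client should then behave like a pageview from a browser without cookies.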
Now, in Flask, save the pageviews. The cookies _utma and _utmb are used in the same way Google Analytics uses its cookies: _utma "remembers" a user (expires in two years) and _utmb identifies a visit, which makes it possible to record the visit duration (expires in 30 minutes):
from datetime import datetime, timedelta
from uuid import UUID, uuid4

from flask import Flask, request, send_file
from cassandra.cluster import Cluster
import pytz

app = Flask(__name__)
app.config.from_pyfile('config.py')

# Query arguments that may be stored; each one is a column of pageviews.
# Anything else is dropped so it can never reach the CQL statement.
ALLOWED_ARGS = ('app', 'path', 'title', 'platform', 'language',
                'screensize', 'pixelratio', 'referrer')


@app.before_request
def before_request():
    # Simple but slow: a Cluster is meant to be long-lived, so a busy
    # site should create it once per process instead of per request.
    app.cluster = Cluster()
    app.db = app.cluster.connect('wa')


@app.teardown_request
def teardown_request(exception):
    app.cluster.shutdown()


def cookie_uuid(name):
    """Return the UUID stored in a cookie, or None if absent or malformed."""
    try:
        return UUID(request.cookies[name])
    except (KeyError, ValueError):
        return None


@app.route('/')
def pageview():
    data = {k: v for k, v in request.args.items() if k in ALLOWED_ARGS}
    response = send_file('img.gif', mimetype='image/gif')

    # Verify app (a missing or malformed id just returns the image).
    try:
        data['app'] = UUID(data['app'])
    except (KeyError, ValueError):
        return response
    query = 'SELECT id FROM apps WHERE id=%s'
    if not list(app.db.execute(query, [data['app']])):
        return response

    # Tracking cookies.
    now = datetime.now(pytz.timezone('Europe/London'))
    utma = cookie_uuid('_utma')
    if utma is None:
        utma = uuid4()
        response.set_cookie('_utma', str(utma), expires=now+timedelta(days=730))
    utmb = cookie_uuid('_utmb')
    if utmb is None:
        utmb = uuid4()
        response.set_cookie(
            '_utmb', str(utmb), expires=now+timedelta(seconds=1800))

    # Save pageview.
    try:
        pixelratio = float(data.get('pixelratio') or 1)
    except ValueError:
        pixelratio = 1.0
    data.update(utma=utma,
                utmb=utmb,
                date=now,
                ip=request.remote_addr,
                useragent=request.headers.get('User-Agent', ''),
                pixelratio=pixelratio)
    columns = list(data)
    query = 'INSERT INTO pageviews (%s) VALUES (%s)' % (
        ','.join(columns), ','.join(['%s']*len(columns)))
    app.db.execute(query, [data[c] for c in columns])

    # Prevent HTTP caching.
    response.last_modified = now
    response.headers['Cache-Control'] = 'no-cache, no-store, must-revalidate'
    response.headers['Pragma'] = 'no-cache'
    response.headers['Expires'] = '0'
    return response


if __name__ == '__main__':
    app.run()
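Since every saved row carries the _utmb value of its visit, visit duration can be derived afterwards as the time between the first and last pageview sharing that value. A minimal sketch over rows already fetched as dictionaries; the function name and the sample data are hypothetical:

```python
from datetime import datetime, timedelta
from uuid import uuid4


def visit_durations(rows):
    """Map each visit id (utmb) to the time between its first and last pageview."""
    first, last = {}, {}
    for row in rows:
        visit, when = row['utmb'], row['date']
        first[visit] = min(first.get(visit, when), when)
        last[visit] = max(last.get(visit, when), when)
    return {visit: last[visit] - first[visit] for visit in first}


start = datetime(2016, 6, 1, 12, 0)
visit_a, visit_b = uuid4(), uuid4()
rows = [
    {'utmb': visit_a, 'date': start},
    {'utmb': visit_a, 'date': start + timedelta(minutes=4)},
    {'utmb': visit_b, 'date': start + timedelta(minutes=1)},  # single-page visit
]
durations = visit_durations(rows)
print(durations[visit_a])  # 0:04:00
print(durations[visit_b])  # 0:00:00
```

Time spent on the last page of a visit is not captured this way, so a single-page visit always counts as zero.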
Remember to add the img.gif and config.py files, the latter with any extra settings (like SERVER_NAME).
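If no pixel image is at hand, img.gif can be generated. The sketch below decodes a widely circulated 1x1 transparent GIF from its base64 form; the constant and function names are hypothetical, and the file is assumed to belong next to the Flask script, since send_file is given a relative path.

```python
import base64
import struct

# Widely circulated 1x1 transparent GIF, base64-encoded.
PIXEL_B64 = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'


def pixel_bytes():
    """Return the raw bytes of the tracking pixel."""
    return base64.b64decode(PIXEL_B64)


def write_pixel(path='img.gif'):
    """Write the pixel to `path` and return the number of bytes written."""
    data = pixel_bytes()
    with open(path, 'wb') as f:
        f.write(data)
    return len(data)


data = pixel_bytes()
# Bytes 6-9 of a GIF hold the logical screen width and height (little-endian).
width, height = struct.unpack('<HH', data[6:10])
print(data[:6], width, height)
```

Calling write_pixel() once from the project directory creates the file.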