Sunday, October 27, 2013

I think 12 Years A Slave isn't about slavery

It's about unjust socioeconomic systems and our multifaceted complicity in them. The vehicle is one man's journey through the hell of chattel slavery, but both the man and the institution are just vehicles. The stars are all the different ways that unjust institutions maintain the power we give them. The film does both the journey and the theme justice, with a minimum of false redemption.

There are many powerful images and actions that are specific to slavery, but the characters' responses to those events could be lifted right out of the film and be at home in any American period, including our own. I hope that people see that and keep it in mind while watching.

Also, it's beautifully shot, scored and directed. A technical masterpiece at the least.

Wednesday, March 6, 2013

Testing Django against Multiple Python Versions with Travis

Django 1.5 has experimental support for Python 3. This is great news, and means that we should all be upgrading our Django apps to be able to support both Python 2 and 3. Django has some documentation on how to go about this process. I recently went through this for one of my apps and, while the changes to the app itself were easy enough, I had to figure some things out myself for my testing setup.

I like to use Travis for continuous integration:

  • It's easy to set up (I usually just copy a .travis.yml file from another project and tweak a couple of settings)
  • It integrates really well with GitHub, even automatically running tests against people's pull requests

The .travis.yml for my projects usually looks something like this:

language: python
python:
  - "2.6"
  - "2.7"

env:
  - DJANGO="Django==1.3"
  - DJANGO="Django==1.4"
  - DJANGO="git+git://github.com/django/django.git@master#egg=django"

# command to install dependencies ($DJANGO comes from the env matrix above)
install: pip install $DJANGO -r requirements.txt

# command to run tests
script: python setup.py test

Then I'll list my requirements in requirements.txt and put a test command in setup.py (a sketch of that hook appears below). Given this configuration, Travis will run tests in six different environments:

  • Python 2.6 with Django 1.3
  • Python 2.6 with Django 1.4
  • Python 2.6 with Django's trunk
  • Python 2.7 with Django 1.3
  • Python 2.7 with Django 1.4
  • Python 2.7 with Django's trunk

Up until now, this has served me well.
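For reference, the test command in setup.py can be a small custom command that delegates to a standalone test script (I use a runtests.py, shown later in this post). This is only a sketch; the package name is illustrative:

setup.py:

from setuptools import setup
from setuptools.command.test import test as TestCommand
import subprocess
import sys

class RunTests(TestCommand):
    def finalize_options(self):
        TestCommand.finalize_options(self)
        self.test_args = []
        self.test_suite = True  # ensure run() actually invokes run_tests()

    def run_tests(self):
        # Delegate to the standalone runtests.py script and propagate
        # its exit code back to `python setup.py test`.
        raise SystemExit(subprocess.call([sys.executable, 'runtests.py']))

setup(
    name='myapp',
    version='0.1.0',
    packages=['myapp'],
    cmdclass={'test': RunTests},
)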

With the introduction of Python 3 support, I add an additional Python environment for my tests:

python:
  - "2.6"
  - "2.7"
  - "3.2"

Now Travis will run tests in nine environments. However, in some of those environments the tests will fail, even after I've done the work to make my app Python 3 compatible. So what happened?

The environments that Travis sets up are:

  • Python 2.6 with Django 1.3
  • Python 2.6 with Django 1.4
  • Python 2.6 with Django's trunk
  • Python 2.7 with Django 1.3
  • Python 2.7 with Django 1.4
  • Python 2.7 with Django's trunk
  • Python 3.2 with Django 1.3
  • Python 3.2 with Django 1.4
  • Python 3.2 with Django's trunk

But notice that, even though my app may be compatible with Python 3, Django 1.3 and 1.4 are not. Those environments will fail while merely trying to install Django, before any tests run.

My solution is to introduce a couple of checks in my install and test-running scripts. First, I make my Travis configuration a little smarter: I use a separate install script that checks whether the Django version is compatible with the Python version, and exits immediately if not:

.travis.yml:

language: python
python:
  - "2.6"
  - "2.7"
  - "3.2"

env:
  - DJANGO="Django==1.3"
  - DJANGO="Django==1.4"
  - DJANGO="Django==1.5"
  - DJANGO="git+git://github.com/django/django.git@master#egg=django"

# command to install dependencies
install: ci/install.sh

# command to run tests
script: python setup.py test

ci/install.sh:

#!/bin/bash

# Python 2 prints its version string to stderr, hence the 2>&1.
PYTHON_VERSION=$(python --version 2>&1)

# Bail out early on combinations that cannot work: Django releases
# before 1.5 do not support Python 3. Both tests are lexicographic
# string comparisons.
if [[ "$PYTHON_VERSION" > "Python 3" ]]
then
  if [[ "$DJANGO" < "Django==1.5" ]]
  then
    echo "Cannot install $DJANGO on $PYTHON_VERSION"
    exit
  fi
fi

pip install six==1.2.0 mock==0.7.2 $DJANGO --use-mirrors

That line if [[ "$DJANGO" < "Django==1.5" ]] in ci/install.sh does a lexicographic comparison against the $DJANGO environment variable. Even when I'm using the Django trunk from git, it correctly evaluates as greater than "Django==1.5", because the upper-case "D" sorts before the lower-case "g". With this check, Travis exits the install step early (with an implicit exit code of 0, indicating everything is all right) before six and Django are installed, which is important for the next step.
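You can check the ordering in a Python shell (bash's [[ < ]] sorts byte-by-byte the same way under the C locale):

>>> "Django==1.4" < "Django==1.5"
True
>>> "git+git://github.com/django/django.git@master#egg=django" < "Django==1.5"
False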

Next, in the script that runs my tests, I want to check whether Django has been installed:

runtests.py:

import sys

try:
    import six
    from django.conf import settings
except ImportError:
    print("Django has not been installed.")
    sys.exit(0)

Here, the script tries to import the packages that the install script skipped above. If it can't import them, it exits immediately. The exit code 0 tells Travis that all the tests for this environment have passed, which, since there are no tests to run in this [invalid] environment, is vacuously true.
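For completeness, here's a minimal sketch of how the rest of runtests.py can configure Django and run the suite. The app name and settings are illustrative, not from my actual project:

import sys

try:
    import six
    from django.conf import settings
except ImportError:
    print("Django has not been installed.")
    sys.exit(0)

# Throwaway settings for the test run; an in-memory SQLite database
# keeps things fast.
settings.configure(
    DATABASES={'default': {'ENGINE': 'django.db.backends.sqlite3'}},
    INSTALLED_APPS=['myapp'],
)

def main():
    from django.test.utils import get_runner
    TestRunner = get_runner(settings)
    failures = TestRunner().run_tests(['myapp'])
    sys.exit(1 if failures else 0)

if __name__ == '__main__':
    main()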

Now my project is ready for testing with Travis under Python 2 and 3!

Update (11 March 2013):

I just learned about the matrix section of a Travis config file! See here for an example.
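With it, the broken combinations can be excluded declaratively instead of short-circuited in an install script. A sketch of what that could look like (not the linked example itself):

matrix:
  exclude:
    - python: "3.2"
      env: DJANGO="Django==1.3"
    - python: "3.2"
      env: DJANGO="Django==1.4"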

Tuesday, January 29, 2013

Decoding JSON from a relational DB in Python

Lately I've been creating a fair number of APIs that store some JSON in a field in a database. Often that's a good reason to use a document database like Mongo or Couch, but sometimes a relational DB will serve you better (e.g., when each row carries only a small amount of free-form data, but the relational aspects are many and important). If you are using a relational DB with a JSON field, you will at some point have to deserialize that JSON for use as data.
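To make that concrete, here's a hypothetical sqlite3 setup with a TEXT column named blob, matching the record['blob'] access in the snippets below:

    import json
    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.row_factory = sqlite3.Row  # rows can then be turned into dicts
    conn.execute('CREATE TABLE records (id INTEGER PRIMARY KEY, blob TEXT)')
    conn.execute('INSERT INTO records (blob) VALUES (?)',
                 (json.dumps({'a': 1}),))

    record = dict(conn.execute('SELECT * FROM records').fetchone())
    record['blob'] = json.loads(record['blob'])  # the deserialization step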

I'm concerned with the speed of my APIs, so I began wondering what the best way to do this was. When returning a single record, there aren't many choices. I load the record, pull out the JSON field value as a string, deserialize it, and put it back on as a dictionary. However, when returning a potentially large set of records, my first inclination was to process each record individually, and compose them all into a list. Something like the following:

    import json

    # execute_query stands in for whatever runs the actual DB query.
    def get_record(...):
        record = execute_query(...)
        data = process_record(record)
        return data

    def get_many_records(...):
        records = execute_query(...)
        data = [process_record(record) for record in records]
        return data

    def process_record(record):
        json_str = record['blob']
        record['blob'] = json.loads(json_str)
        return record


The issue here is that, if I have a number of records, json.loads will get called once for each one. The other option that I had was to compose all of the JSON data into a single list, deserialize it all at once, and then partition it back out to its original objects -- something like:

    def get_many_records(...):
        records = execute_query(...)
        json_str = '[%s]' % ', '.join(record['blob'] for record in records)
        blobs = json.loads(json_str)
        for record, blob in zip(records, blobs):  # izip for Python 2.x
            record['blob'] = blob
        return records


My first thoughts: I prefer the first code block, because it allows me to share code between my single- and multi-record getters, and it seems clearer. Also, I don't really know yet whether the latter would gain me anything significant. To test, I wrote up the following:

    from timeit import Timer

    list_str_code = """
        import json
        list_str = '[' + ', '.join(['{"a": 1}']*1000) + ']'
        data = json.loads(list_str)
    """

    str_list_code = """
        import json
        str_list = ['{"a": 1}']*1000
        data = [json.loads(dict_str) for dict_str in str_list]
    """

    list_str_timer = Timer(list_str_code)
    str_list_timer = Timer(str_list_code)

    print(list_str_timer.repeat(number=1000))
    print(str_list_timer.repeat(number=1000))


On my machine with Python 2.7, the list_str_code ran consistently more than 3 times faster than str_list_code (1.2 vs 3.9 seconds). With Python 3.2 it was nearly 5 times faster (0.7 vs 3.3 seconds). That's pretty significant.

It is worth noting that I tried this with lists of different sizes as well. Even if I construct list_str and str_list each with only 10 elements and run the code 100,000 times, the list_str_code is still several times faster.

Update:

At the suggestion of @mricordeau, I tried rerunning the timed code using Ultra JSON (ujson) instead of the core json module. I did this by just installing (pip install ujson) and replacing the lines that say import json with import ujson as json. It was, indeed, blazing fast. Notably, it brought the execution times much closer together (for the run above where I got results of 1.2 and 3.9 seconds, ujson gave me times of 0.21 and 0.27 seconds respectively)! This implies to me that much of the time in the core json module is spent in start-up (or tear-down) code each time you call loads.
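If you want that speedup without making ujson a hard dependency, a guarded import works (a sketch; the timing runs above just swapped the import directly):

    # Prefer ujson when available; otherwise fall back to the stdlib.
    try:
        import ujson as json
    except ImportError:
        import json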