Sunday, October 27, 2013

I think 12 Years A Slave isn't about slavery

It's about unjust socioeconomic systems and our multifaceted complicity in them. The vehicle is a man's journey through the hell of chattel slavery, but the man, like the institution, is just a vehicle. The stars are all the different ways that unjust institutions maintain the power we give them. The film does both the journey and the theme justice with a minimum of false redemption.

There are many powerful images and actions that are specific to slavery, but the characters' responses to those events could be taken right out of the film and be at home in any American period, including our own. I hope that people see that and keep it in mind while watching.

Also, it's beautifully shot, scored and directed. A technical masterpiece at the least.

Wednesday, March 6, 2013

Testing Django against Multiple Python Versions with Travis

Django 1.5 has experimental support for Python 3. This is great news, and means that we should all be upgrading our Django apps to be able to support both Python 2 and 3. Django has some documentation on how to go about this process. I recently went through this for one of my apps and, while the changes to the app itself were easy enough, I had to figure some things out myself for my testing setup.

I like to use Travis for continuous integration.

  • It's easy to set up (I usually just copy a .travis.yml file from another project and tweak a couple settings)
  • It integrates really well with GitHub, even automatically running tests against people's pull requests

The .travis.yml for my projects usually looks something like this:

language: python
python:
  - "2.6"
  - "2.7"

env:
  - DJANGO="Django==1.3"
  - DJANGO="Django==1.4"
  - DJANGO="git+git://github.com/django/django.git@master#egg=django"

# command to install dependencies
install: pip install -r requirements.txt

# command to run tests
script: python setup.py test

Then I'll list my requirements in requirements.txt, and put a test command in setup.py. Travis, when given this configuration, will run tests in six different environments:

  • Python 2.6 with Django 1.3
  • Python 2.6 with Django 1.4
  • Python 2.6 with Django's trunk
  • Python 2.7 with Django 1.3
  • Python 2.7 with Django 1.4
  • Python 2.7 with Django's trunk

Up until now, this has served me well.

With the introduction of Python 3 support, I add another Python version to my test configuration:

python:
  - "2.6"
  - "2.7"
  - "3.2"

Now Travis will run tests in 9 environments. However, in some of those environments, the tests will fail, even after I've done the work to make my app Python 3 compatible. So what happened?

The environments that Travis sets up are:

  • Python 2.6 with Django 1.3
  • Python 2.6 with Django 1.4
  • Python 2.6 with Django's trunk
  • Python 2.7 with Django 1.3
  • Python 2.7 with Django 1.4
  • Python 2.7 with Django's trunk
  • Python 3.2 with Django 1.3
  • Python 3.2 with Django 1.4
  • Python 3.2 with Django's trunk

But notice that, even though my app may be compatible with Python 3, Django 1.3 and 1.4 are not. So those environments will produce errors before the tests even run, just from trying to install Django.

My solution to this is to introduce a couple of checks in my install and test running scripts. First, I have to make my Travis configuration a little smarter. I use a separate install script that checks whether the Django version is compatible with the Python version, and immediately exits if not:

.travis.yml:

language: python
python:
  - "2.6"
  - "2.7"
  - "3.2"

env:
  - DJANGO="Django==1.3"
  - DJANGO="Django==1.4"
  - DJANGO="Django==1.5"
  - DJANGO="git+git://github.com/django/django.git@master#egg=django"

# command to install dependencies
install: ci/install.sh

# command to run tests
script: python setup.py test

ci/install.sh:

#!/bin/bash

PYTHON_VERSION=$(python --version 2>&1)

if [[ "$PYTHON_VERSION" > "Python 3" ]]
then
  if [[ "$DJANGO" < "Django==1.5" ]]
  then
    echo "Cannot install $DJANGO on $PYTHON_VERSION"
    exit
  fi
fi

pip install six==1.2.0 mock==0.7.2 $DJANGO --use-mirrors

That line, if [[ "$DJANGO" < "Django==1.5" ]], in the ci/install.sh script does a lexicographic comparison against the $DJANGO environment variable, so even when I'm using the Django trunk from git, it correctly evaluates as greater than "Django==1.5" (upper-case "D" sorts before lower-case "g"). Now, for an incompatible combination, the install step exits immediately (with an implicit exit code of 0, indicating everything is alright) before six and Django get installed, which is important for the next step.
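
To make the lexicographic ordering concrete, here is the same comparison expressed in Python (bash's [[ ... < ... ]] sorts these particular strings the same way):

>>> "Django==1.3" < "Django==1.5"
True
>>> "Django==1.4" < "Django==1.5"
True
>>> "git+git://github.com/django/django.git@master#egg=django" < "Django==1.5"
False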

Next, in the script that runs my tests, I want to check whether Django has been installed:

runtests.py:

import sys

try:
    import six
    from django.conf import settings
except ImportError:
    print("Django has not been installed.")
    sys.exit(0)

Here, it tries to import the things that did not finish installing above. If it cannot import them, it exits immediately. The exit code 0 tells Travis that all the tests for this environment have passed, and since there are no tests to run in this invalid environment, that is technically true.
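
For context, a fuller runtests.py along these lines might look something like the following. This is a sketch, not my exact file: the settings and the 'myapp' label are illustrative, and how the script gets hooked into python setup.py test varies by project.

#!/usr/bin/env python
import sys

try:
    import six
    from django.conf import settings
except ImportError:
    print("Django has not been installed.")
    sys.exit(0)

# Minimal settings, just enough to run the app's test suite in isolation.
settings.configure(
    DATABASES={'default': {'ENGINE': 'django.db.backends.sqlite3'}},
    INSTALLED_APPS=['myapp'],  # hypothetical app name
)

from django.test.utils import get_runner

if __name__ == '__main__':
    TestRunner = get_runner(settings)
    failures = TestRunner().run_tests(['myapp'])
    sys.exit(1 if failures else 0)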

Now my project is ready for testing with Travis under Python 2 and 3!

Update (11 March 2013):

I just learned about the matrix section of a Travis config file! See here for an example.
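
In short, matrix exclusions can drop the incompatible combinations up front instead of short-circuiting them in the install script. Something like this (an untested sketch):

matrix:
  exclude:
    - python: "3.2"
      env: DJANGO="Django==1.3"
    - python: "3.2"
      env: DJANGO="Django==1.4"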

Tuesday, January 29, 2013

Decoding JSON from a relational DB in Python

Lately I've been creating a fair number of APIs that store some JSON in a field in a database. Often that's a sign that a document database like Mongo or Couch is the right tool, but sometimes a relational DB will serve you better (e.g., if each row has only a small amount of free-form data, but the relational aspects are many and important). If you are using a relational DB with a JSON field, you will at some point have to deserialize that JSON for use as data.

I'm concerned with the speed of my APIs, so I began wondering what the best way to do this was. When returning a single record, there aren't many choices. I load the record, pull out the JSON field value as a string, deserialize it, and put it back on as a dictionary. However, when returning a potentially large set of records, my first inclination was to process each record individually, and compose them all into a list. Something like the following:

    def get_record(...):
        record = execute_query(...)
        data = process_record(record)
        return data

    def get_many_records(...):
        records = execute_query(...)
        data = [process_record(record) for record in records]
        return data

    def process_record(record):
        json_str = record['blob']
        record['blob'] = json.loads(json_str)
        return record


The issue here is that, if I have a number of records, json.loads will get called once for each one. The other option that I had was to compose all of the JSON data into a single list, deserialize it all at once, and then partition it back out to its original objects -- something like:

    def get_many_records(...):
        records = execute_query(...)
        json_str = '[%s]' % ', '.join(record['blob'] for record in records)
        blobs = json.loads(json_str)
        for record, blob in zip(records, blobs):  # izip for Python 2.x
            record['blob'] = blob
        return records


My first thoughts: I prefer the first code block, because it allows me to share code between my single- and multi-record getters, and it seems clearer. Also, I don't really know yet whether the latter would gain me anything significant. To test, I wrote up the following:

    from timeit import Timer

    list_str_code = """
        import json
        list_str = '[' + ', '.join(['{"a": 1}']*1000) + ']'
        data = json.loads(list_str)
    """

    str_list_code = """
        import json
        str_list = ['{"a": 1}']*1000
        data = [json.loads(dict_str) for dict_str in str_list]
    """

    list_str_timer = Timer(list_str_code)
    str_list_timer = Timer(str_list_code)

    print(list_str_timer.repeat(number=1000))
    print(str_list_timer.repeat(number=1000))


On my machine with Python 2.7, the list_str_code ran consistently more than 3 times faster than str_list_code (1.2 vs 3.9 seconds). With Python 3.2 it was nearly 5 times faster (0.7 vs 3.3 seconds). That's pretty significant.

It is worth noting that I tried this with lists of different sizes as well. Even if I construct list_str and str_list each with only 10 elements and run the code 100,000 times, the list_str_code is still several times faster.

Update:

At the suggestion of @mricordeau, I tried rerunning the timed code using Ultra JSON (ujson) instead of the core json module. I did this by just installing (pip install ujson) and replacing the lines that say import json with import ujson as json. It was, indeed, blazing fast. Notably, it brought the execution times much closer together (for the run above where I got results of 1.2 and 3.9 seconds, ujson gave me times of 0.21 and 0.27 seconds respectively)! This implies to me that much of the time in the core json module is spent in start-up (or tear-down) code each time you call loads.
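
Concretely, the only change to the timed snippets is the import line; for example:

    list_str_code = """
        import ujson as json
        list_str = '[' + ', '.join(['{"a": 1}']*1000) + ']'
        data = json.loads(list_str)
    """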

Friday, June 15, 2012

Invitation: Gnome/Gtk+/GObject Study and Hack Group

I don't know whether desktop programming is dead, but I do know that there are apps that I want that the web can't give me yet.  These are mostly software development-related, but that's what I use my computer for most.

Since I use Gnome on Linux pretty exclusively these days (and I'd like to continue), I'd like to start a study/hack group for Gnome/Gtk+/GObject programming.

I've been watching the GObject platform change quite a bit over the last few years.  Most significant, in my opinion, has been the introduction of GObject introspection (GI), which essentially made it possible to use any C+GObject library in any language, as long as there's a GI binding for that language (and there are for Java, JavaScript, .Net (Mono), Python, Ruby, and more).

My goals are (1) to be comfortable enough with GI programming to quickly create applications that I want, without the GI layer being the roadblock, (2) to be comfortable enough with C+GObject or Vala to translate potentially reusable pieces of GI applications into actually reusable components.  And (3ish) if I can contribute to the examples, tutorials, and docs all the while, that's a win too.

So, if anyone is interested in learning or practicing Gnome/Gtk+/GObject development in a group, hit me up @mjumbewu, or in the comments.  Standing offer, so even if this post is several months old, hit me up.

Thanks!
- Mjumbe

Friday, February 10, 2012

Playing around with mapping & reducing


So, this is a purely technical one, and is mostly for me.  Apologies, in advance.
I've been doing a few experiments with mapping/reducing in Python using the multiprocessing module.

Some things I've learned

Some of these I suspected, but some were surprises to me.
  • Passing data between the parent and child processes is slow -- try not to pass big data.
  • Splitting across multiple processes actually does speed up computation, as long as you don't have to pass back a lot of info about each thing.
    For example, this could be a good candidate for mapping to processes:
    def square_all(nums):
        return sum([i*i for i in nums])
    
    This, maybe not so much:
    def square_all(nums):
        return [i*i for i in nums]
    
    The first just returns a single number for each subsection of data; the second returns a potentially large list.
  • Using processes and queues directly is more flexible, but mapping over pools saves a lot of typing. For example, these functions accomplish similar things. The first is a function that just runs the processing on a single entire data set. The next two demonstrate running the data in multiple processes. In multi, it splits the data roughly evenly over n processes and immediately runs those processes in parallel. In pooled, it splits the data into n parts and doles each part out over p processes. When n == p, multi and pooled do basically the same thing.
    Note that, in order to accommodate the Queue pattern in multi, I had
    to modify the function being called:
    def square_all(nums, q=None):
        if q:
            q.put(sum([i*i for i in nums]))
        else:
            return sum([i*i for i in nums])
    
  • You cannot pass around plain generators (e.g., (i*i for i in nums)), because they cannot be pickled (see the short example after this list). The usual workarounds are to materialize the values into a list first, or to pass the arguments needed to recreate the generator inside the child process.
  • Performance of the process pools was more erratic than the manually handled processes and queues. This may be due to memory constraints; for some reason, when my script got down to looping over the pools, the memory usage went way up. I took a screenshot of my system monitor running on half as much data.
  • My computer performed about equally well above 3 processes when they were handled manually. I expected that 8 processes would not have worked out well, but I was wrong.
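
To illustrate the generator-pickling point above, here is the kind of failure you run into (the exact message differs a little between Python 2 and 3):

import pickle

gen = (i*i for i in range(10))
pickle.dumps(gen)  # raises TypeError: can't pickle generator objects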

The script

Here's the source of the script that I used to run the experiment:
#!/usr/bin/env python3
#-*- coding:utf-8 -*-

from math import floor
from multiprocessing import Pool, Process, Queue

# ---------------------------------------------------------------------------
# The data to process.  Why a global?  No reason really.

full_count = 1000000
full_range = range(full_count)

# ---------------------------------------------------------------------------
# The processing function.  Something simple.

def square_all(nums, q=None):
    """
    Simple function to loop over and square all the numbers in ``nums``.
    """
    if q:
        q.put(sum([i*i for i in nums]))
    else:
        return sum([i*i for i in nums])

# ---------------------------------------------------------------------------
# The processing wrappers.

def single():
    """
    Run the the function on the full range of values.
    """
    result = square_all(full_range)
    return result

def multi(n):
    """
    Partition the range into n parts and run each in a separate process.
    """
    # Partition
    parts = _partition(full_range, n)

    # Map
    queues = [Queue() for _ in range(n)]
    processes = [Process(target=square_all, args=(part, queue))
                 for part, queue in zip(parts, queues)]

    for process in processes:
        process.start()

    for process in processes:
        process.join()

    partial_results = [queue.get() for queue in queues]

    # Reduce
    result = _combine(partial_results)
    return result

def pooled(n, p):
    """
    Partition the range into n parts and run on a pool of p processes.
    """
    # Partition
    parts = _partition(full_range, n)

    # Map
    pool = Pool(p)
    partial_results = pool.map(square_all, parts)

    # Reduce
    result = _combine(partial_results)
    return result

# ---------------------------------------------------------------------------
# A few utilities for partitioning and combining.

def _partition(l, n):
    """
    Partition the list l into n parts, and return a list of the parts.
    """
    count = len(l)
    return [l[floor(i / n * count):floor((i + 1) / n * count)]
            for i in range(n)]

def _combine(partial_results):
    """
    Combine the list of partial results into a final result.
    """
    result = sum(partial_results)
    return result

def _time(f, reps=10, args=tuple(), kwargs=dict()):
    for _ in range(reps):
        t = time.time()
        f(*args, **kwargs)
        print (' ', time.time() - t)

# ---------------------------------------------------------------------------
# Here's the script to generate the results.

if __name__ == '__main__':
    import time

    print('Time for single process:')
    _time(single)

    for n in range(2,9):
        print('Time for {} processes:'.format(n))
        _time(multi, args=(n,))

    for p in (2,3,4):
        for n in (p,2*p):
            print('Time for {} partitions over {} processes:'.format(n, p))
            _time(pooled, args=(n, p))

Tuesday, November 1, 2011

Thoughts on my support of #occupy, as a software developer

Tomorrow, November 2nd, there is going to be a general strike in Oakland. Though I support #occupy, I'm not going to be on strike, for a few reasons:

  1. I also support the organization I'm working for, think they're doing good work, and feel that it is not incongruous with #occupy (though it's certainly not the same)
  2. I don't actually work in Oakland
Even though I'm not striking, I feel I can use my time and my skills at my job to help out.  I've been inspired by the work of some of my colleagues in helping to create design artifacts for use by #occupy-ers.  Here's a list of a few of the things I may be doing, and you might consider as well.
  • Help out #occupy tech support.  I've come across requests from several #occupy communities for additional IT hands (I don't know whether there is a central place where these requests are posted). If you feel so inclined, volunteer a little time to maintaining an #occupy web site, or posting minutes, etc. (if anyone knows more about what tech needs #occupy communities have, please speak up).
  • Help create logistical tools. The #occupy communities that I've seen are little micro-societies and, like any society, have a number of logistical considerations and concrete needs. Are there tools that could help? For instance, I liked the idea of needsoftheoccupiers which they described as "a wedding registry meets Freecycle for the Occupy movement".
  • Do something creative/educational/inspirational. #Occupy has people paying [at least superficial] attention to issues of economic [in]equality and social participation.  That doesn't happen too often, so take advantage of it by creating a visualization or an info-app that awakens people's imaginations and helps them see things that they wouldn't normally.  Things like BankMigration map and Slavery Footprint come to mind.
If you'd like to brainstorm or help out, I'm going to start a list of potential projects here, and I'll be in #occupyoakland on irc.indymedia.org tomorrow.  There are also other projects listed at OccupyHack and on github.

Update:

Something else that could perhaps use some help:
  • @benzotweet is "trying to develop a database solution for occupy... crazy requirements! no training, role based security, decentralized... #wiki?"
Also, for some inspiration, a quote from Justin Tunney (who appears to have done a lot of work on the #ows site, which is an impressive bit of code itself):
"This isn't a protest, it's a revolutionary movement empowering people to create their own change. ...We're trying to encourage people to organize and create their own change outside of the existing establishment through direct action."
Update 2:

An interesting idea for a project that came up last night: log tweets tagged with "#occupy..." that have pictures, pull out the geo exif data, and put it on a live-updating map ... kinda like http://www.artmapper.org/ (source at https://github.com/mertonium/muralmapper).

Update 3:

To stay abreast of tech-related opportunities to support #occupy, join the occupyhack googlegroup.

Friday, June 10, 2011

Generic Django Model Templates


Update 23 Jun 2011: I have renamed the django-model-filters package django-model-blocks. It now lets you easily override the template used for a given model. Check out the changes on Github or PyPI.


Tonight I'm writing my first Django custom filter. The problem I'm trying to solve is that I want generic templates. For a given model I want to be able to set up browseable index and detail pages with minimal effort. As it stands now, say I have the following model:


    ...
    class PepulatorModel (models.Model):
        serial_number = models.IntegerField()
        height = models.IntegerField()
        width = models.IntegerField()
        manufacture_date = models.DateTimeField()
        color = models.CharField(max_length=32)

        def __unicode__(self):
            return u'Pepulator #%s' % self.serial_number
    ...

Now say I want my users to be able to browse my pepulators in a simple way, with the following caveats:

  • They cannot edit pepulators, only view (rules out admin app)
  • I want to define the URL structure (rules out databrowse; see the urls.py sketch after this list) to be something like:
    
        http://www.mysite.com/pepulators/
        http://www.mysite.com/pepulators/?color=red
        http://www.mysite.com/pepulators/012345/
    
  • I want to specify the base template so that it integrates well with the rest of my project (also rules out databrowse)
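
For reference, wiring that URL structure up to the class-based generic views might go something like this. It's only a sketch, not code from an actual project; "pepulators" is a hypothetical app name, and the ?color=red filtering would need a small get_queryset override on the ListView:

    # urls.py (sketch)
    from django.conf.urls.defaults import patterns, url
    from django.views.generic import ListView, DetailView

    from pepulators.models import PepulatorModel  # hypothetical app

    urlpatterns = patterns('',
        # http://www.mysite.com/pepulators/
        url(r'^pepulators/$',
            ListView.as_view(model=PepulatorModel),
            name='pepulator_list'),
        # http://www.mysite.com/pepulators/012345/
        url(r'^pepulators/(?P<pk>\d+)/$',
            DetailView.as_view(model=PepulatorModel,
                               context_object_name='pepulator'),
            name='pepulator_detail'),
    )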

Currently, I can use the generic views ListView and DetailView, but I still have to write templates that go something like this:


    {% extends "base.html" %}
    
    {% block content %}
        <header>
            <h1>Pepulator #{{ pepulator.serial_number }}</h1>
        </header>
        
        <div>
            <span>Serial Number</span>
            <p>{{ pepulator.serial_number }}</p>
        </div>
        <div>
            <span>Height</span>
            <p>{{ pepulator.height }}</p>
        </div>
        <div>
            <span>Width</span>
            <p>{{ pepulator.width }}</p>
        </div>
        <div>
            <span>Manufacture Date</span>
            <p>{{ pepulator.manufacture_date }}</p>
        </div>
        <div>
            <span>Color</span>
            <p>{{ pepulator.color }}</p>
        </div>
    {% endblock %}

Okay, a bit verbose, but it's not going to kill me. However, now say I want to change some of the fields on my model. Well, then I have to remember to change the fields in my template as well (error-prone — this is why you don't violate DRY without good reason).

All I wanted was a simple view of my model!

So, I considered making an app that was leaner than databrowse and just provided generic templates to go with generic views. I found myself having to extend the generic views anyway, though, because there's no way to access a model instance's fields and field names without explicitly feeding them to the template's context. Then, I gleaned some inspiration from uni_forms: I'll make filters!

Now my plan is to be able to say, using the example of the Pepulator detail view above:


    {% extends "base.html" %}
    
    {% block content %}
        {{ pepulator|as_detail_html }}
    {% endblock %}

Sublime. (This must exist somewhere; but for now, I can't find it.)

So, I start off by creating my app


    $ python manage.py startapp generic_templates

Now, from the documentation on creating custom tags and filters, I see I should create a templatetags directory in my app. In here I'll put an __init__.py file and a module called generic_filters. This way, when I'm done, I can use the filters by putting this near the top of my template file:


    {% load generic_filters %}

I decided to start with the detail filter (as_detail_html), and to write a test first. I know generally what I want this to do, so I write the following test:


    """
    Test the generic filters
    """

    import datetime

    from django.test import TestCase
    from mock import Mock

    from django.db.models import Model, IntegerField, DateTimeField, CharField
    from django.template import Context, Template
    from generic_templates.templatetags import generic_filters as gf

    class DetailHtmlFilterTest (TestCase):

        def setUp(self):
            # Create a sample model
            class PepulatorModel (Model):
                serial_number = IntegerField(primary_key=True)
                height = IntegerField()
                width = IntegerField()
                manufacture_date = DateTimeField()
                color = CharField(max_length=32)
            
                def __unicode__(self):
                    return u'Pepulator #%s' % self.serial_number
            
            # Create a model instance
            now = datetime.datetime.now()
            self.m = PepulatorModel(
                serial_number = 123456,
                height = 25,
                width = 16,
                manufacture_date = now,
                color = 'chartreuse',
            )
            
            # Mock Django's get_template so that it doesn't load a real file;
            # instead just return a template that allows us to verify the context
            gf.get_template = Mock(
                return_value=Template('{{ instance|safe }}:{{ fields|safe }}'))
        
        
        def test_model_format(self):
            """Tests that a given model is formatted as expected."""
            
            expected_detail = (u"Pepulator #123456:[('serial number', 123456),"
              " ('height', 25), ('width', 16), ('manufacture date', %r),"
              " ('color', 'chartreuse')]") % self.m.manufacture_date
            detail = gf.as_detail_html(self.m)
            
            gf.get_template.assert_called_with('object_detail.html')
            self.assertEqual(detail, expected_detail)

In short, set up a model and an easy template, and check that the template is filled in correctly. Of course, since I haven't yet written my filter, this fails.

This (as_detail_html) was a straightforward method to write, but I did get tripped up because of the poor documentation available on Models' Meta classes. Here's the first go at the filter:


    from django.template import Context, Template
    from django.template.loader import get_template

    def as_detail_html(instance):
        """
        Template filter that returns the given instance as a template-formatted
        block.  Inserts two objects into the context:
          ``instance`` - The model instance
          ``fields`` - A list of (name, value)-pairs representing the instance's
                       fields
        """
        template = get_template('object_detail.html')
        fields = [(field.verbose_name, getattr(instance, field.name)) 
                  for field in instance._meta.fields]
        context = Context({'instance':instance, 'fields':fields})
        return template.render(context)

One other thing: I actually want to be able to use the filter in my templates, not call it directly in my code. I'm new here, so I write another test to make sure I understand what's going on:


        def test_filter_is_registered(self):
            """Test that the filter can be used from within a template"""
            
            template = Template(('{% load generic_filters %}'
                                 '{{ pepulator|as_detail_html }}'))
            context = Context({'pepulator':self.m})
            
            expected_detail = (u"Pepulator #123456:[('serial number', 123456),"
                " ('height', 25), ('width', 16), ('manufacture date', %r),"
                " ('color', 'chartreuse')]") % self.m.manufacture_date
            detail = template.render(context)
            
            gf.get_template.assert_called_with('object_detail.html')
            self.assertEqual(detail, expected_detail)

And it turns out all I have to do to satisfy it is change my module in the following way:


    from django.template import Context, Template, Library
    from django.template.loader import get_template

    register = Library()

    @register.filter
    def as_detail_html(instance):
        ...

Now I have a working object detail template. Yay! I figure I'll do the list the same way.
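
A sketch of what that list filter might look like, following the same pattern (untested, and object_list.html is a hypothetical template name):

    @register.filter
    def as_list_html(instances):
        """
        Template filter that renders an iterable of model instances as a
        template-formatted block.  Inserts one object into the context:
          ``instances`` - The list (or queryset) of model instances
        """
        template = get_template('object_list.html')
        context = Context({'instances': instances})
        return template.render(context)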

More on Github: https://github.com/mjumbewu/django-model-filters