Let’s talk about testing Django apps
For quite a while now, I’ve maintained several open-source Django applications. And with Django 1.11 on the horizon, it’s time for me to go through and make any changes and clean out their issue queues while I’m at it, so I can roll new officially-compatible releases. When I’m doing this, I always like to try to evaluate the current state of the art and see if I can switch over to better ways of doing things, since Django gets new features with every release and people in the community keep coming up with better ways to approach common tasks or problems.
Lately, one that’s been on my mind a bit is testing. I’m a fan of testing, and on my personal apps I have CI set up to run on every pull request and every commit I push, against the full combinatorial matrix of Django/Python versions I support. I also use coverage.py, and have it set to break my builds if I ever go below 100% test coverage.
So far, so good. And, of course, Django itself supplies a bunch of built-in tools to make testing easier and nicer.
But I still feel like good patterns for testing are an area where Django could do a lot better. So let’s dig into some of the pain points.
Testing apps
If you’re building a site with Django, you’ve got a bunch of individual apps which (hopefully) all have their own tests and are listed in INSTALLED_APPS, and then testing is easy: just run manage.py test. You can also pass arguments to that on the command line to run just one application’s tests, or to run a subset of tests.
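For example (using a hypothetical app named registration; with Django’s default test runner, the labels are dotted Python paths):

```
$ python manage.py test                              # run the whole test suite
$ python manage.py test registration                 # run one app's tests
$ python manage.py test registration.tests           # run a subset (a hypothetical tests module)
```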
But what about testing at the application level? In order to do just about anything with a Django application, you need to have settings available, and applications aren’t the level where you specify settings. Again, if you’re just building an app to integrate into a single site or service you’ll deploy, this isn’t a big hurdle since you’ll always have the larger setup available to test with, but what about applications that are meant to be distributed on their own and re-used in lots of places?
For that, you need to provide a minimum configuration for Django to be able to run, and then execute your tests. I’ve taken to adding a file called runtests.py to my applications, containing the configuration needed for testing the app and the appropriate invocation to run the tests. Here’s an example from django-registration; from a checkout of the code, or an unzipped copy of the release package, you can run python runtests.py and it will work.
The trick there is in two functions built in to Django: django.conf.settings.configure(), which lets you supply settings as keyword arguments, and thus use Django without a settings file or DJANGO_SETTINGS_MODULE variable; and django.setup(), which, after settings are configured, initializes the set of installed applications and gets Django ready to use. Once you’ve done those two things, you can instantiate a test runner and use it to run the tests; that’s what the function run_tests() does in the file linked above.
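A stripped-down sketch of what such a runtests.py can look like (the actual file in django-registration is more elaborate, and the settings shown here — installed apps, SQLite test database, URLconf — are illustrative rather than the real ones):

```python
import sys

import django
from django.conf import settings
from django.test.utils import get_runner


def run_tests():
    # Supply just enough settings for the app's tests to run.
    settings.configure(
        DATABASES={
            "default": {"ENGINE": "django.db.backends.sqlite3"},
        },
        INSTALLED_APPS=[
            "django.contrib.auth",
            "django.contrib.contenttypes",
            "registration",
        ],
        ROOT_URLCONF="registration.urls",  # hypothetical URLconf for the tests
    )
    # Populate the app registry now that settings exist.
    django.setup()
    # Instantiate whatever test runner the settings specify (DiscoverRunner
    # by default) and run this app's tests.
    TestRunner = get_runner(settings)
    failures = TestRunner(verbosity=1, interactive=False).run_tests(["registration"])
    sys.exit(bool(failures))


if __name__ == "__main__":
    run_tests()
```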
That gets as far as being able to run the tests on demand, but of course there’s (at least) one more question left to answer: how should people invoke this? The easy answer is python runtests.py, of course, or coverage run runtests.py for running with coverage support. But it feels a little bit ad-hoc.
Testing with setuptools
The Python standard library includes the distutils module for building and working with packaged Python code. And there’s also setuptools, which started life as part of an effort to do a lot more (and a lot more ambitious) things. Nowadays, using setuptools for some of the packaging-related conveniences it provides is pretty common, and one of those conveniences is the ability to specify the test_suite argument to the setup() function in setup.py. If you do this, then you gain access to python setup.py test as a command to run your tests. For example, I could pass test_suite=registration.runtests.run_tests in the setup.py of django-registration, and then python setup.py test (or coverage run setup.py test) would be the test command to advertise to people, and to specify in the CI configuration.
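In a setup.py, that might look something like this (metadata trimmed way down; the test_suite value is given as a dotted-path string and assumes runtests.py lives inside the registration package):

```python
from setuptools import setup, find_packages

setup(
    name="django-registration",
    packages=find_packages(),
    # Point setuptools at the function that configures Django and runs the tests.
    test_suite="registration.runtests.run_tests",
)
```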
This feels a lot better than just telling people to run a random script inside the repository/package: it uses a standard-ish Python module (if you have pip nowadays, you also have setuptools), it hooks into a standard package-related Python command, and it’s a thing lots of packages can do, so that python setup.py test can just be a thing people learn to do once and then run over and over.
But there are some bits still missing here. For one, you still need to provide a ready-made environment capable of running the tests. For a Django application, that means at least providing a supported version of Django and a supported version of Python. You can do a lot with setuptools, of course: you can specify install_requires and python_requires to say what versions of Django and Python are supported. Then setuptools will install a version of Django for you alongside the package, and will bail out if you’re using an unsupported version of Python. You can even take it a step further and specify tests_require to ensure test-only dependencies (in my case, coverage and flake8) are available.
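Putting those pieces together, the setup() call might grow to something like this (the version specifiers are illustrative, not django-registration’s actual ones):

```python
from setuptools import setup, find_packages

setup(
    name="django-registration",
    packages=find_packages(),
    test_suite="registration.runtests.run_tests",
    # Declare supported Django versions...
    install_requires=["Django>=1.8,<1.11"],
    # ...and supported Python versions...
    python_requires=">=2.7",
    # ...and the dependencies only the test suite needs.
    tests_require=["coverage", "flake8"],
)
```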
However, this only gets to the point of running tests against a single known-supported version of Python and a single known-supported version of Django. What if — as many people do — you want to test against the full set of combinations of supported Python/Django versions?
Aside: tox
I should pause here to mention that I’m not going to go over using tox. This isn’t because tox is bad or wrong — I know a lot of folks who are very happy using it — but because tox doesn’t work for me personally. On my primary laptop, I use pyenv and pyenv-virtualenv to manage many versions of Python, switch between them, and create and use virtualenvs with whatever version of Python I want.
And tox does not seem to play particularly well with this; it expects to be able to find certain specifically-named Python interpreters for different versions of Python, and I’ve never been able to make that work without hacking my PATH to manually inject pyenv-installed interpreters into a location where tox will find them (and I am aware of, and have tried, tox-pyenv, but still couldn’t get tox to work without PATH fiddling).
If your local setup is one that tox works well with, or you’re OK with the PATH fiddling to get tox and pyenv working together, I encourage you to try tox for your testing. What I’ll detail below is mostly a reinvention of the parts of tox that I’d want for local testing, but in a way that automatically works well with pyenv.
Go ahead, make my tests
Recently I’ve been experimenting with something a bit older. While python setup.py test is probably a good Python-specific standard for a test-running command, it is still Python-specific. There’s a much older, much more widespread language-agnostic way to do this: make test.
In the Unix world, make is the old-school way to automate the process of building a piece of software. It’s oriented around the idea of being able to specify tasks which need to happen, and dependencies between them, and then invoking whichever tasks you need to run. In its original use case, this would be compiling C source files and then linking them to build a final executable, but nothing about it requires you to be working with C — it’s a pretty generic tool.
Tasks — called “targets” in make lingo — are specified in a file, usually named Makefile. Each target then becomes a name you can pass to the make command, and can specify any other targets as dependencies, ensuring they’ll happen first.
If you’ve ever manually built documentation for something using Sphinx (and Sphinx is good stuff — you should use it for your documentation!), you’ve used this, because Sphinx generates a Makefile to coordinate different tasks in the build process. If you want HTML documentation, for example, you run make html, and that invokes a target in the Makefile which runs all the necessary steps to generate HTML from your source documentation files.
And in many domains, make test is the expected standard way to run tests for a codebase. All you have to do for that is provide a test target in the Makefile, and have it run the correct set of commands.
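At its simplest that can be very small indeed (a minimal sketch, assuming the runtests.py approach above with coverage installed; note that make requires recipe lines to be indented with tabs):

```make
.PHONY: test
test:
	coverage run runtests.py
	coverage report -m
```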
So I started playing around with building a Makefile to do what I wanted. There are a few things here worth knowing:
- Inside a Makefile, you can set, test for and read variables. You can also pass variables in by specifying them on the command line, or setting them in the environment.
- Inside a make target, each command is one logical line. This means if you want to spread a command out over multiple lines, you need to use a backslash to continue the logical line over multiple physical lines of the file.
- Each logical line is a command which will be executed, so it’s written in a Bash-script-like style, and you can use Bash tests (like checking whether a file/directory exists) and logical operators to control what happens.
- Normally, each target inside a Makefile is expected to describe how to compile/build a file whose name is identical to the name of the target. You can create targets which don’t correspond to a filename by using the .PHONY declaration.
So here’s an example from django-registration. It allows configuration of the Python/Django versions to use, which means it can be used in a loop to run against combinations of versions. The important targets here are:
- venv uses pyenv to create and activate a virtualenv for the target Python version, and defaults to naming that virtualenv registration_test. If a virtualenv of that name already exists, it skips creation and uses the existing virtualenv regardless of what Python version that virtualenv has.
- lint runs flake8 over the codebase, to check for any Python style errors.
- test runs the test suite with coverage, and prints the coverage report.
- teardown cleans up afterward and will destroy the virtualenv.
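Here’s a heavily trimmed-down sketch of what those targets can look like (the real Makefile in django-registration is more involved; the default versions, the virtualenv-existence check and the pip invocation shown here are illustrative, and for brevity this sketch folds dependency installation into the venv target):

```make
# Defaults; all three can be overridden on the command line or in the environment.
PYTHON_VERSION ?= 3.5.2
DJANGO_VERSION ?= 1.10
VENV_NAME ?= registration_test

.PHONY: venv lint test teardown

venv:
	# Create the virtualenv only if one of this name doesn't already exist.
	pyenv versions --bare | grep -q "^$(VENV_NAME)$$" || \
		pyenv virtualenv $(PYTHON_VERSION) $(VENV_NAME)
	pyenv local $(VENV_NAME)
	pip install "Django~=$(DJANGO_VERSION).0" coverage flake8

lint:
	flake8 registration

test:
	coverage run runtests.py
	coverage report -m

teardown:
	pyenv local --unset
	pyenv uninstall -f $(VENV_NAME)
```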
There are a bunch of other targets in the real Makefile, which do things like install the test dependencies, install the requested version of Django, etc. So now I can just specify make test in my CI configuration as the test command, and know that dependencies will be installed (previously I’d have had to use tests_require or manually specify installation of test dependencies), and for local testing I can test against whatever combination of Django/Python versions I want. For example, to run with Django 1.9 and Python 3.5.2:
$ make venv test teardown PYTHON_VERSION=3.5.2 DJANGO_VERSION=1.9
Of course, this is still just an experiment, and there are things I want to fix about it. Right now the biggest annoyances are:
- Specifying the Django version to install. Unfortunately, Python’s method of interpreting version specifiers isn’t really what I want here. There’s a bit of a hack in the Makefile linked above to ensure that I can say “1.8” or “1.9” and actually get the latest release of those series (the short version here is that Django~=1.9 will install Django 1.10, but Django~=1.9.0 installs the latest 1.9).
- Cleaning up. I’ve been copy/pasting this Makefile into most of my repositories, and the commands to remove __pycache__ directories and .pyc files need some adjustment to work with arbitrarily-deeply-nested directories of code (django-registration, for example, goes several levels deep); a sketch of one way to handle that follows this list.
- There is some reusability concern here since the Makefile is set up to use pyenv, and I know not everyone uses pyenv. For now, I’m relying on the fact that setting up a virtualenv is optional, and people can supply their own environment if they don’t use pyenv.
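For the clean-up annoyance, a find-based target along these lines (an untested sketch, not what the copy/pasted Makefile currently does) would cope with arbitrarily nested packages:

```make
.PHONY: clean
clean:
	find . -type d -name '__pycache__' -prune -exec rm -rf {} +
	find . -type f -name '*.pyc' -delete
```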
The future
I’ll probably continue tinkering with the Makefile approach for a while, and if I can iron out the issues listed above, I might stick to it going forward, at least for my personal apps. If not, perhaps I’ll go back to setup.py test, or explore other options (like just buckling down and making tox work).
In the meantime, I’d be interested to know about other approaches; I haven’t attempted any kind of scientific or even pseudoscientific survey of popular Django applications to see whether there’s any consensus on testing applications standalone.