Initial data and install-time code
A fairly frequently-asked question is something along the lines of “how do I provide some data which gets installed along with my application?”, or some variation on that, often including questions about how to ensure a particular bit of code is run when the application is installed via syncdb. Django provides several different ways of approaching this, depending on the exact situation and exactly what you need to do, and while they’re mostly documented it still seems to cause a lot of questions. So today let’s run down the different options and see where each one is appropriate.
Providing raw SQL to insert data
This is the oldest (it’s been part of Django since the beginning) and probably the simplest (both in terms of what you need to do, and what it supports) method of providing some initial data to be installed alongside your application: your application can simply provide a SQL file, containing appropriate INSERT statements, and Django will execute that SQL after it’s created the database tables for the application.
To do this, add a directory called “sql” to your application, and for each model which needs to provide initial data, add a file “modelname.sql”, where “modelname” is the name of your model (in lower-case; in other words, something you’d pass to get_model() or which could end up in the “model_name” attribute of the model’s “_meta”). So, for example, I have a blog application which contains an Entry model; I could populate some entries automatically by adding a directory named “sql”, containing a file named “entry.sql”, to the application.
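For instance, a hypothetical entry.sql for that blog application might look like the following (the table and column names here are purely illustrative; they’d need to match what Django actually generates for your model):

```sql
-- blog/sql/entry.sql: executed after the blog app's tables are created
INSERT INTO blog_entry (title, body) VALUES ('Welcome', 'Hello, world.');
INSERT INTO blog_entry (title, body) VALUES ('Second post', 'More to come.');
```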
In addition to simple INSERT statements you can put other SQL in here, but be warned that the order in which multiple initial SQL files for an application will be processed is not reliable (so don’t rely on one file being executed before another), and that not all features of SQL syntax are supported. In order to deal with the limitations of the different databases it supports, Django does extremely simple tokenization to break the file up into individual SQL statements (delimited by semicolons), rather than simply piping the full file into a database client.
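To see why that matters, here’s a rough sketch, in plain Python, of what semicolon-based splitting looks like. To be clear, this is not Django’s actual implementation, just an illustration of the idea, and of the kind of thing that can trip it up:

```python
def split_statements(sql_text):
    # Naive semicolon-delimited tokenization: split the file on ";"
    # and discard empty fragments. A splitter this simple would be
    # confused by a semicolon inside a quoted string literal, which
    # is why fancy SQL (trigger bodies, stored procedures, etc.) may
    # not survive the trip.
    return [s.strip() for s in sql_text.split(";") if s.strip()]

sql_file = """
INSERT INTO blog_entry (title) VALUES ('First post');
INSERT INTO blog_entry (title) VALUES ('Second post');
"""
for statement in split_statements(sql_file):
    print(statement)
```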
You can see whether an application has supplied some custom install-time SQL by using the “sqlcustom” manage.py command.
Using fixtures
Fixtures are the newest method for providing data to be automatically loaded, and are used heavily by Django’s testing framework to provide data for unit tests to work with. Rather than providing a SQL file, with a fixture you provide a file in a format supported by Django’s serialization system, and that file will be read and translated into model objects, which will then be saved into your database.
For automatic installation of initial data, create a fixture file (the easiest way is to use the “dumpdata” manage.py command) and make sure the file name is “initial_data”; in other words, the file can be named any of:

- initial_data.json
- initial_data.xml
- initial_data.yaml
- initial_data.python
Update: see Russell’s comment for some notes on choosing a serialization format.
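As an example, a JSON fixture for the hypothetical blog application’s Entry model (field names invented for illustration) would look something like this:

```json
[
  {
    "model": "blog.entry",
    "pk": 1,
    "fields": {
      "title": "Welcome",
      "body": "Hello, world."
    }
  },
  {
    "model": "blog.entry",
    "pk": 2,
    "fields": {
      "title": "Second post",
      "body": "More to come."
    }
  }
]
```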
Place this file into a directory called “fixtures” inside your application, and it will be automatically detected during syncdb; the data will be installed once your application’s database tables are created.
You can also manually load a fixture using the “loaddata” manage.py command, if you have more fixtures you want to use or if you didn’t supply the fixture prior to running syncdb.
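In practice that workflow might look like this (the app label and file names are just examples):

```shell
# Dump the blog app's current data into a fixture for automatic loading
python manage.py dumpdata blog > blog/fixtures/initial_data.json

# Or load an extra, manually-named fixture after syncdb has already run
python manage.py loaddata extra_entries.json
```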
Using the post_syncdb signal
The third, and most generally flexible, method of taking some action when your application is being installed is to use the post_syncdb signal sent by Django’s internal dispatcher. If you’re not familiar with it, the dispatcher is a mechanism by which various parts of Django (and your own applications) can notify each other of particular events, by sending out “signals”. One signal built in to Django is post_syncdb, which is sent after each application’s tables are created by manage.py syncdb.
To take advantage of this, create a file in your application called “management.py”, and add the following:
```python
from django.dispatch import dispatcher
from django.db.models.signals import post_syncdb
from myapp import models as myapp
```
Replace the last import statement with one which imports your own application’s models; for example, if you were writing a blog application it might look like this:

```python
from blog import models as blog_app
```
Then define a Python function which does whatever install-time work you’d like; once it’s registered with the dispatcher (we’ll see how to do that in just a moment), it will be called immediately after syncdb creates your application’s database tables, so it’s free to do anything it likes that relies on those tables existing, including changing them, adding additional features, or inserting data using Django’s model API.
Finally, register your function with the dispatcher, and set it to listen for the post_syncdb signal from your application, by calling dispatcher.connect(); there are three arguments you’ll want to supply:
- Your function. This should be the actual function, not just a string containing its name.
- The keyword argument signal, which should be the post_syncdb signal you imported.
- The keyword argument sender, which should be your application’s models module (imported as explained above).
So to continue with the blog example, you might define a setup_blog() function, and register it like so:

```python
dispatcher.connect(setup_blog, signal=post_syncdb, sender=blog_app)
```
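Putting those pieces together, a complete management.py for the hypothetical blog application might look like the following. The body of setup_blog() is just an illustration, and the keyword arguments are defensive: the signal is sent with several extra arguments, so accepting **kwargs keeps the function compatible with whatever the dispatcher passes:

```python
# blog/management.py -- imported automatically during syncdb

from django.dispatch import dispatcher
from django.db.models.signals import post_syncdb
from blog import models as blog_app

def setup_blog(app=None, created_models=None, **kwargs):
    # By the time this runs, the blog app's tables exist, so the
    # model API is safe to use. Guard against re-running against a
    # database that's already populated.
    from blog.models import Entry
    if not Entry.objects.count():
        Entry.objects.create(title="Welcome", body="Hello, world.")

dispatcher.connect(setup_blog, signal=post_syncdb, sender=blog_app)
```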
During syncdb, Django will look for the management.py file in your application and import it if it exists, which will cause the dispatcher.connect() line to execute and register your function; then, when the tables for your application have been created, the post_syncdb signal will be sent for your application, and the dispatcher will make sure your custom function gets called.
Django’s authentication application uses this to prompt you to create a superuser during syncdb; its management.py file defines the function which prompts for and creates the superuser, then connects that function to the post_syncdb signal and listens for its own installation. The sites application also uses this to create a default Site object when it’s installed.
If you’d like to have your application do something whenever any application is installed, not just your own, you can omit the sender argument, and your function will be called every time syncdb finishes installing an application. If you do this, you’ll want your function to also accept a couple of optional arguments (the dispatcher will ensure they’re passed properly):
- app, which will be the models module of the application which was just installed (suitable for passing to get_models(), as we’ve already seen).
- created_models, which will be a list of the models whose tables were just created.
A couple of Django’s bundled applications use this trick:
- The auth application uses it to create the default Permission objects for every model as it’s installed.
- The contenttypes framework uses it to create ContentType objects for models as they’re installed.
Use as appropriate
Generally, each of these techniques works best in a different type of situation, so be sure to choose the one that’s appropriate to what you’re specifically trying to accomplish:
- Supplying an initial SQL file works well when you need to supply some initial data and optionally some extra SQL to be executed (e.g., to set up additional constraints or triggers), and you know that the app will be used with a database which supports it. If you have an extremely large set of data to insert, this is also a good choice because it doesn’t incur the overhead of creating model objects.
- Using fixtures is the most robust method of supplying just a set of initial data and, because it uses Django’s serialization system, isn’t subject to database-specific quirks. It is a bit slower than directly inserting data with SQL, since the serialization system has to create the actual model objects, but the flexibility of the serialization system often outweighs this drawback. It also makes for an easy way to migrate between databases (when I switched this site from MySQL to PostgreSQL, I used fixtures to transfer the data).
- Using the post_syncdb hook gives you the freedom to run pretty much any legal Python code, but is a little onerous for creating lots of objects. Generally it’s best when you need to create a single object (as in the case of the initial superuser account, or the default Site object), or when your application needs to be notified any time new applications or models are being installed elsewhere.