Python packaging: use the “src”
This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
A lurking problem
Imagine you write a Python library named, say, foo. And you diligently set up the configuration to package it for distribution (which is not that hard; you can learn the basics of packaging up some Python code in an hour or two, and it’s only when you get into things like bundling compiled extensions in C or other languages that things get annoying). And you also diligently write some tests and periodically run them, and they pass, both on your local machine and in whatever CI you’ve set up.
Except one day someone tells you your package doesn’t actually work. Even though the tests are passing. Huh?
A surprising amount of the time, this can be traced back to a simple decision: to put the directory foo, containing your library’s main importable Python module, at the top level of the repository (so that if you were looking at it you’d see the pyproject.toml and other config files, and the foo/ directory all alongside each other).
The issue with this is that the current directory is, by default, on the Python import path. Which means import foo was always going to work even if your packaging config was broken. That’s why the tests passed: they were still able to successfully import everything they needed. But the packaging wasn’t replicating that structure to someone else’s machine, so they couldn’t use it.
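If you want to check which copy of a module your tests would actually pick up, a minimal sketch along these lines can help. The helper names module_origin and resolves_inside are made up for this example, and it only handles top-level module names.

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    # Built-in, frozen and namespace modules have no single file on disk.
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).resolve()


def resolves_inside(name, directory):
    """True if importing `name` would pick up a file under `directory`."""
    origin = module_origin(name)
    if origin is None:
        return False
    return origin.is_relative_to(Path(directory).resolve())


if __name__ == "__main__":
    # "foo" is the example library name used in this post.
    print(module_origin("foo"))
    print(resolves_inside("foo", Path.cwd()))
```

Run from the root of a repository with a top-level foo/ directory, the second line would be expected to report that the import resolves inside the working tree rather than inside an installed copy.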
The solutions
There are two things you can and should do to avoid this problem. One is to begin running your tests, at least in CI, with the -I flag to the Python interpreter (i.e., instead of running pytest, run python -Im pytest). This puts the interpreter into “isolated mode” where, among other things, the current directory is not on the import path. This is a good idea for almost any command you’ll run in CI, because isolated mode can cut off a lot of easy attack vectors.
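To see the effect of -I for yourself, here is a rough sketch that starts a fresh interpreter with and without the flag and compares the resulting import paths. The helper name child_sys_path is invented for this example, and it uses python -c rather than pytest to keep things small.

```python
import json
import os
import subprocess
import sys


def child_sys_path(*flags):
    """Start a fresh interpreter with the given flags and return its sys.path."""
    # Drop PYTHONSAFEPATH so the non-isolated run shows the default behavior.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    code = "import json, sys; print(json.dumps(sys.path))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # With -c, the current directory appears as an empty-string entry.
    print("default: ", "" in child_sys_path())
    print("isolated:", "" in child_sys_path("-I"))
```

The empty string is how the interpreter records “the current directory” when running with -c; under -I that entry should be gone.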
The other thing is… stop putting your module at the top level. Instead, adopt the src/ layout or a variation of it, where your module is inside a directory called src/. This way, even locally and even when you forget to use -I, running tests will require that the package successfully build and install, because the module will no longer be top-level and thus no longer implicitly on the import path.
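The difference between the two layouts can be sketched without touching a real project. The following builds both in a temporary directory and tries the import from each project root. The package name foo_layout_demo and the helper names are made up for illustration, and nothing is built or installed here.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

PACKAGE = "foo_layout_demo"  # hypothetical package name for this demo


def make_package(parent):
    """Create an empty importable package directory under `parent`."""
    pkg = Path(parent) / PACKAGE
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")


def importable_from(project_root):
    """Try `import PACKAGE` in a fresh interpreter started in project_root."""
    # Drop PYTHONSAFEPATH so the current directory is on the path as usual.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, "-c", f"import {PACKAGE}"],
        cwd=project_root,
        capture_output=True,
        env=env,
    )
    return result.returncode == 0


def compare_layouts():
    """Return (flat_layout_imports, src_layout_imports) without installing."""
    with tempfile.TemporaryDirectory() as tmp:
        flat_root = Path(tmp) / "flat"
        src_root = Path(tmp) / "with_src"
        make_package(flat_root)          # flat/foo_layout_demo/
        make_package(src_root / "src")   # with_src/src/foo_layout_demo/
        return importable_from(flat_root), importable_from(src_root)


if __name__ == "__main__":
    print(compare_layouts())
```

The flat layout imports straight from the working tree, while the src/ layout fails until the package is actually installed, which is exactly the failure you want to see locally instead of hearing about it from a user.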
This is not the only reason to prefer a src/ layout (here’s a post listing a few more, and fortunately things have gotten easier since it was written: setuptools now automatically recognizes and explicitly supports src/ layouts, for example), but it is a compelling one, and it’s the one that got me to change over. I didn’t believe the broken-package problem would ever happen to me; at least I was in good company, and now I’m happy to tell Hynek he was right and I should’ve listened when he wrote that post.