Database functions in Django
This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Functionally a database
On top of the basic query syntax we’re all used to, SQL databases tend to have a large number of built-in functions — some of which are standardized, some of which are unique to a particular database — and often the ability to write your own custom ones as well.
And Django’s ORM supports a lot of standard functions, making it easy to use them in ORM query expressions. But you can also write your own.
Trigram matching
So as a simple example, let’s consider the trigram-matching extension that ships with PostgreSQL. You can read its official documentation for an idea of how it works, but the basic concept is that it breaks text values down into trigram patterns which are surprisingly effective at fuzzy text matching, and provides both functions and operators for query trigrams and similarity of trigrams.
Django supports the operators from the trigram extension, and already has implementations of database functions for the trigram functions, too, but the point here isn’t to fix something missing in Django — it’s to show how you can do this, and I think the basic trigram function is a nice example to use. So let’s reimplement it.
We’re going to be writing a subclass of Django’s base Func
SQL expression here, and the absolute simplest possible version would just set the function name and the expected argument count. The base trigram function name is SIMILARITY
, and it takes two arguments, so:
from django.db import models
class Similarity(models.Func):
"""
Django ORM function expression which calls the PostgreSQL
trigram SIMILARITY() function.
"""
arity = 2
function = "SIMILARITY"
Now suppose we want to query a blog entry model for titles that are similar to a search string. We could do it like so:
from django.db import models
from django.db.models.functions import Lower
search_result = Entry.objects.annotate(
match=Similarity(
Lower("title"),
Lower(models.Value(search_term)),
)
).filter(match__gt=0.5)
This uses annotate()
to add a calculated column containing the similarity between the entry title and the search term to each row of the result, and then filters for rows which have a similarity score above a certain threshold.
Improving it
But we can do better than this. There are two specific things in particular that should be done here:
- We should specify the output type of this expression so Django knows what model field type to use for it.
- We should wrap any literal arguments we receive in instances of Django’s
Value()
expression so they can be correctly resolved as expressions when run. This way users of the function don’t have to remember to do this manually as above.
Here’s the new version:
class Similarity(models.Func):
"""
Django ORM function expression which calls the PostgreSQL
trigram SIMILARITY() function.
"""
arity = 2
function = "SIMILARITY"
output_field = models.FloatField()
def __init__(self, first_expression, second_expression, **extra):
if not hasattr(first_expression, "resolve_expression"):
first_expression = models.Value(first_expression)
if not hasattr(second_expression, "resolve_expression"):
second_expression = models.Value(second_expression)
super().__init__(first_expression, second_expression, **extra)
The __init__()
method checks whether the two arguments are already some type of Django SQL expression by looking for a standard attribute of all expression types and, if not found, wraps them in Value
. And the output_field
attribute tells the ORM what type of model field the return value corresponds to.
The usage is the same as above, except now without the need to remember to wrap anything in Value
:
from django.db import models
from django.db.models.functions import Lower
search_result = Entry.objects.annotate(
match=Similarity(
Lower("title"),
Lower(search_term),
)
).filter(match__gt=0.5)