Django Annotate: 7 Examples to Supercharge your Querysets

Want to feel powerful? Learn how to use the .annotate() method on your querysets. While your less experienced colleagues write multiple queries and for loops to get the result they are looking for, you can get your queryset to do all the heavy-lifting in a single step.

What do annotations do?

  1. Add additional fields to each result of the queryset, calculated from other fields

  2. Group results, similar to SQLs GROUP BY expression.

Applying .annotate() on an unevaluated queryset will apply the annotation to all results. If you evaluate the queryset using .values() or values_list() first and apply .annotate(), then this will have a grouping effect.

We will cover many examples of both kinds of annotation.

Annotate vs Aggregate

What is the difference between annotating and aggregating a queryset? Annotations will apply the expression to all results in the queryset, whereas aggregates will condense the data into one result.

Formula 1 Examples

I have a Django application that has imported historic Formula 1 data into a relational database (I used PostgreSQL but SQLite is fine). I am going to use .annotate() to generate some interesting statistics about the drivers.

I used the Ergast Developer API to get my data. Rather than call the API, I downloaded their csv tables of historic data and wrote a script to ingest the data into my database.

I ingested the data into six tables: Race, Circuit, Result, Driver, Constructor, Status. The tables are linked

Database schema. The Result tables have foreign keys to Race, Status, Driver and Constructor. The Race table has a foreign key to the Circuit table.

A schema for the database used in the examples.

Annotations are an advanced skill for Django ORM users. If you'd like to build your understanding of Django queries (filter, exclude, distinct, values, values_list etc), then I would recommend my post on Django ORM queries, where you will learn through several F1 examples.

1. Basic Annotate

The Result and Status tables can be combined to count the number of results for each status.

In this example we will count the number of times each status appears in the Race Result table.

What are the most common reasons for drivers to drop out of a race?

I want to know what the most common reasons are for a driver to not finish a race. The Status table has the names of each outcome (e.g. "engine failure", "collision"). Every row in the Result table has the outcome of the race for that driver.

I can use .annotate to find the number of times each status has occurred.

This query will produce the Top 10 most common outcomes:

from stats.models import Status
from django.db.models import Count

top_10_outcomes = (
    Status.objects.annotate(occurences=Count("results"))
    .order_by("-occurences")
    .values_list("status", "occurences")[:10]
)

This is the result:

<QuerySet [('Finished', 6952), ('+1 Lap', 3812), ('Engine', 2005), ('+2 Laps', 1590), ('Accident', 1041), ('Did not qualify', 1025), ('Collision', 829), ('Gearbox', 804), ('Spun off', 791), ('+3 Laps', 731)]>

The .annotate() counted how many times each status appeared in the in Result table.

You can filter your data before annotating it

The results also include statuses that imply the race was finished (e.g. "Finished" and "+3 laps"). We can filter the data first before applying the annotation.

from django.db.models import Count, Q

top_10_dnf_outcomes = (
    Status.objects.exclude(Q(status="Finished") | Q(status__icontains="lap"))
    .annotate(occurences=Count("results"))
    .order_by("-occurences")
    .values_list("status", "occurences")[:10]
)

The annotate still works after applying .filter(), or .exclude(). Using Q expressions, the equivalent to OR in SQL, we can include the "Finished" status or any status that has "lap" in the name.

<QuerySet [('Engine', 2005), ('Accident', 1041), ('Did not qualify', 1025), ('Collision', 829), ('Gearbox', 804), ('Spun off', 791), ('Suspension', 431), ('Did not prequalify', 331), ('Transmission', 321), ('Electrical', 315)]>

Now we can see that engine failure is the most common cause of a DNF (Did Not Finish), followed by accidents, with more than 1000 occurences.

2. Concat

What are the driver's full names?

I want the full names of each driver. Using .annotate, I can concatenate the first and last names of each driver into a new field called full_name.

Driver.objects.annotate(
    full_name=Concat ("first_name", Value(" "), "last_name")
).order_by("full_name")

\don't put a space after "Concat" like I have. This was only to prevent my website from mistaking it for SQL injection.*

This is the result:

<QuerySet [<Driver: Adolf Brudes>, <Driver: Adolfo Cruz>, <Driver: Adrian Sutil>, <Driver: Adrián Campos>, <Driver: Aguri Suzuki>, <Driver: Al Herman>, <Driver: Al Keller>, <Driver: Al Pease>,...>

3. Adding Conditions (Case & When example)

Table with columns "first_name", "last_name", "nationality" and "full_name". The full name column is created by annotating the queryset.

We can use Case and When inside our annotations too add a condition to put the last name before the surname.

You can add conditions to your annotations, which is a way to apply IF statements to your annotations.

For example, Zhou Guanyu is a Chinese driver who prefers to go by his surname before his first name. In the previous example, his full name would be calculated as "Guanyu Zhou". We can use Case and When in our annotation to fix this.

drivers = (
    models.Driver.objects.annotate(
        full_name=Case(
            When(
                nationality="Chinese",
                then=Concat ("last_name", Value(" "), "first_name"),
            ),
            default=Concat ("first_name", Value(" "), "last_name"),
        )
    )
    .order_by("-full_name")
    .values_list("full_name", "nationality")[4:10]
)

This is the result:

<QuerySet [('Zsolt Baumgartner', 'Hungarian'), ('Zhou Guanyu', 'Chinese'), ('Yves Cabantous', 'French'), ('Yuki Tsunoda', 'Japanese'), ('Yuji Ide', 'Japanese'), ('Yannick Dalmas', 'French')]>

4. How to Group Querysets with Django ORM

How many times did each driver in the 2021 season not finish a race?

In the first 3 examples, the annotation was applied to every result in the queryset. But what do you do if you want to use Django ORM to group your data, similar to SQL's GROUP BY?

With .annotate(), it is possible to group data, end up with fewer results than the original queryset. However, you will have to evaluate it using .values() first to get the grouping effect.

In this example, I want to know which driver in the 2021 season had the most DNFs (they didn't finish a race). We can use .annotate() to return every 2021 with their total number of DNFs for that season.

We can tell if a driver didn't finish a race if the finishing_position field of the Result table is null.

This is how we can do it:

driver_dnfs = (
    models.Result.objects.filter(race__year=2021, finishing_position__isnull=True)
    .values(
        "driver",
    )
    .annotate(dnfs=Count("driver"))
    .order_by("-dnfs")
    .values_list("driver", "dnfs")
)

This queryset takes the Result table (one row per driver, per race) and counts the number of rows for each driver where the finishing_position doesn't contain a number (if they have a finishing position, then they completed the race).

These are the results:

<QuerySet [(853, 6), (847, 5), (822, 4), (852, 4), (842, 3), (840, 3), (839, 3), (854, 3), (849, 3), (830, 3), (20, 3), (4, 2), (844, 2), (815, 2), (8, 2), (841, 1), (1, 1), (846, 1), (817, 1)]>

In each tuple, the first element is the ID of the driver. The second element is the number of DNFs that driver had.

We can include more driver information in our result:

driver_dnfs = (
    models.Result.objects.filter(race__year=2021, finishing_position__isnull=True)
    .values(
        "driver",
    )
    .annotate(dnfs=Count("driver"))
    .order_by("-dnfs")
    .values_list("driver__first_name", "driver__last_name", "dnfs")
)

This is the result:

<QuerySet [('Nikita', 'Mazepin', 6), ('George', 'Russell', 5), ('Yuki', 'Tsunoda', 4), ('Valtteri', 'Bottas', 4), ('Lance', 'Stroll', 3), ('Max', 'Verstappen', 3), ('Esteban', 'Ocon', 3), ('Mick', 'Schumacher', 3), ('Pierre', 'Gasly', 3), ('Sebastian', 'Vettel', 3), ('Nicholas', 'Latifi', 3), ('Kimi', 'Räikkönen', 2), ('Charles', 'Leclerc', 2), ('Sergio', 'Pérez', 2), ('Fernando', 'Alonso', 2), ('Lewis', 'Hamilton', 1), ('Lando', 'Norris', 1), ('Antonio', 'Giovinazzi', 1), ('Daniel', 'Ricciardo', 1)]>

Now we can see that Nikita Mazepin had the most DNFs in 2021.

5. Sum Example

How many points did each driver score in the 2021 driver championship?

The result, driver and race tables can be used to create the driver leaderboard for the 2021 season.

By annotating a queryset of the results table, we can generate the leaderboard for the 2021 F1 championship.

For any readers not familiar with Formula 1, the 2021 driver championship was equally exciting and controversial. Lewis Hamilton and Max Verstappen were neck-and-neck and there was one race left to decide who was going to win the championship. For most of the race, Lewis Hamilton was expected to win, but a last-minute safety car narrowed the gap, giving Verstappen a chance to overtake and take the championship.

This query is going to generate the leaderboard for the 2021 driver championship.

We are going to do this by taking all the race results, where the race year was 2021 and take the sum of the points scored for every driver. Note, that there is a result in the Result table per driver and per race.

Similar to the previous query, we will apply .values() on the queryset before applying .annotate to group the results by driver.

# Which driver scored the most points in 2021?

from django.db.models import Sum

driver_championship_2021 = (
    Result.objects.filter(race__year=2021)
    .values("driver")
    .annotate(total_points=Sum("points_scored"))
    .order_by("-total_points")
    .values_list("driver__first_name", "driver__last_name", "total_points")
)

The Sum expression adds all values in the points_scored for all 2021 race results for each driver.

These are the results:

<QuerySet [('Max', 'Verstappen', 389.0), ('Lewis', 'Hamilton', 386.0), ('Valtteri', 'Bottas', 219.0), ('Sergio', 'Pérez', 190.0), ('Carlos', 'Sainz', 164.0), ('Lando', 'Norris', 160.0), ('Charles', 'Leclerc', 159.0), ('Daniel', 'Ricciardo', 114.0), ('Pierre', 'Gasly', 110.0), ('Fernando', 'Alonso', 81.0), ('Esteban', 'Ocon', 74.0), ('Sebastian', 'Vettel', 43.0), ('Lance', 'Stroll', 34.0), ('Yuki', 'Tsunoda', 32.0), ('George', 'Russell', 16.0), ('Kimi', 'Räikkönen', 10.0), ('Nicholas', 'Latifi', 7.0), ('Antonio', 'Giovinazzi', 3.0), ('Nikita', 'Mazepin', 0.0), ('Robert', 'Kubica', 0.0), '...(remaining elements truncated)...']>

We can take this further by adding the annotation to get the driver's full name. However, we need to tweak it because we are querying the Result table rather than the Driver table.

driver_championship_2021 = (
    models.Result.objects.filter(race__year=2021)
    .values("driver")
    .annotate(
        total_points=Sum("points_scored"),
        full_name=Concat ("driver__first_name", Value(" "), "driver__last_name"),
    )
    .order_by("-total_points")
    .values_list("full_name", "total_points")
)

6. Count with conditions

How many point scoring races did each 2021 driver have?

In F1, drivers score points if they finish the race in the top 10. I will use Count and Sum to count the number of races for each driver where they scored points.

I could filter the queryset for results where points_scored is more than 0 before the annotation or I could filter inside the annotation.

# How many point scoring races did each 2021 driver have?

from django.db.models import Count, Q, Sum, Value
from django.db.models.functions import Concat

points_scoring_races = (
    models.Result.objects.filter(race__year=2021)
    .values("driver")
    .annotate(
        full_name=Concat ("driver__first_name", Value(" "), "driver__last_name"),
        points_scoring_races=Count("driver", filter=Q(points_scored__gt=0)),
        total_points=Sum("points_scored"),
    )
    .order_by("-points_scoring_races", "-total_points")
    .values("full_name", "points_scoring_races")
)

This is the result:

<QuerySet [{'full_name': 'Lewis Hamilton', 'points_scoring_races': 20}, {'full_name': 'Carlos Sainz', 'points_scoring_races': 20}, {'full_name': 'Lando Norris', 'points_scoring_races': 20}, {'full_name': 'Max Verstappen', 'points_scoring_races': 19}, {'full_name': 'Charles Leclerc', 'points_scoring_races': 18}, {'full_name': 'Sergio Pérez', 'points_scoring_races': 16}... >

From the results, we can see that Lewis Hamilton, Carlos Sainz and Lando Norris each had 20 races where they scored points.

We did this by only counting the race result if the points_scored was more than 0:

points_scoring_races=Count("driver", filter=Q(points_scored__gt=0))

What is Q?

Q allows you to express a condition for filtering data (docs). You can also use them in queryset filters to combine multiple conditions with AND / OR logic.

7. ExpressionWrapper Example

Show me the age of each driver.

In this example, I will show you how to use Django's .annotate() to calculate a person's age from their date of birth.

The database contains the date of birth in the database. We can use .annotate() to use the date of birth to give the driver's age in years.

Our annotation needs to perform some arithmetic. We need to find the difference between today's date and the driver's date of birth and convert it into years.

The problem is, .annotate() only accepts instances of the Expression class. Recall that in the names example, we had to use Value(" ") just to insert a space between names.

We need to express our calculation of the driver's age as an instance of Expression. All the classes we have been using so far: Count, Value, Concat etc are subclasses of Expression.

There are more Expression subclasses we can use to calculate the driver's age.

We are going to use ExpressionWrapper.

This is the query. Credit to LowerDeez on Github, whose code (source) this query has been adapted from.

import datetime
from django.db.models import ExpressionWrapper, DateField, IntegerField, F
from django.db.models.functions import ExtractDay

driver_ages = (
    Driver.objects.annotate(
        full_name=full_name_expression,
        age=ExpressionWrapper(
            ExtractDay(
                ExpressionWrapper(
                    Value(datetime.datetime.now()) - F("date_of_birth"),
                    output_field=DateField(),
                )
            )
            / 365.25,
            output_field=IntegerField(),
        ),
    )
    .order_by("-date_of_birth")
    .values_list("full_name", "age")[:10]
)

What is happening here? Let's start in the middle and work our way out.

In this ExpressionWrapper, we find the difference between today's date and the driver's date of birth:

ExpressionWrapper(
    Value(datetime.datetime.now()) - F("date_of_birth"),
    output_field=DateField(),
)

If you subtract two dates in Python, you get a timedelta object. We can extract the number of days out of the delta using ExtractDay.

ExtractDay(
    ExpressionWrapper(
        Value(datetime.datetime.now()) - F("date_of_birth"),
        output_field=DateField(),
    )
)

Finally, we can divide the days by 365.25 to get the years passed:

age=ExpressionWrapper(
    ExtractDay(
        ExpressionWrapper(
            Value(datetime.datetime.now()) - F("date_of_birth"),
            output_field=DateField(),
        )
    )
    / 365.25,
    output_field=IntegerField(),
)

When using ExpressionWrapper, you must specify a value for output_field. This specifies how the result will be stored in the queryset.

What is F?

F expressions are used to represent database fields inside expressions (docs).

Conclusion

Django developers can use .annotate() to add fields to querysets and/or group the data.

Annotations are different to aggregations, as annotations are applied to every result in the queryset.

Data can be grouped by applying values() to the queryset before applying .annotate().

Annotations are powerful because they can create complex queries using multiple tables. The alternative is making multiple queries and writing your own code to fuse the data together. Annotations can handle the complexity in a single step.

To learn how to annotate querysets, you must also learn how to use expressions. In this tutorial, we covered Count, Sum, Value, Case, When and ExpressionWrapper. Functions give us more flexibility. We used Concat and ExtractDay in this tutorial.