Optimizing Django ORM Queries with Percona Monitoring and Management

Optimizing Django ORM Queries

Optimizing Django ORM QueriesDjango is a fantastic tool to quickly create an admin dashboard for managing entities.  But the magic of brevity oftentimes comes at the expense of suboptimal performance. The Django ORM is a coin with two sides.  It simplifies work with complex datasets and makes it easy and more intuitive to create queries. Unfortunately, it obfuscates what’s going on under the hood, leaving an “open door” for inefficient queries to go into your app. This slowdown can be invisible for the first time, but as your app starts to scale, these minor inefficiencies may become a performance problem.

Continuous review is the best practice to catch this type of slowdown before it ever makes its way into production code. When creating the Django project, it’s good to develop a habit of checking whether the queries deliver on the performance targets.  If you want to know what Django does on under the hood, you can use DEBUG mode or even “Django-debug-toolbar.”

But the smarter way is to use Percona Monitoring and Management (PMM) to keep control over how your app works with DB. PMM plays the role of a microscope to identify small issues early that can become a massive problem on a big production scale.  It gives you a general profile of your application queries with detailed metrics, query explains, indexes, and table stats. And, most importantly, it can be run in production to monitor and react on issues that become visible on a big scale.

Let’s Analyze a Simple Example of Using PMM

We have a service to manage IoT sensors in the apartment with the following dependencies Rooms 1 <-> n Sensors. Here we use the admin part of Django only. For illustration, it requires less code and is commonly used for creating small and big back offices.

Let’s install PMM Server and point it on our DB:

from django.db import models
 
 
class Room(models.Model):
   name = models.CharField(max_length=25, blank=False, null=False)
   area = models.IntegerField(null=False)
 
   class Meta:
       managed = True
       db_table = 'rooms'
 
   def __str__(self):
       return self.name
 
 
class Sensor(models.Model):
   room = models.ForeignKey(Room, models.DO_NOTHING, blank=False, null=False)
   name = models.CharField(max_length=25, blank=False, null=False)
 
   class Meta:
       managed = True
       db_table = 'sensors'
 
   def __str__(self):
       return self.name

 

from django.contrib import admin
 
from .models import Room, Sensor
 
 
class RoomAdmin(admin.ModelAdmin):
   list_display = ('name', 'area')
 
 
class SensorAdmin(admin.ModelAdmin):
   list_display = ('name', 'room_desciption')
   search_fields = ('^name',)
 
   def room_desciption(self, obj):
       return '%s (%d m^2)' % (obj.room.name, obj.room.area)
 
 
admin.site.register(Room, RoomAdmin)
admin.site.register(Sensor, SensorAdmin)

So we will have this view:

Django ORM Queries with Percona Monitoring and Management

Now, let’s take a look at the myriad of queries and metrics that PMM organized for us, and find a query that selects values for the sensor’s view.

Query Analytics Dashboard Percona Monitoring and Management

Oh, the first we can find is Room Description which is not lazy joined to Sensors.

I only once opened the sensor’s view for the last five minutes and in this period we can see two query classes related to this view.

The first SELECT … FROM sensors – select sensors ids and names appear one time and the second SELECT … FROM rooms – select Room Description appears four times, ones for each sensor. This is called “The N+1 problem”.

This is probably the most common issue that new users of Django usually face: select related objects in a single query.

Let’s tell Django to retrieve each room’s name and area in the same query. Property list_select_related = True add to SensorAdmin class.
Now, admin.py looks like:

from django.contrib import admin
 
from .models import Room, Sensor
 
 
class RoomAdmin(admin.ModelAdmin):
   list_display = ('name', 'area')
 
 
class SensorAdmin(admin.ModelAdmin):
   list_display = ('name', 'room_desciption')
   search_fields = ('^name',)
   list_select_related = True # < --- select related rooms
 
   def room_desciption(self, obj):
       return '%s (%d m^2)' % (obj.room.name, obj.room.area)
 
 
admin.site.register(Room, RoomAdmin)
admin.site.register(Sensor, SensorAdmin)

Here we can see how the new query for the sensor’s view looks:

PMM Query Analytics: query with select_related

It appears only once per page request.

Let’s investigate this query more deeply about how it executes with the search parameter and what we can improve. We have a property search_fields = (‘^name’,) of SensorAdmin class, which allows us to filter sensors by the first part of names.

Filtered list of sensors
And what gets our attention in query details? It is the ratio of “Rows examined” to “Rows sent.” We expect “Rows sent” to be close to “Rows examined” because otherwise, it means that a query handles many rows (“examined”) which are not used in the final result set (“sent”). It means wasted CPU cycles and even unnecessary IOs if rows are not in memory.

PMM: Query Details

So here we have a room for improvement that speeds up search on a big dataset. There is no general rule when, and how, we need to add an index, as it hugely depends on data type, variability, on insert/select ratio and also on application usage at all.

But let’s add an index here, assuming that it is appropriate.

Let’s add db_index=True to Sensor.name property and apply migration:

from django.db import models
 
 
class Room(models.Model):
   name = models.CharField(max_length=25, blank=False, null=False)
   area = models.IntegerField(null=False)
 
   class Meta:
       managed = True
       db_table = 'rooms'
 
   def __str__(self):
       return self.name
 
 
class Sensor(models.Model):
   room = models.ForeignKey(Room, models.DO_NOTHING, blank=False, null=False)
   name = models.CharField(db_index=True, max_length=25, blank=False, null=False)
   # Let’s add index here --^
 
   class Meta:
       managed = True
       db_table = 'sensors'
 
   def __str__(self):
       return self.name

As a result, we have reduced the amount of examined rows from five to three per row sent:

PMM: Query Details

So, in conclusion: with the continuous review, we can see the trends and effects of our optimizations on PMM charts even on small datasets that we usually use during development, and keep the performance up of production deployment.


by Andrii Skomorokhov via Percona Database Performance Blog

Comments