How the profiler works

Basic Ideas

An N+1 query is a pattern where, after one initial query fetches N objects, the same query (with different params) runs once for each of the N related objects, typically in a loop.

The best way to know if a code path is making N+1 calls is to capture the stack trace & the query, and see if the same stack trace & query combination appears again.

We can treat a (stack trace, query) pair as an aggregate, and maintain a map from this aggregate to a count.
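A minimal sketch of that map, using only the standard library. The names `record_query`, `query_counts` and `load_authors` are hypothetical and are not taken from the profiler; the point is only that a repeated (stack trace, query) key shows up as a count greater than one.

```python
import traceback
from collections import Counter

# Hypothetical collector: maps (stack trace, query) -> number of times seen.
query_counts = Counter()


def record_query(sql):
    # Drop the last frame (this function) so the key describes the caller's path.
    frames = traceback.extract_stack()[:-1]
    stack_key = tuple((f.filename, f.lineno, f.name) for f in frames)
    query_counts[(stack_key, sql)] += 1


def load_authors(book_ids):
    for _ in book_ids:
        # The parameterised SQL is identical on every iteration; only params differ.
        record_query("SELECT * FROM author WHERE book_id = %s")


load_authors([1, 2, 3])

# Any key seen more than once is an N+1 suspect.
suspects = {key: count for key, count in query_counts.items() if count > 1}
```

Keying on the parameterised SQL rather than the final SQL string is what lets the three calls collapse into a single entry.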

We should split the stack trace into an (application, django) pair of stack traces. An application stack trace is useful to display, while a django stack trace is not.
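One way to do this split is to look at the file path of each frame. The sketch below assumes that any frame whose path contains `/django/` belongs to django; both that rule and the name `split_stack` are assumptions made for illustration, not the profiler's actual logic.

```python
import traceback

# Assumption: a frame whose file path contains this fragment is a django frame.
DJANGO_PATH_FRAGMENT = "/django/"


def split_stack(frames):
    """Partition frames into (application, django) lists, preserving order."""
    app_frames, django_frames = [], []
    for frame in frames:
        normalized = frame.filename.replace("\\", "/")
        if DJANGO_PATH_FRAGMENT in normalized:
            django_frames.append(frame)
        else:
            app_frames.append(frame)
    return app_frames, django_frames


# Hand-built frames so the example does not depend on a django installation.
frames = [
    traceback.FrameSummary("/srv/app/views.py", 12, "list_books", lookup_line=False),
    traceback.FrameSummary(
        "/venv/lib/site-packages/django/db/models/query.py", 300, "__iter__",
        lookup_line=False,
    ),
    traceback.FrameSummary("/srv/app/serializers.py", 40, "to_dict", lookup_line=False),
]
app_frames, django_frames = split_stack(frames)
```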

Can we do anything useful with the django stack trace? Does it give us any useful insights - if not, we should discard it (Surprise: It does :-) )

Where do we store this data? Maybe a thread-local? Let's call this the data collector module.

For us to capture stack trace & the query, we have to hook into django to call our data collector when a query is executed

If we can hook into django to call our data collector when it executes any query, we can also collect other interesting properties, like exact sql duplicates, row count, and time taken to run a query

Does Django have hooks that let us execute some function when a query is executed?

Once we have the above two pieces figured out, we have to start collecting this data when a request starts, and stop when the request finishes, i.e. figure out the boundaries of profiling.

A middleware seems like a good boundary, but that would limit us to just requests.

A context manager seems like a more generic boundary, and a django middleware can then just call the context manager. This would allow us to use the profiler from command line

Once we have this data about the stack trace & the query, where should it be displayed?

If the profiler was invoked through the context manager, the user would know what to do with the data.

If it was called as part of a request, chrome plugin seems like a good place for displaying this data. Middleware can set the data in the headers, and the chrome plugin should be able to read that, and display it in the plugin

Implementation details

This part is organized by package; each package answers one of the four questions/ideas discussed above.

1. query_profiler_storage

github link

This package has a data_collector module where we define a thread-local which exposes three functions:

enter_profiler_mode: Just sets the profiler to on state

exit_profiler_mode: Turns off the profiler, and returns the profiled data that has been collected since the start of the enclosing profiler block

add_query_profiled_data: If the profiler is on, adds the given query data to what has been collected in its thread-local
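The three functions can be sketched on top of `threading.local` as follows. This is a simplified illustration: the function names come from the text above, but the signatures, the `_local` attributes, and the use of a plain list are assumptions, and nesting of profiler blocks is not handled.

```python
import threading

# Per-thread state, so concurrent requests do not see each other's data.
_local = threading.local()


def enter_profiler_mode():
    _local.profiling = True
    _local.collected = []


def exit_profiler_mode():
    data = getattr(_local, "collected", [])
    _local.profiling = False
    _local.collected = []
    return data


def add_query_profiled_data(item):
    # Silently ignore queries that run while the profiler is off.
    if getattr(_local, "profiling", False):
        _local.collected.append(item)


add_query_profiled_data("ignored: profiler is off")
enter_profiler_mode()
add_query_profiled_data("SELECT 1")
profiled = exit_profiler_mode()
```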

We have defined our data models in the __init__.py file. All the bookkeeping code happens in these models, in Python magic methods such as __add__.

For capturing the stack trace, and splitting it into an (application, django) stack trace, we have the stack_tracer module

We are trying to use the django stack trace to figure out if the query is happening because of a forward or a reverse relationship, which helps us to know if this could have been avoided by a select_related/prefetch_related.

This is happening in the django_stack_trace_analyze module. We are trying to analyze django stack trace, and see if we can find some useful known pattern

2. django

github link

The django module is where we get a hook from django when it executes a query. We rely on the fact that django lets us set the ENGINE key of each DATABASES entry in the settings.py file, as a string.

There are many open source projects which use this hook provided by django, to add some features when connecting to databases:

django-postgres-readonly

django-postgrespool

django-sqlserver

custom database backends

All the above packages share the same structure around the ENGINE setting: the package has a base.py and an __init__.py file. This requirement appears to come from the django code, which imports the base module from the package named by ENGINE.
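As an illustration, a settings.py entry pointing at such a package could look like the fragment below. The dotted path `myproject.db_backends.profiled_sqlite3` is hypothetical; the real path to use is whatever the profiler's own installation instructions give. django would then import `myproject.db_backends.profiled_sqlite3.base` and look for a `DatabaseWrapper` class in it.

```python
# settings.py (fragment)
DATABASES = {
    "default": {
        # Hypothetical package containing __init__.py and base.py;
        # base.py is expected to define a DatabaseWrapper class.
        "ENGINE": "myproject.db_backends.profiled_sqlite3",
        "NAME": "db.sqlite3",
    }
}
```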

To hook into the django query execution model, we use the fact that all the database backends in django share a common CursorWrapper implementation. This cursor is the last point where we have python/django code. After this layer, the call is handed to the various database drivers

We replace the CursorWrapper with our implementation in the module cursor_wrapper_instrumentation.py. We use a mixin module, database_wrapper_mixin.py, to do this once for all databases, and configure this mixin for each database
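The idea of instrumenting a cursor can be shown without django, using the standard library's sqlite3 driver. The class `TimingCursorWrapper` and its `on_query` callback are hypothetical and only mirror the technique; they are not the contents of cursor_wrapper_instrumentation.py.

```python
import sqlite3
import time


class TimingCursorWrapper:
    """Wraps a DB-API cursor and reports each executed query to a callback."""

    def __init__(self, cursor, on_query):
        self._cursor = cursor
        self._on_query = on_query

    def execute(self, sql, params=()):
        start = time.perf_counter()
        try:
            return self._cursor.execute(sql, params)
        finally:
            # Report even when the query raises, so failures are still counted.
            self._on_query(sql, time.perf_counter() - start)

    def __getattr__(self, name):
        # Delegate everything else (fetchall, close, ...) to the real cursor.
        return getattr(self._cursor, name)


seen = []
conn = sqlite3.connect(":memory:")
cursor = TimingCursorWrapper(conn.cursor(), lambda sql, elapsed: seen.append((sql, elapsed)))
cursor.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
cursor.execute("INSERT INTO book (title) VALUES (?)", ("Dune",))
rows = cursor.execute("SELECT title FROM book").fetchall()
```

In the profiler, the callback's role would be played by the data collector, which is also where the stack trace can be captured.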

If you are interested in learning about the various layers in django, see this amazing talk by James Bennett. Watch it even if you don’t use the profiler :-)

3. client

github link

In the above two modules, we already have all the machinery for the profiler. The one thing that is remaining is to set the boundaries of the profiler - by calling the enter_profiler_mode and exit_profiler_mode functions. That is exactly what the context manager does.

The middleware module just calls the context manager, and sets the headers which the chrome plugin expects
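A django-style middleware along these lines might look like the sketch below. The header name `X-Query-Profiler-Summary`, the JSON payload, and the stand-in `profile_queries` context manager are all hypothetical; the headers the chrome plugin actually expects are not described here. A plain dict stands in for the response, since django responses also support setting headers by item assignment.

```python
import json
from contextlib import contextmanager


@contextmanager
def profile_queries():
    # Stand-in for the profiler's context manager: a real one would fill this
    # summary from the data collector when the block exits.
    summary = {"query_count": 0}
    yield summary


class QueryProfilerMiddleware:
    HEADER = "X-Query-Profiler-Summary"  # hypothetical header name

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Profile exactly the work done while producing the response.
        with profile_queries() as summary:
            response = self.get_response(request)
        response[self.HEADER] = json.dumps(summary)
        return response


def fake_view(request):
    return {}  # a dict standing in for a django HttpResponse


response = QueryProfilerMiddleware(fake_view)(None)
```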

4. chrome plugin

github link

This is a different project in the repo. All it does is check whether the response to a request carries the headers which the django query profiler sets. If it does, it parses the response, and adds it to a table in its devtools panel