Recently I’ve been working on a project wherein we rewrote a body of code whose architecture was… well, less than ideal. We took a mess of an object model and built something more elegant and structured. The new code design was a huge win. We were left with one major problem. In the process of the redesign we had added another table to our database. Fetching all our objects now required a join, and for some reason this was now taking 4 hours for our 300k item dataset instead of the previous 5 minutes. So our code was much nicer, but a lot slower. Not exactly impressive. After some digging, I found the following:
In SQLAlchemy 0.5 the default (and it seems only) type of join for a relation is a LEFT OUTER JOIN. While this is great for cases where you may have a null relation, it’s not something that’s possible in our data model. Furthermore, it appears the SQLite implementation of this type of join is excruciatingly slow. For our purposes a simple and fast JOIN would be sufficient.
Having identified the problem, I set about trying to figure out how to convince SQLAlchemy’s ORM layer to to issue a standard JOIN. I investigated all sorts of solution including writing custom select queries to load collections, but nothing really seemed as clean as I would have liked it to be.
Someone on #sqlalchemy
on irc.freenode.org
pointed me to a new parameter in the 0.6 relation()
function: innerjoin
So I changed the mapper code which previously looked like:
1
|
|
to:
1
|
|
and SQLAlchemy dutifully started putting out JOINs instead of the previously inefficient SQL code!
This tiny change decreased the execution time of the code on our sample database from around 4 hours down to 2 minutes. A 120x speedup! Great news for me. Overall the execution of the new code is still not quite as fast as I think it could be and I’ll be doing some profiling to investigate exactly why.