Delegates Part 9
The experiment I conducted was to run a fixed number of queries (5000 in this case) but to break them up
so that the compiled query was reused a decreasing number of times. The first run is the "best" case: 1 batch
of 5000 selects, all using the same compiled query. Then 2 batches of 2500, and so on down to 5000 batches
of 1. As a control I also ran the uncompiled case at each step, expecting of course that batching makes no
difference there. Note the output indicates we selected a total of 25000 rows of data, that is, 5 per select
as expected.
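
The batching scheme can be sketched as follows. This is a Python illustration rather than the original test harness: `batch_schedule` is a hypothetical helper, and the choice of the divisors of 5000 as batch counts is an assumption, since the post only names 1, 2 and 5000 explicitly.

```python
TOTAL_QUERIES = 5000   # fixed amount of work, from the post
ROWS_PER_SELECT = 5    # from the post: 25000 rows over 5000 selects


def batch_schedule(total):
    """Return (batches, uses_per_batch) pairs with batches * uses == total.

    Assumption: the batch counts are the divisors of `total`.
    """
    return [(b, total // b) for b in range(1, total + 1) if total % b == 0]


schedule = batch_schedule(TOTAL_QUERIES)
for batches, uses in schedule:
    # Each batch would create one compiled query and run it `uses` times.
    print(f"{batches:>5} batches x {uses:>5} uses each")

total_rows = TOTAL_QUERIES * ROWS_PER_SELECT
```

Every row of the schedule does the same total work, so any difference between rows comes only from how often the compiled query is recreated.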
Here are the raw results:
And there you have it. Even at 2 uses the compiled query still wins, but at 1 use it loses. In fact, the
break-even point for this particular query is about 1.5 uses on average. But why? And how
might it change?
Well, as has been observed in the comments, Linq query compilation isn't like regular expression
compilation. Compiling the query doesn't do anything that isn't going to have to happen anyway. In
fact, creating the compiled query with CompiledQuery.Compile hardly does anything at all; the work is all
deferred until the query is first run, just as it would have been had the query not been compiled. So what
is the overhead? Why is it slower at all? And what's the point of it?
Well, the main purpose of that compiled query object is to have an object, of the correct type, that also has
the correct lifetime. The compiled query can live across DataContexts; in fact it could potentially live for
the entire life of your program. And since it has no shared state in it, it's thread-safe and so forth. It
exists to:
1) Give the Linq to SQL system a place to store the results of analyzing the query (i.e. the actual SQL
plus the delegate that will be used to extract data from the result set).
2) Allow the user to specify the "variable parts" of the query. The most common case isn't that the query
is exactly the same from run to run; usually it's "nearly" the same. That is, it's the same except that
perhaps the search string is different in the where clause, or the ID being fetched is different. The shape
is the same. Creating a delegate with parameters allows you to specify which things are fixed and which
things are variable.
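
Both purposes can be illustrated with a small analog. The sketch below is Python over the standard `sqlite3` module, not the Linq to SQL API: `compile_query`, `translate_orders_query`, `make_context` and the `orders` table are all hypothetical names invented for the illustration, and the lazy initialization is not guarded for multi-threaded use.

```python
import sqlite3

translations = []  # records each time the (pretend) expensive analysis runs


def translate_orders_query():
    # Stand-in for the expensive step: turning a query expression into SQL.
    translations.append(1)
    return "SELECT id, customer FROM orders WHERE customer = ? ORDER BY id"


def compile_query(translate, materialize):
    """Build a reusable query callable. Nothing is analyzed until first use."""
    analysis = {}  # purpose 1: a place to store the results of the analysis

    def run(conn, *params):  # purpose 2: parameters mark the variable parts
        if "sql" not in analysis:
            analysis["sql"] = translate()  # happens once, on the first run
        cursor = conn.execute(analysis["sql"], params)
        return [materialize(row) for row in cursor]

    return run


def make_context():
    # Stand-in for a fresh DataContext: a new in-memory database connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.executemany(
        "INSERT INTO orders (customer) VALUES (?)",
        [("ALFKI",), ("ALFKI",), ("BONAP",)],
    )
    return conn


# Created once; it can outlive any single context.
orders_by_customer = compile_query(
    translate_orders_query,
    lambda row: {"id": row[0], "customer": row[1]},
)

first = orders_by_customer(make_context(), "ALFKI")   # analysis happens here
second = orders_by_customer(make_context(), "BONAP")  # stored analysis reused
```

The two calls use different contexts and different parameter values, yet the translation step runs only once.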
Now there was some debate about how to make compiled queries durable. Automatically caching them
was considered, but this was something I was strongly against, largely because of the object lifetime
issues it would cause. First, you would have to do complicated matching of a newly created query against
whatever was already in the cache, something I'd like to avoid. Second, you have to decide
where to store the cache. If you associate it with the DataContext then you get much less query re-use,
because you only get a benefit if you run the same query twice in the same DataContext. To get the most
benefit you want to be able to re-use the query across DataContexts. But then, do you make the cache
global? If you do, you have threading issues accessing it, and you have the terrible problem that you don't
know when is a good time to discard items from the cache. Ultimately this was my strongest point: at the
Linq data level we do not know enough about the query patterns to choose a good caching policy and, as
I've written many times before, when it comes to caching, good policy is crucial. In fact, analogously, we
had to make changes in the regular expression caching system back in Whidbey precisely because we
were seeing cases where our caching assumptions were resulting in catastrophically bad performance
(Mid Life Crisis due to retained compiled regular expressions in our cache). I didn't want to make that
mistake again.
So that's roughly how we ended up at our final design. Any Linq to SQL user can choose how much or how
little caching is done. They control the lifetime: they can choose an easy mechanism (e.g. stuff it in a
static variable forever) or a complicated recycling method, depending on their needs. Usually the simple
choice is adequate. And they can easily choose which queries to compile and which to just run in the
usual manner.
Let's get back to the overhead of compiled queries. Besides the one-time cost of creating the delegate,
there is also a little extra delegate indirection on each run of the query, plus the more complicated thing
we have to do: since the compiled query can span DataContexts, we have to make sure that the
DataContext we are being given in any particular execution of a compiled query is compatible with the
DataContext that was provided when the query was compiled the first time.
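
A minimal sketch of that per-execution check, in Python, might look like the following. It is an illustration only: `Context`, `CompiledQuerySketch` and the idea of comparing a single `mapping` value are assumptions made for the example, not a description of how Linq to SQL implements the check.

```python
class Context:
    """Hypothetical stand-in for a DataContext; `mapping` names its schema mapping."""

    def __init__(self, mapping):
        self.mapping = mapping


class CompiledQuerySketch:
    def __init__(self, sql):
        self.sql = sql
        self.mapping = None  # learned from the first context we run against

    def execute(self, context):
        if self.mapping is None:
            self.mapping = context.mapping      # first run: remember it
        elif context.mapping != self.mapping:   # later runs: cheap comparison
            raise ValueError("context is not compatible with this compiled query")
        return f"ran {self.sql!r} against {context.mapping}"


query = CompiledQuerySketch("SELECT 1")
query.execute(Context("northwind"))  # fixes the expected mapping
query.execute(Context("northwind"))  # a different context, same mapping: fine
```

Passing a context with a different mapping on a later run raises `ValueError`; the comparison itself is the small per-run cost.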
Other than that the code path is basically the same, which means you come out ahead pretty quickly.
This test case was, as usual, designed to magnify the typical overheads so we can observe them. The
result set is a small number of rows, it is always the same rows, the database is local, and the query itself
is a simple one. All the usual costs of doing a query have been minimized. In the wild you would expect
the query to be more complicated, the database to be remote, and the actual data returned to be larger and
not always the same. This of course reduces the relative benefit of compilation in the first place but
also, as a consolation prize, reduces the marginal overhead.
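
A toy cost model makes both effects concrete. Everything here is an assumption for illustration: the cost units and the constants `ANALYZE`, `SETUP` and `EXTRA` are made up (chosen so the break-even lands at 1.5 uses) and are not measurements from the experiment.

```python
def uncompiled_cost(uses, analyze, execute):
    # Every run pays for query analysis plus execution.
    return uses * (analyze + execute)


def compiled_cost(uses, analyze, execute, setup, per_run_extra):
    # Analysis and one-time setup are paid once per compiled query;
    # each run also pays a small extra (delegate call, context check).
    return analyze + setup + uses * (execute + per_run_extra)


def break_even_uses(analyze, setup, per_run_extra):
    # Solve uncompiled_cost == compiled_cost for `uses`.
    return (analyze + setup) / (analyze - per_run_extra)


ANALYZE, SETUP, EXTRA = 100, 47, 2  # made-up units, not measured values


def single_use_penalty(execute):
    # Relative loss from compiling a query that is then used only once.
    plain = uncompiled_cost(1, ANALYZE, execute)
    compiled = compiled_cost(1, ANALYZE, execute, SETUP, EXTRA)
    return (compiled - plain) / plain


def saving_at(uses, execute):
    # Relative gain from compiling a query that is used `uses` times.
    plain = uncompiled_cost(uses, ANALYZE, execute)
    compiled = compiled_cost(uses, ANALYZE, execute, SETUP, EXTRA)
    return (plain - compiled) / plain


print(break_even_uses(ANALYZE, SETUP, EXTRA))  # 1.5 with these made-up numbers
```

In this model the break-even point does not depend on the execution cost at all, but both `single_use_penalty` and `saving_at` shrink as `execute` grows: a heavier query gains proportionally less from compilation and loses proportionally less when compilation is wasted.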
In short, if you expect to reuse the query at all, there is no performance-related reason not to compile it.