LINQ - watchout for slow queries | Software Craftsperson

LINQ or Language Integrated Query is great for application developers to develop applications that interact with databases. But remember it is a tool and like every tool, you have to use it wisely to achieve the best results.

I currently maintain a web based application that is meant to allow job scheduling, and provide users a way to interact with the jobs that are currently running too. Some of the features include, stop the job that is running, skip one among the several jobs, restart the job, resume from a particular section in the job etc. It is a very rich web application. And we have used LINQ almost entirely for all the interactions between the database layer and the application.

One of the tables that we primarily interact with and which forms the backbone of the application is badly designed. We know it is, but we couldn’t do much as yet, because we inherited this application from another team and they were the ones who chose the the design and things. Over the years, we have successfully cleaned up and improved the code base in several ways. But there is a lot more to do. We have been fighting with deadlines that come along new requirements and trying to solve engineering problems that could have been avoided had they made wise decisions earlier.

The primary table that I was talking about has a primary key, which is a GUID. This is among the worst decisions ever made. A clustered index, is a sorted index and you pick a GUID which when sorted, would be in a weird order that only the person who wrote the GUID generation algorithm would be able to explain. But this is what makes it a challenge for us. And to make it more challenging, this application is being used by more than half of the employees in the company and it runs important data update systems which rely on it being available and functional throughout. So that makes work fun.

LINQ most of the time creates pretty decent SQL for which the query optimizer is able to pick on the most optimal execution plan. But some times, when the data in one of the tables in the joins involved is huge, and if those joins are nested loops joins, then you’ve got a problem. Your query optimizer isn’t able to do much.

Of course, you shouldn’t worry too much until you hit a problem, otherwise you are just increasing your development time and you are thinking of cases that may not even occur. So we hit a case where certain people couldn’t load their Job history for their product (for a product identifier). Why? The LINQ generated a query for which the Query optimizer chose a less than optimal execution plan. But why only for a particular productId? Because the number of jobs owned by that product is much larger when compared to others. So the query optmizer got confused, picked the wrong plan and the result: the query takes more than 90 seconds to execute.

To add to the problem, we are doing all our reads using the strictest isolation level, which is Serializable isolation level. What would have been ideal is to enable Snapshot isolation, which has to be done at a database level. Apparently enabling this involves, taking the database offline.

But over the course of this exercise, I must admit that I learned more about the following:

LINQ to SQL conversion
Different ways the database looks up your Index - seek and scan
Different types of JOIN algorithms used by the DB - nested loops, hash and merge joins
How useful LINQpad can be, especially when you are working on writing LINQ snippets
How to hover over the IQueryable while debugging the code to fetch the query behind which you can then use in whatever SQL client you are using
I use SQL Server Management Studio and the execution plan and other statistics are priceless

That’s all for now. Don’t blindly trust LINQ, EVER.

Some useful links that helped me: