Thought I’d highlight a common problem I’ve run into repeatedly with several of my customers using Entity Framework.
I spent some time today with a customer debugging a very, very slow Entity Framework operation. The customer was running a long order processing task involving an order with many thousands of order items plus a boatload of child items. This task is pretty massive, but it was taking 6+ hours to complete. Yikes. Lots of items for sure, but there’s no reason this should take hours, or even more than a few minutes.
Now some people are very quick to blame EF for bad performance and while there may be something to that in some situations, I find that very frequently a few minor adjustments in code can fix serious performance issues. This was one of those cases.
An Example of Large Order Processing
The issue for this customer dealt with processing very large Sales Orders, which involves looking up customers and order ids as part of the initial pre-processing operations. We’re using business objects in this scenario, but the business objects essentially host an Entity Framework dbContext and use it for their data access methods.
In the code below the business objects are used to load up instances of orders and customers for roughly 16,000 sales orders (modified for EF specifics, with some additional processing code after the load operations removed to keep the listing relevant):
private void LoadOrderIDsToProcess()
{
    // the business object contains an initialized dbContext instance on .Context
    BusOrder orderBO = new BusOrder();

    // look up the customer id for each order - one query per order
    foreach (OrderIDsToProcess orderID in orderIDsToProcess)
    {
        //var order = orderBO.Load(orderID.OrderID);
        var order = orderBO.Context.Orders.FirstOrDefault(o => o.OrderID == orderID.OrderID);
        orderID.CustomerID = order.CustomerID;
    }

    // note: OrderBy() returns a new sequence, so the result has to be
    // assigned (the original code discarded it and never actually sorted)
    orderIDsToProcess = orderIDsToProcess.OrderBy(x => x.CustomerID).ToList();

    // look up the billing type for each customer - one query per customer
    BusCustomer customerBO = new BusCustomer();
    foreach (OrderIDsToProcess orderID in orderIDsToProcess)
    {
        //var customer = customerBO.Load(orderID.CustomerID);
        var customer = customerBO.Context.Customers.FirstOrDefault(c => c.CustomerID == orderID.CustomerID);
        if (customer == null)
            orderID.BillingTypeID = 0;
        else
            orderID.BillingTypeID = customer.BillingTypeID ?? 0;
    }
}
The process basically creates a single business object/dbContext and then iterates over each of the sales orders, loading the order to capture its customer id. Then the same process is roughly repeated against the customers to capture each customer’s billing type. Assuming 16,000 sales order records, this bit of code generates over 32,000 queries to retrieve the related orders and customers. That’s a lot, but it really shouldn’t take very long to process. Yet the original code ran for more than 6 hours.
What happens is that processing starts fast, but gets progressively slower as the loop count goes up. By the time we got to the last few items in the second loop there was up to a 4 second delay between iterations.
Our first thought was that SQL Server was the bottleneck, but checking the SQL Profiler logs made it easy to see that the queries were executing in the nearly immeasurable millisecond range, even once the loop started slowing down. We could see, however, that the interval between database queries was increasing drastically.
So what’s going on here?
Watch your DbContext and Change Tracking!
The problem here is Entity Framework’s change tracking. The code performs 32,000+ SQL load operations and attaches all 32,000 result records to the active dbContext. At first this isn’t a problem – the first few hundred records go fast – but as the context accumulates more and more entities to track, memory usage goes up and EF has to search through an ever-growing list of objects already in memory before going out and grabbing the next record.
In short the problem is dbContext bloat. dbContext is meant to be used as a Unit of Work, which generally means small chunks of work and only a few records in a context. In this case the context is getting bloated with a lot of records – and worse, records that we have no intention of updating.
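You can actually watch this bloat happen: in EF6 the context’s ChangeTracker exposes its tracked entries. Here’s a quick diagnostic sketch (reusing the order loop from the listing above) that logs the tracked-entity count as the loop runs:

BusOrder orderBO = new BusOrder();
int i = 0;
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    var order = orderBO.Context.Orders.FirstOrDefault(o => o.OrderID == orderID.OrderID);
    orderID.CustomerID = order.CustomerID;

    // Entries() enumerates every entity the context is currently tracking -
    // watch this number climb with each iteration
    if (++i % 1000 == 0)
        Console.WriteLine("Tracked entities: " + orderBO.Context.ChangeTracker.Entries().Count());
}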
There are a few simple solutions to this problem:
- Recreate the dbContext/Business object inside of the loop for each iteration
- Turn off change tracking for the dbContext instance
Recreate the dbContext
The first thing I tried is to simply move the business object (and therefore the dbContext) instantiation inside of the loop in both operations:
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    BusOrder orderBO = new BusOrder();
    var order = orderBO.Context.Orders.FirstOrDefault(o => o.OrderID == orderID.OrderID);
    orderID.CustomerID = order.CustomerID;
}
Re-running the same process that had taken 6+ hours immediately brought the processing time down to a mere 2 minutes.
Note: some people are hesitant to create new instances of a dbContext because instantiation is supposed to be slow. While the first instantiation of a large dbContext (like the one used here) can be very slow, subsequent instantiations are not. Yes, there’s still some overhead compared to not creating one, but it’s not a major hit. You should not be afraid to create multiple dbContexts, or re-create an existing dbContext, to provide isolation or clear out change state when you need it.
One other real issue with dbContext is that it has no way to clear out the change tree. Even after you call .SaveChanges(), EF keeps tracking the entities it has already loaded. There’s no way to release them other than creating a new instance, so if you’re dealing with operations in loops, recreation makes good sense.
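And if you do need to write changes inside a loop, a middle-ground variation on recreation (just a sketch, not the customer’s actual code, with a made-up batch size) is to save and recreate the context every N iterations so the change tree never grows past a single batch:

int batchSize = 500;   // hypothetical batch size - tune for your workload
int count = 0;
BusOrder orderBO = new BusOrder();

foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    var order = orderBO.Context.Orders.FirstOrDefault(o => o.OrderID == orderID.OrderID);
    orderID.CustomerID = order.CustomerID;

    // flush pending changes, then start over with a fresh context
    // so tracked entities never accumulate past a single batch
    if (++count % batchSize == 0)
    {
        orderBO.Context.SaveChanges();
        orderBO = new BusOrder();
    }
}
orderBO.Context.SaveChanges();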
Turn off Change Tracking
If you know that the context you’re using for an iterative process doesn’t need to actually write changes, an even more effective way to speed up performance is to turn off change tracking on the context instance. The context’s Configuration object has an AutoDetectChangesEnabled property for exactly this use case.
Using this property I can now write the code like this:
// contains an initialized dbContext instance dbContext
BusOrder orderBO = new BusOrder();
orderBO.Context.Configuration.AutoDetectChangesEnabled = false;

foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    var order = orderBO.Context.Orders.FirstOrDefault(o => o.OrderID == orderID.OrderID);
    orderID.CustomerID = order.CustomerID;
}
Running with this code for both the order and customer iterations reduced the processing time even further, to 40 seconds! From over 6 hours down to 40 seconds – that’s quite an optimization for essentially adding or moving a single line of code!
Now there is a single context performing all those Load() operations, but because change tracking is off, EF doesn’t hang on to the loaded entities or build up a change tree. All the tracking overhead that was apparently behind the slowdown is gone, and the code is as fast as you’d expect it to be.
As long as you know that you’re not updating data with the context, this is probably the best solution. In fact, I’d encourage using this option frequently whenever you know you’re pulling data purely for read-only operations.
IQueryable.AsNoTracking()
Thanks to Matt Brooks for reminding me of this feature in the comments.
The IQueryable interface you use to retrieve data from the database can also be used to retrieve entities and lists without tracking on a per-request basis. For example, I could also explicitly make the order and customer retrievals work without tracking changes:
var order = orderBO.Context.Orders
                   .AsNoTracking()
                   .FirstOrDefault(o => o.OrderID == orderID.OrderID);
This is even more useful when retrieving large lists of objects, which by default are all tracked when retrieved from any dbContext’s DbSet instance.
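For example, a read-only list query (the customerID variable here is hypothetical) keeps all of the returned entities out of the change tracker:

// none of the returned entities end up in the change tracker
var orders = orderBO.Context.Orders
                    .AsNoTracking()
                    .Where(o => o.CustomerID == customerID)
                    .ToList();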
.AsNoTracking() gives you per-query control rather than per-context control, and it’s actually a good idea to use it on any read-only queries as it improves query result materialization performance. Depending on how you use entities inside of your business layer, this may or may not be a worthwhile performance improvement for you.
So as you can see there are a number of options available to keep the change tree minimal. If you do need to update data, then re-creating the context for each iteration (or each batch of iterations) is the right choice. Otherwise, turning off change tracking either on the context or on individual queries can provide even better performance, since you avoid the overhead of creating new context instances.
When debugging slow Entity Framework performance, this is one of the first problems to check for. FWIW, I wrote about a similar case last year when I looked at a large number of SQL insertions that ran into these very same issues. It’s an easy thing to miss, especially if you’re testing an application against small data sets initially and then run into large data sets later once the app goes live.
Avoid Iteration Load Loops Altogether
Another, perhaps more obvious, solution is to avoid the iteration load loop altogether by removing the code that repeatedly loads data from the SQL backend inside a loop. Iteration, after all, is something LINQ can provide, and if you can fold the lookups into the query that retrieves the initial looping data, you avoid the per-row loads in the first place.
In the example above, you could run a single query that returns all the required records up front, then loop over the resulting in-memory list – essentially trading memory for IO operations. It turns out that for this customer there is code not shown here that doesn’t allow for this scenario, but it’s a very valid point: avoid individually loading records when you can pull them in bulk with a single LINQ query instead.
The idea is that you pre-load all the data and then iterate over the list entirely in memory, with no Entity Framework lookups inside the loop. In fact, I’d argue this should probably be your first avenue of approach, as reducing the load on the database server is beneficial in its own right. Depending on the size of the data returned, you essentially trade memory for IO access and sidestep the whole issue discussed in this post.
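As a rough sketch of what that could look like for the order loop above (same entities as before), you’d fetch all of the needed customer ids in a single query, index them in memory, and keep the loop itself free of any database access:

// fetch all the needed order/customer id pairs in a single query
// (for very large id lists you'd want to batch the Contains() filter)
var ids = orderIDsToProcess.Select(x => x.OrderID).ToList();

var customerIdsByOrder = orderBO.Context.Orders
    .AsNoTracking()
    .Where(o => ids.Contains(o.OrderID))
    .Select(o => new { o.OrderID, o.CustomerID })
    .ToDictionary(x => x.OrderID, x => x.CustomerID);

// pure in-memory iteration - no per-row SQL round trips
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    int customerId;
    if (customerIdsByOrder.TryGetValue(orderID.OrderID, out customerId))
        orderID.CustomerID = customerId;
}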
In many cases, however, you may still be stuck with an iteration requirement: places where the lists originate outside of LINQ to SQL, or where you need to read, then update the data.
In the end, iteration loops over EF entities shouldn’t be a common scenario, but rather a relatively rare edge case.
dbContext and Lifetime
Clearly this is an example where the lifetime of a dbContext is very important. I’m always surprised when I see guidance for Web applications that insists on creating a per-request dbContext instance. A per-request context seems like a terrible thing to me when you think of the dbContext as a unit of work. While a per-request context may make sense in a lot of common situations, it’ll absolutely screw you in others if you’re not careful. This is one of those cases.
Personally I like to control the lifetime of the dbContext, creating it and disposing of it when I’m done, in keeping with the Unit of Work theme that dbContext is based on. As I mentioned, dbContext creation is not a real performance concern, and for most of my applications I’d rather defer context creation to the business objects that actually need to use it.
Especially given that creating a context in recent versions of EF is no longer the big, heavy operation it was in earlier versions, it’s not a big deal to create new instances when you need to clear out the context’s state between operations. In most situations it’s not the request that should control the lifetime, but the business objects that work with the context to perform the business operations.
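As a minimal sketch of that idea (the OrdersContext name is hypothetical, and this isn’t the actual business object framework used above), the business object owns the context and disposes of it with the unit of work:

public class BusOrder : IDisposable
{
    // one context per business object instance = one unit of work
    public OrdersContext Context { get; private set; }

    public BusOrder()
    {
        Context = new OrdersContext();
    }

    public Order Load(int orderId)
    {
        return Context.Orders.FirstOrDefault(o => o.OrderID == orderId);
    }

    public void Dispose()
    {
        Context.Dispose();
    }
}

Callers can then wrap the business object in a using block so the context’s lifetime matches the operation at hand.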
A few Thoughts
It’s surprising how frequently I’ve run into this dbContext change-tracking bloat, both in my own code and in customer code. Even though I know about the issue, it’s easy to get complacent and forget that there are a few rules you have to live by when using Entity Framework (or most ORMs, for that matter). Certainly, if you look at the original code I showed, at a glance it makes perfect sense to create a single instance of the context and reuse it for all data operations – why would you re-create it each time? Because that’s one of the rules for EF: don’t let the context bloat too much, or it will become unbearably slow!