Rick Strahl's Web Log

Wind, waves, code and everything in between...
ASP.NET • C# • HTML5 • JavaScript • AngularJs

Gotcha: Entity Framework gets slow in long Iteration Loops


Thought I’d highlight a common problem I’ve run into with several of my customers using Entity Framework.

I spent some time today with a customer debugging a very, very slow process built around Entity Framework operations. The customer was running a long order processing task involving an order with many thousands of order items plus a boat load of child items. The task is massive, but it was taking 6+ hours to complete. Yikes. Lots of items for sure, but there’s no reason this should take hours or even more than a few minutes.

Now some people are very quick to blame EF for bad performance and while there may be something to that in some situations, I find that very frequently a few minor adjustments in code can fix serious performance issues. This was one of those cases.

An Example of Large Order Processing

The issue for this customer dealt with processing very large Sales Orders that involved looking up customers and order IDs as part of the initial pre-processing operations. We’re using business objects in this scenario, but the business objects essentially host an Entity Framework dbContext and use it for the business object methods.

In the code below the business objects are used to load up instances of orders and customers for roughly 16,000 sales orders (modified to show the EF specifics, with some additional processing code after the load operations removed to keep the code relevant):

private void LoadOrderIDsToProcess()
{
    // contains an initialized dbContext instance dbContext
    BusOrder orderBO = new BusOrder();

    foreach (OrderIDsToProcess orderID in orderIDsToProcess)
    {
        //var order = orderBO.Load(orderID.OrderID); 
        var order = orderBO.Context.Orders.FirstOrDefault(o=> o.OrderID == orderID.OrderID); 
        orderID.CustomerID = order.CustomerID; 
    }
    // OrderBy returns a new sequence - assign the result or the sort is silently lost
    orderIDsToProcess = orderIDsToProcess.OrderBy(x => x.CustomerID).ToList();
            
    BusCustomer customerBO = new BusCustomer();
    foreach (OrderIDsToProcess orderID in orderIDsToProcess)
    {
        //var customer = customerBO.Load(orderID.CustomerID);
        var customer = customerBO.Context.Customers.FirstOrDefault(c=> c.CustomerID == orderID.CustomerID);
        if (customer == null)
            orderID.BillingTypeID = 0;
        else                                     
            orderID.BillingTypeID = customer.BillingTypeID ?? 0;
    }
}

The process creates a single business object/dbContext and then iterates over the sales orders, loading each order to capture its customer ID. Then roughly the same process is repeated, loading each customer to capture its billing type. Assuming 16,000 sales order records, this bit of code generates 32,000 queries to retrieve the orders and customers. That’s a lot, but it really shouldn’t take very long to process. Yet the original code ran for more than 6 hours.

What happens is that processing starts fast but then gradually slows down, each iteration taking longer as the loop count goes up. By the time we got to the last few items in the second loop there was up to a 4 second delay between each iteration.

Our first thought was that SQL was to blame, but checking the SQL Profiler logs it was easy to see that the queries were completing in the nearly immeasurable millisecond range, even once the loop started slowing down. We could see, however, that the interval between database queries was increasing drastically.

So what’s going on here?

Watch your DbContext and Change Tracking!

The problem here is Entity Framework’s change tracking. The code performs 32,000+ SQL load operations and loads each of those 32,000 result records into the active dbContext. At first this isn’t a problem – the first few hundred records go fast – but as the context accumulates more and more entities to track, memory usage goes up and EF ends up searching the ever-growing list of tracked objects before going out and grabbing the next record.

In short, the problem is dbContext bloat. A dbContext is meant to be used as a Unit of Work, which generally means small chunks of work and only a few records in a context. In this case the context is getting bloated with a lot of records – and worse, records that we have no intention of updating.

There are a few simple solutions to this problem:

  • Recreate the dbContext/Business object inside of the loop for each iteration
  • Turn off change tracking for the dbContext instance

Recreate the dbContext

The first thing I tried is to simply move the business object (and therefore the dbContext) instantiation inside of the loop in both operations:

foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    BusOrder orderBO = new BusOrder();        
    var order = orderBO.Context.Orders.FirstOrDefault(o=> o.OrderID == orderID.OrderID); 
    orderID.CustomerID = order.CustomerID; 
}

Immediately re-running that same 6+ hour process reduced the processing time to a mere 2 minutes.

Note that some people are hesitant to create new instances of a dbContext because it’s supposed to be slow. While the first instantiation of a large dbContext (like the one used here) can be very slow, subsequent instantiations are not. Yes, there’s still some overhead compared to not creating one, but it’s not a major hit. You should not be afraid to create multiple dbContexts, or re-create an existing dbContext, to provide isolation or to clear out change state when you need it.

One other real issue with dbContext is that it has no way to clear out its change tree. Even after you call .SaveChanges(), EF holds on to the entities it has already loaded. There’s no way to release them other than creating a new instance, so if you’re dealing with operations in loops, recreation makes good sense.
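Per-iteration recreation pairs naturally with a using block, so each context is disposed as soon as its single lookup completes. A sketch of that pattern (this assumes BusOrder implements IDisposable and disposes its context when the business object is disposed):

```csharp
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    // a fresh business object/dbContext per iteration keeps the change tree empty
    using (BusOrder orderBO = new BusOrder())
    {
        var order = orderBO.Context.Orders
                           .FirstOrDefault(o => o.OrderID == orderID.OrderID);
        if (order != null)
            orderID.CustomerID = order.CustomerID;
    } // context disposed here, releasing its tracked entities
}
```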

Turn off Change Tracking

If you know that the context you are using for an iterative process doesn’t actually need to write changes, an even more effective way to speed up performance is to turn off change tracking on the context instance. The context’s Configuration object has an AutoDetectChangesEnabled property for just this use case.

Using this property I can now write the code like this:

// contains an initialized dbContext instance dbContext
BusOrder orderBO = new BusOrder();
orderBO.Context.Configuration.AutoDetectChangesEnabled = false;

foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{        
    var order = orderBO.Context.Orders.FirstOrDefault(o=> o.OrderID == orderID.OrderID); 
    orderID.CustomerID = order.CustomerID; 
}

Running with this code for both the order and customer iterations reduced the processing time even further, to 40 seconds! 40 seconds, down from 6+ hours – that’s quite an optimization for essentially adding or moving a single line of code!

Now a single context performs all those load operations, but because change tracking is off, EF doesn’t keep track of the loaded entities or the change tree. All the tracking overhead that caused the slowdown is gone, and the code is as fast as you would expect it to be.

As long as you know that you’re not updating data with the context, this is probably the best solution. In fact, I would encourage using this option whenever you know you’re pulling data for read-only operations.

IQueryable.AsNoTracking()

Thanks to Matt Brooks for reminding me in the comments of this feature

The IQueryable interface that you use to retrieve data from the DB can also be used to retrieve entities and lists without tracking, on a per-query basis. For example, I could also explicitly make the order and customer retrieval work without tracking changes:

var order = orderBO.Context.Orders.AsNoTracking().FirstOrDefault(o => o.OrderID == orderID.OrderID);

This is even more useful when retrieving large lists of objects, which by default are all tracked when retrieved from any of a dbContext’s DbSet instances.

.AsNoTracking() gives you per-query control rather than per-context control, and it’s actually a good idea to use it on any read-only query, as it improves query result generation performance. Depending on how you use entities inside of your business layer, this may or may not be a worthwhile performance improvement for you.
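As a sketch, a read-only list query with .AsNoTracking() might look like this (the customerId filter is just an illustrative example):

```csharp
// none of the returned order entities are added to the change tracker
var customerOrders = orderBO.Context.Orders
                            .AsNoTracking()
                            .Where(o => o.CustomerID == customerId)
                            .ToList();
```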

So as you can see, there are a number of options available to keep the change tree minimal. If you do need to update data, then re-creating the context for each iteration is the right choice. Otherwise, turning off change tracking – either on the context or on individual queries – can provide even better performance, since you avoid the overhead of creating new context instances.

When debugging slow Entity Framework performance, this is one of the first problems to check for. FWIW, I wrote about a similar case last year when I looked at a large number of SQL insertions, which dealt with these very same issues. It’s an easy thing to miss, especially if you’re testing an application against small data sets initially and then run into large data sets later when the app goes live.

Avoid Iteration Load Loops Altogether

Another, perhaps more obvious, solution is to avoid iteration load loops altogether by removing the code that repeatedly loads data from the SQL backend in a loop. Iteration, after all, is something LINQ can provide, and if you can fold it into the query that retrieves the initial looping data, you can avoid the iteration/load loop in the first place.

In the example above, you could run a single query that returns all the records up front and then loop over the in-memory list, essentially trading memory for IO operations. It turns out that for this customer there is code not shown here that doesn’t allow for this scenario, but it’s a very valid point: avoid individually loading records when you can use a bulk selection with a single LINQ query instead.

The idea is that you pre-load all the data and then iterate over the list in memory, without Entity Framework lookups on each pass. In fact, I would argue that this should probably be your first avenue of approach, as reducing the load on the data server is certainly also beneficial. Depending on the size of the data returned, you’d essentially trade memory for IO access, and you’d sidestep the whole issue discussed in this post.
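A minimal sketch of the pre-load approach, assuming the ID list comfortably fits in memory, might look like this:

```csharp
// one query retrieves every needed order (Contains translates to a SQL IN clause)
var ids = orderIDsToProcess.Select(o => o.OrderID).ToList();
var ordersById = orderBO.Context.Orders
                        .AsNoTracking()
                        .Where(o => ids.Contains(o.OrderID))
                        .ToDictionary(o => o.OrderID);

// all further lookups happen against the in-memory dictionary
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    if (ordersById.ContainsKey(orderID.OrderID))
        orderID.CustomerID = ordersById[orderID.OrderID].CustomerID;
}
```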

In many cases, however, you may still be stuck with an iteration requirement – places where the lists originate outside of LINQ to SQL, or where you need to read and then update the data.

In the end iteration loops over EF entities should not be a common scenario, but rather a relatively rare edge case.

dbContext and Lifetime

Clearly this is an example where the lifetime of a dbContext is very important. I’m always surprised when I see guidance for Web applications that insists on creating a per-request dbContext instance. A per-request context seems like a terrible idea to me when you think of the dbContext as a unit of work. While a per-request context may make sense in a lot of common situations, it’ll absolutely screw you in others if you are not careful. This is one of those cases.

Personally, I like to control the lifetime of the dbContext, creating it and disposing of it when I’m done, in keeping with the Unit of Work theme that dbContext is based on. As I mentioned, dbContext creation is not a real performance concern, and for most of my applications I’d rather defer context creation to the business objects that actually need to use it.

Especially given that creating a context in recent versions of EF is no longer the big, heavy operation it was in earlier versions, it’s no big deal to create new instances when you need to clear out the context’s state for different operations. In most situations it’s not the request that should control the lifetime, but the business objects that work with the context to perform the business operations.
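In practice that means each business operation creates and disposes its own context as a self-contained unit of work. A sketch (the ProcessOrder method and its body are purely illustrative):

```csharp
public void ProcessOrder(int orderId)
{
    // the business object owns the context for exactly this operation
    using (var orderBO = new BusOrder())
    {
        var order = orderBO.Context.Orders
                           .FirstOrDefault(o => o.OrderID == orderId);
        if (order == null)
            return;

        // ... apply business rules to the order here ...

        orderBO.Context.SaveChanges(); // persist, then disposal releases everything
    }
}
```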

A few Thoughts

It’s surprising how frequently I’ve run into this dbContext change tracking bloat, both in my own code and in customer code. Even though I know about the issue, it’s easy to get complacent and forget that there are a few rules you have to live by when using Entity Framework (or most ORMs, for that matter). Certainly, if you look at the original code I showed, at a glance it makes perfect sense to create a single instance of the context and reuse it for all data operations – why would you re-create it each time? Because that’s one of the rules of EF: don’t let the context bloat too much, or it will become unbearably slow!

Posted in Entity Framework  C#  .NET  

The Voices of Reason


 

Matt Brooks
December 21, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Regarding your comment about per request option for disabling change tracking, have you seen the IQueryable.AsNoTracking method for EF?

http://msdn.microsoft.com/en-us/library/gg679352%28v=vs.103%29.aspx

Rick Strahl
December 21, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Thanks Matt, for reminding me. Totally forgot about .AsNoTracking() for individual queries (even though I use that all the time - duh!). Added a section to the post. Thanks!

wydok
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

You may also want to consider the fact that your loop runs a SQL statement to get the first order by id for each order in the database. You can preload the orders, as long as you don't mind swapping I/O with RAM usage:

// load all the orders being processed
var orders = orderBO.Context.Orders.Where(o => orderIDsToProcess.Select(p => p.OrderID).Contains(o.OrderID)).ToList();

foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    var order = orders.FirstOrDefault(o=> o.OrderID == orderID.OrderID);
    orderID.CustomerID = order.CustomerID;
}

Rick Strahl
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

@wydok - yup, definitely, pre-loading certainly sounds better than running those 16,000 queries in a loop. Even more so, they could pre-select just the 2 or 3 fields that are actually needed instead of the whole record.

As I mentioned the code I posted is somewhat simplified to just point at the looping construct and slow iteration speed - there are a few other criteria that make pre-selection more tricky than the query you suggest :-)

But it's a great point - worth mentioning in the post in fact. I'll add something to that effect. Thanks!

While in this case there might actually be a different solution that avoids that heavy EF iteration altogether, there are valid scenarios where iteration is required and this problem will creep up.

Alex B
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

I don't know.. I still don't use EF at all for data processing. Data access layer is still good. Everything fast and predictable. Am I nuts?...

Rick Strahl
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

@Alex - Nope. If whatever you're using works for you by all means do. EF has gotten lots better over the years to the point where it's stable and provides good performance IMHO. But you do have to be aware of quirks - just like you do if you write manual SQL which is just as hard if not harder to tune right as well.

There are so many choices these days. I use a lot of MongoDb these days for example which works so much better for many of the data scenarios I'm working with.

Albin Sunnanbo
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Thanks for the reminder about turning off change tracking.

However, there are some major improvements left in this example.
First of all, the concept of making 16000 SQL queries in a loop is still rather naive.

Instead of
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    //var order = orderBO.Load(orderID.OrderID); 
    var order = orderBO.Context.Orders.FirstOrDefault(o=> o.OrderID == orderID.OrderID); 
    orderID.CustomerID = order.CustomerID; 
}

you could load all orders in a single query
var orderIDsToProcess_id = orderIDsToProcess.Select(o => o.OrderID).ToList(); // projects only the IDs
var allOrders = orderBO.Context.Orders.Where(o => orderIDsToProcess_id.Contains(o.OrderID));
// To get efficient lookups we create a dictionary based on the OrderID
var allOrdersDictionary = allOrders.ToDictionary(o => o.OrderID);
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    var order = allOrdersDictionary[orderID.OrderID]; 
    orderID.CustomerID = order.CustomerID; 
}

The Contains looks a bit backward, but is translated into IN(...) in SQL.
Beware that before EF6 the Contains slowed down on large lists. If you are stuck on EF<=5, page the list into batches of at most 1000 items.
The code above should get your execution time to at most a few seconds.
This is the recommended approach if you want to use more properties from the Order, or update the Order object. However in this code we only use the CustomerID property and can select only that property, greatly reducing the amount of data sent over the wire from the SQL server.
var orderIDsToProcess_id = orderIDsToProcess.Select(o => o.OrderID).ToList(); // projects only the IDs
var allOrders = orderBO.Context.Orders.Where(o => orderIDsToProcess_id.Contains(o.OrderID));
// To get efficient lookups we create a dictionary based on the OrderID
var customerIdByOrderId = allOrders.ToDictionary(o => o.OrderID, o => o.CustomerID);
foreach (OrderIDsToProcess orderID in orderIDsToProcess)
{
    var customerID = customerIdByOrderId[orderID.OrderID]; 
    orderID.CustomerID = customerID; 
}

This does not return entities at all, so you don't need to bother with change tracking.
This should definitely get you into the sub second range.

Albin Sunnanbo
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

I would also like to point out the importance of disposing your EF Contexts since EF keeps the database connection open until the Context is disposed.

In your example where you recreate your context
BusOrder orderBO = new BusOrder();
inside the loop, you might end up creating lots of concurrent database connections instead of reusing open connections from the connection pool.
Disposal might of course be present in your code; it just does not show up in your example.

Frans Bouma
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Not to pimp my own work, but this is a standard mistake: it's a select N+1 within a loop, which could have been caught during development with tools like ORM Profiler (http://www.ormprofiler.com). Use the tools which are there for you to avoid mistakes which are not obvious.

Entity Framework is rather slow with fetching large sets though (see my benchmarking work here and results: https://github.com/FransBouma/RawDataAccessBencher ). I profiled the fetches and wondered what it is that's slow, apparently it's proxy management for related entities when fetching the set with change tracking. NHibernate has the exact same problem (it's even slower).

Interestingly enough, other session/context-based ORMs like Linq to Sql don't have this problem. It looks like a serious design flaw in EF and NH. I don't know whether they've fixed this in EF7 (as they rewrote a lot of it) - I hope for EF users they have!

I agree with wydok: fetching sets of projected data to process elements in memory is always more efficient. If possible though, perform bulk operations on the DB directly (so update queries with expressions). EF by default doesn't support them (other ORMs do) but there are extensions for EF which add these features.

vk
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Check NHibernate, it has request batching feature :) otherwise use hint by @wydok.

Also, instead of loading entire order it makes sense to use projection to load only OrderId and CustomerId.

Dan H
December 22, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

I had this same exact problem recently. I had a large batch processing job that gets called via a web service (we'll call this the Batch Request) and was using a per request context. I didn't want to get rid of the per request context so grudgingly, I had that batch job call another web service (we'll call this the Singular Request) with the id of the object I wanted processed. I hated this approach BUT, I then realized that it might work in my favor by calling the singular requests async in batches of 20. Two benefits came of it - I was processing each record on its own context and I was processing 10 - 20 items at the same time. I actually tried to repeat this job using a new context per item and using async calls with normal processing, but it still wasn't nearly as fast as using the web service method.

Pete
December 23, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

We use 2 separate DbContexts - one for reads (which has tracking off) and one for writes. I find this works pretty well!

Rick Strahl
December 23, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

@Albin - yes very good point. My business object does .Dispose() the Context but only if you dispose the bus object which I didn't do in the sample. But you are absolutely correct - the BO instantiation should be wrapped in a using{} block.

Note the production code is quite different from the code here anyway - we don't access the Context in this application code; all that's wrapped inside of the BO. It's broken out for simplicity and single level control flow here.

As to connections... it's my understanding that EF opens a connection as it's needed for each SQL operation - I don't think it keeps connections open UNLESS you explicitly pass in a connection of your own.

Here's some info that points at this from MSDN:
http://msdn.microsoft.com/en-us/library/vstudio/bb896325%28v=vs.100%29.aspx

If you look in the SQL profiler you see all the connection resets and they are on a per operation not per context basis.

Steve
December 23, 2014

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

I don't think it's EF, really. Same issues with rdo, ado, ado.net -- putting queries into a loop and hitting the DB thousands of times rather than just once, and then iterating / processing from that, normally will not scale.

Tim
January 12, 2015

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Sorry to nit-pick, but you should change FirstOrDefault to First, unless you check after FirstOrDefault that the object is not null...

Tim S.
May 06, 2015

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Now that you have the context split up at DAL or BL level, you might be able to use
Parallel.ForEach instead of foreach and really watch the CPUs light up.

Ryan
January 30, 2016

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Thanks. This was really helpful.
Even though I was calling SaveChanges() inside my loop, it was still incredibly slow.

Vijay
May 05, 2016

# re: Gotcha: Entity Framework gets slow in long Iteration Loops

Thanks for narrating and documenting this the right way
 

West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2017