Rick Strahl's Weblog  

Wind, waves, code and everything in between...
.NET • C# • Markdown • WPF • All Things Web
Contact   •   Articles   •   Products   •   Support   •   Advertise
Sponsored by:
West Wind WebSurge - Rest Client and Http Load Testing for Windows

Entity Framework and slow bulk INSERTs


:P
On this page:

I’ve been working on an internal project where I needed to load up a large amount of records into a database for testing. On this project I’m using Entity Framework Code First 6.02 and since I’m in initial testing mode I decided to create a DbInitializer that adds about 30,000 records to the database. I figured this was a reasonable idea for the moment as I’m testing out a few different scenarios for handling a large amount of data.

I ran into a major problem with the code I initially used however, which is simply using Context.Table.Add() to add these records to two tables in my DbContext:

private void SeedInitialData(WebMonitorContext context) { for (int i = 0; i < 10000; i++) { var user = new User() { Email = DataUtils.GenerateUniqueId() + "@.test.com", FirstName = DataUtils.GenerateUniqueId(), LastName = DataUtils.GenerateUniqueId(15), Company = DataUtils.GenerateUniqueId(12), StreetAddress = DataUtils.GenerateUniqueId(50), City = DataUtils.GenerateUniqueId(), State = "CA" }; user.Password = AppUtils.EncodePassword("password", user.Id); context.Users.Add(user); var site = new WebMonitorSite { UserId = user.Id, SiteName = "West Wind Technologies", Url = "http://west-wind.com/", SearchFor = "West Wind Technologies" }; context.Sites.Add(site); site = new WebMonitorSite { UserId = user.Id, SiteName = "GeoCrumbs", Url = "http://geocrumbs.net/", SearchFor = "GeoCrumbs" }; context.Sites.Add(site);

context.SaveChanges(); }

// context.SaveChanges(); // also tried this even slower }

So then, in order to get the database created, I ran a test that simply returned a few records from the users table and waited… and waited and waited…

At first I thought something was wrong, but I checked the DB and saw the database created, and records getting added, very, very slowly. At some point later I let this run out completely and found that it took 57 minutes to complete adding the 30,000 records. Yikes…

Now granted in a production application I wouldn’t recommend to do this kind of INSERT batch in a DbInitializer, but the problem that occurred here is something that can crop up in other places as well as you’ll see in a second.

Replacing with ADO.NET Code

Just to make sure it wasn’t issue with my local Sql Server instance I re-ran the same operations using my own DbDataAccess::SaveEntity() helper (part of the West Wind Utilities library in the West Wind Toolkit) which is a generic based Sql mapper that maps properties to database field in a flat table using Reflection and ADO.NET – IOW, it’s not super fast but easy to use on simple one-dimensional objects like this.

So instead of the Context.Table.Add() and context.SaveChanges() I simply replaced the Add() calls with the SaveEntity method:

context.Db.SaveEntity(user, "Users", "id");
and
context.Db.SaveEntity(site, "WebMonitorSites", "id");

Running this takes a mere 25 *seconds* for those same 30,000.

Context Bloat and Large Lists

My first thought was, this can’t be right and as it turns out there are a few ways to make EF work a lot faster in this scenario.

The issue here is that I’m creating a lot of records and I keep adding them to the same context instance. This means that by the time I’m done this DbContext will end up tracking 30,000 records. So the first few inserts probably run pretty fast, but as the list of records grows it gets slower and slower as EF tries to track and ever larger object graph. I’m not exactly sure why this should happen in this case as the data being added are very simple records into a two table schema with no relations, but clearly EF does more than just property change tracking as items are added to the context.

There are two easy fixes to this problem:

  • Keep the active Context Graph small by using a new context for each Unit of Work
  • Turn off AutoDetechChangesEnabled

Use a Unit of Work and a new Context for each set of Records

This approach basically never lets the Context get very large by recreating the context inside of the loop. So this would look like this:

for (int i = 0; i < 10000; i++) { var context = new WebMonitorContext();

… add items to the context tables

context.SaveChanges();

}

Making this little change (in a test, not the Iinitializer!) the performance now comes down to something reasonable: The same 30,000 inserts take 33 seconds.

This is a good practice in general. Building up large trees is a terrible idea inside of an application as you are making it more likely to cause lock conflicts or otherwise letting the data get stale. In most situations unit of work saves are more efficient.

This is one way to solve the Context graph size problem, but unfortunately this doesn’t work for my DbInitializer, because I don’t control the instantiation of the context when the initializer is called – EF passes the context in so I can’t create a new one.

Turn off AutoDetectChangesEnabled

Another solution suggested by @julielerman is to turn off AutoDetectChangesEnabled. This confirms that the problem in and of itself is the change tracking in EF on large datasets. Turning off this flag on the context also brings the time down to about 20 seconds and – more importantly - it also works for my DbInitializer since I can simply apply the flag on the existing passed in context:

context.Configuration.AutoDetectChangesEnabled = false; for (int i = 0; i < 10000; i++)
{

… do the Context.Add() / SaveChanges() here as before

}

This is the easiest solution in this case and it actually turns out to also be quite a bit faster.

AutoDetectChangesEnabled should be used very selectively if at all. There are other mechanisms such as AsNoTracking() that can be a better solution to avoid change tracking of lists. But in this case it solves a specific problem. Just make sure that when you’re done with your non-change tracking required operations that you turn the flag on again, or probably even better that you kill and recreate your context altogether (Unit of Work ideally but even in the current UOW scope if necessary).

Batching

Several people on twitter also suggested that instead of calling SaveChanges() on each unit of work, it’d be more efficient to batch a number of commands together which I also added on the bottom of the loop:

context.Configuration.AutoDetectChangesEnabled = false;

for (int i = 0; i < 10000; i++)
{
… do the Context.Add() / SaveChanges() here as before

if (i % 50 == 0) context.SaveChanges(); }

context.SaveChanges() // save whatever is left over

And this knocks off another 4 seconds – down to 16 seconds which is good enough for me.

Another few people suggested that for large data imports, SQL Bulk Insert is probably a better choice, but you have to use tools other than Entity Framework to handle implementing this functionality. Personally I think that’s overkill unless the ultimate performance is required AND there’s actually a convenient way to get the data into the format that SQL Bulk Insert can import from such as CSV files. Creating those files in the first place can be slow for example and might negate any perf benefits you might get from the bulk insert in the first place.

DbTable.AddRange()

Another option for adding records is by using DbTable.AddRange() to build up the list of records in memory first and then submit the entire list at once in a single command. In this case the change detection doesn’t start until after all the records have been added.

To do this I can change the code to dump all the records to a List<T> first, then add the lists at the end:

[TestMethod]
public void InsertEfAddRangeTest()
{      
    var context = new WebMonitorContext();

    var users = new List<User>();
    var sites = new List<WebMonitorSite>();

    for (int i = 0; i < 10000; i++)
    {        
        …
users.Add(user); …
sites.Add(site); } context.Users.AddRange(users); context.Sites.AddRange(sites);
context.SaveChanges(); }

Note that here I’m not turning off AutoDetectChangesEnabled and the performance is in the 12 range which is provides almost identical performance to when AutoDetectChangesEnabled is off.

Realistically this only works if the set doesn’t get too large since this requires that the whole set of data is loaded into memory first. If you need to do multiple batches (combined with code similar to the Batching section) you also still would need to disable AutoDetectChangesEnabled=false.

Entity Framework

I’ve been slow to come around to accepting Entity Framework, since my early experiences with Version 1.0 really left a bad taste in my mouth. But since Version 4.0 and Code First, I’ve been using EF more and more. It’s come a long way and performance and stability at least close to raw ADO.NET perf in most cases. And for those few things that don’t work or are too slow I tend to fall back to ADO.NET as necessary using my data tools I’ve grafted onto the DbContext for easy access. This seems to be working well for me these days and I’ve had good results on the last 3 projects I’ve worked on as well as a few in-house projects.

But Entity Framework has its quirks as any ORM tool does, and there are lots of things that are found and learned the hard way by doing it wrong first, then finding a solution that doesn’t quite feels satisfying, because the original use case should have worked. This is one of them, but it’s minor. Hopefully this post will save somebody some grief when they search for this particular problem.

Posted in Entity Framework  

The Voices of Reason


 

Anon
December 22, 2013

# re: Entity Framework and slow bulk INSERTs

You can also try EntityFramework.Extended for Batch Update and Delete:
https://github.com/loresoft/EntityFramework.Extended/wiki/Batch-Update-and-Delete

Fujiy
December 22, 2013

# re: Entity Framework and slow bulk INSERTs

You can use new EF 6 AddRange method

Sean Feldman
December 22, 2013

# re: Entity Framework and slow bulk INSERTs

What I'm failing to understand is why to use an ORM for bulk inserts.
Don't get me wrong, this an approach, just looks like an overkill :)

Diego Vega
December 22, 2013

# re: Entity Framework and slow bulk INSERTs

You may want to look at DbSet<T>.AddRange, which suspends and reactivates change detection and automatically.

Blackadder
December 22, 2013

# re: Entity Framework and slow bulk INSERTs

I used ADO.Net SqlBulkCopy - Much much faster

Bond, J. Bond
December 22, 2013

# re: Entity Framework and slow bulk INSERTs

Hi Rick,
I think there might be something wrong with EF6. I had an application using EF4 (reading an XML file, performing some look-ups in memory for every record, then adding them to the context; I was saving the context every 1000 records). It took roughly one minute to run it (around 13000 records).
Then I wanted to see if EF6 doesn't bring some improvements. I liked at least the grouping on schemas in the selection dialog (I was using database first EF). But when it came to running it, hmmm... I think it took something like 15-20 mins. I didn't consider the tracking though, maybe I'll give it another run to see the difference. The solution with creating a new context for every record might work, but I don't think it's a best practice:)
Needless to say, went back to EF4:(

Jon S
December 23, 2013

# re: Entity Framework and slow bulk INSERTs

Regarding Bulk Insert, you don't have to use CSV files. System.Data.SqlClient.SqlBulkCopy works just nicely over the network, but you might want to do something along the lines of https://bitbucket.org/bitpusher/soddi/src/62c734b59eb859a31d3ff7bcb90d27a00a27387c/trunk/soddi/Salient.Data/EnumerableDataReader.cs to convert an IEnumerable<T> to an IDataReader for convenience.

Giorgi
December 23, 2013

# re: Entity Framework and slow bulk INSERTs

EF6 introduced AddRange method which is optimized for adding many items. Have you tried it?
http://blogs.msdn.com/b/adonet/archive/2013/10/17/ef6-rtm-available.aspx

Rick Strahl
December 23, 2013

# re: Entity Framework and slow bulk INSERTs

Wow, this post seems to have stirred up a bunch of feedback. So let me address a few things that has come up in comments.

@Sean Re: Why using EF for batch inserts?
I'm not doing this simply to import the data but this is an actual project with data that I need to work with. So I'm already using EF for this project, so why wouldn't I want to use EF to get the data into the system in the first place? What is overkill to me would be to use another data strategy just to import the data. Which do you think is more consistent and more maintainable?

Re: SqlBulkCopy
Lots of people have mentioned SqlBulkCopy and while I think that's a good idea for truly large sets of data, for something under a few hundred thousand records - especially on a one time import - doesn't and shouldn't need that. Raw ADO.NET or even EF should work just fine for that even if you might have to wait for a minute. Time requirements are relative. Sure SqlBulkCopy would be faster but if I have a one time import that takes 30 seconds - so what? It's totally acceptable and reasonable. Now if it was 57 *minutes* not so much, but that's what this article is about avoiding :-) (seems a few people who commented never read past that 57 minute statement and went straight to comments).

Re: Batching Inserts.
What's also interesting is that with 30,000 records once the change tracking issues are out of the way, the performance is pretty similar regardless of whether you operate individual SQL statements, small batches, one big batch or use AddRange. All of these are within a few seconds of each other.

@Diego - thanks for the pointer on AddRange(). Added a small section to the bottom of the post as another solution. It works and is on par with AutoDetectChangesEnabled=false.

Allan Chadwick
January 14, 2014

# re: Entity Framework and slow bulk INSERTs

Stored Procedures are still king for sql performance and are even easier to implement with EF than older ADO.

Reference:
http://stackoverflow.com/questions/8103379/entity-framework-4-1-vs-enterprise-data-application-block-maximum-performance

Here is an extension for EF that allows EF to take a SQL User Defined Table as an input parameter to a stored procedure. It looks like a another possible solution.
https://github.com/Fodsuk/EntityFrameworkExtras
Some consideration would be needed for handling failed inserts in bulk.

I haven't used it yet, but am eager to find an excuse to try it out.

Markko Legonkov
January 28, 2014

# re: Entity Framework and slow bulk INSERTs

You can check out my extension for using bulkinsert with entity framework https://efbulkinsert.codeplex.com/. It is using datatable in the background and is more than 10 times faster than inserting 10k+ rows.

codefire
November 04, 2014

# re: Entity Framework and slow bulk INSERTs

that looks ugly!

Here are my results for a test on a Master-Details context,
using EF6 with SQLite.

for (int i = 0; i < 10000; i++)
{
Node measurementNode = new Node();
measurementNode.NodeKey = 2;
measurementNode.NodeName = "IntUS(ST-Ht)";
measurementNode.NodeOrder = 0;

CategoryDetail categoryDetail = new CategoryDetail();
categoryDetail.SoftID = "0x02A";
categoryDetail.CategoryInterval = 0;
categoryDetail.RecordTime = 3;

measurementNode.CategoryDetails.Add(categoryDetail);
measurementNodeList.Add(measurementNode);
}

_start = DateTime.Now;
db.Nodes.AddRange(measurementNodeList);
db.SaveChanges();
_totalSec = DateTime.Now.Subtract(_start).TotalSeconds;
Console.WriteLine("10000 Measurement Nodes + linked Category Details inserted in {0} seconds.", _totalSec);


Here is the output:
10000 Measurement Nodes + linked Category Details inserted in 4,7499525 seconds.

ZZZ Projects
December 28, 2014

# re: Entity Framework and slow bulk INSERTs

The ZZZ Projects Bulk Operations Library is the solution for you.

It allows you to effectuate various operations such as: Delete, Insert, Update and Merge on a substantial amount of data.

var operation = new SqlBulkOperation();

// ... Custom Settings ....
operation.BulkDelete(dt);
operation.BulkInsert(dt);
operation.BulkUpdate(dt);
operation.BulkMerge(dt);

It also offers more advanced features like output value, column formula, auditing, intercepting, logging and more.

Find out more on http://zzzprojects.com/bulk-operations/ about the most advanced and flexible Bulk Operations library.

Jonas
December 01, 2015

# re: Entity Framework and slow bulk INSERTs

Thank you for this informative information. I've used this as a guideline for my own projects.

But I've come to a stall now regarding EF. In this scenario I have a very deep EF graph. I'm calculating some data, and the result has about 2 million rows which then must be persisted to database. With pure EF this takes too much time, about 11 minutes. Even with this post as guideline I'm unable to get below 8 minutes. But with a mix of tips from this post and SqlBulkCopy() I have managed to get it down to 1.5 minutes.

Is EF able to work great with this huge amount of data? Especially if the graph is very deep? I tried to break down the graph into more "flat" pieces, persisting only single tables in the end, but this didn't help much.

How do you work with EF and big data?

John Burbidge
June 10, 2016

# re: Entity Framework and slow bulk INSERTs

How long have you been at this?

I remember looking at PayFlow C# examples on your site eons ago.

Anyway, thanks for yet another informative article.

I was doing a data import that took 40 minutes, changing to AddRange took 42 seconds.

I'm always a bit surprised how Entity Framework is able to deal with complex nested relationships and create the records using foreign keys etc...

Anyway, thanks, I've benefited from your contributions many times!

Chris
April 20, 2017

# re: Entity Framework and slow bulk INSERTs

This article has been very helpful.

I'm importing some external history every week or two and the process generally takes a few minutes. This has gotten that down to 20-30 seconds. Additionally, this functionality is purely administrative so there isn't a user waiting for a mission critical report or action to occur because of this.

If I was importing hundreds of thousands of records I would have to pursue some other solution, most likely due to memory constraints and/or ridiculous wait times.

These little tweaks help solutions that are only importing a few thousand records in scenarios where the time to process can be a few minutes without any repercussions. Obviously in mission critical systems or customer facing web applications where patience is truly a virtue, speed is extremely important as people will simply go someplace else.


Ayush Choudhary
October 27, 2018

# re: Entity Framework and slow bulk INSERTs

Hi Rick,

Can you explain how batching will work here. I am reading the IIS Logs and creating JSON line by line and pushing to API.

Here is my API:

public async Task Post(IISLog iislog) { if (!ModelState.IsValid) { return BadRequest(ModelState); }

        db.IISLogs.Add(iislog);
        await db.SaveChangesAsync();

        return CreatedAtRoute("DefaultApi", new { id = iislog.Id }, iislog);
    }

Chris
April 10, 2020

# re: Entity Framework and slow bulk INSERTs

Rick,

Just wanted to say a big thank you for this. I think context bloat was causing us some significant pain on a bulk insert operation using EF Core 3. I am still waiting for my transaction to commit, but the preliminary results in the log seem to indicate a significant performance increase over single context with AutoDetect enabled. Thank you so much for taking the time to post this it was very helpful.


Chris
April 13, 2020

# re: Entity Framework and slow bulk INSERTs

Rick,

Just wanted to say a big thank you for this. I think context bloat was causing us some significant pain on a bulk insert operation using EF Core 3. Obviously AutoDetectChanges is now is a slightly different place inside DBContext.ChangeTracker, but it still works the same way. I am still waiting for my transaction to commit, but the preliminary results in the log seem to indicate a significant performance increase over single context with AutoDetect enabled. Thank you so much for taking the time to post this it continues to be very helpful.


Zach
July 22, 2020

# re: Entity Framework and slow bulk INSERTs

Helpful list of workarounds, thanks Rick.
The year is 2020. Seven years later, your take on EF still seems to be pretty accurate. It still doesn't rise above the the 80/20 rule.


borisdj
May 08, 2023

# re: Entity Framework and slow bulk INSERTs


West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2024