LINQ to SQL, Lazy Loading and Prefetching

October 12, 2009 • from Maui, Hawaii • 11 comments

On this page:

LINQ to SQL by default loads related entities and entity sets by using Lazy Loading. Which means if you retrieve a list of entities in a query that references other entities these other entities are not actually loaded unless you physically access them. Lazy loading is quite useful in some scenarios – typically when you load up individual instances for display purposes in simple record based UIs, but it can be a real problem when displaying list based data that needs to have access to related data. In related list scenarios it’s quite easy to have a list that has 100 records displayed on a page to fire off 100 hundred child queries to retrieve the related data – clearly in most cases that’s quite undesirable.

Lets look at an example, from my Time Trakker sample application for the LINQ to SQL business layer.

In my Time Trakker app I have time entries that can be queried. An entry has a related project and customer which in turn can be referenced through the project:

TimeTrakkerModel

So if I have an EntryEntity I can get a project with Entry.ProjectEntity and a customer with Entry.ProjectEntity.CustomerEntity. So far so good.

Now in my businesslayer I have a routine that basically allows me to query this model for a timesheet report by passing in a bunch of report parameters via a parameter object like so:

public IQueryable<EntryEntity> GetTimeSheetByClient(TimesheetReportParameters parms)
{                        
    IQueryable<EntryEntity> result =
        from entry in Context.EntryEntities
        where entry.TimeIn >= parms.FromDate && 
              entry.TimeIn < parms.ToDate.Date.AddDays(1) &&
              entry.PunchedOut &&
              parms.Companies.Contains(entry.CustomerPk)
        orderby entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName, entry.TimeIn 
        select entry;

    if (parms.BillType == "Unbilled")
        result = result.Where(e => !e.Billed);
    else if (parms.BillType == "Billed")
        result = result.Where(e => e.Billed);
    
    return result;
}

The parameter object has things like a date range, which type of entries to retrieve and so on which are applied against the mode in the query consecutively. This incremental building up of queries using Linq’s fluent language interface is one of my favorite LINQ features. The result of this query is then returned as an IQueryable<EntryEntity>.

When I retrieve this list of entries in the ASP.NET front end I end up databinding it and then binding to the data inside of a ListView control. The data used is mostly from the Entry Entity but there’s also some data displayed from the customer and project. Here’s what the html based report looks like:

Repport

Notice the headers which include the customer name and project name which come from related entities.

Now when the report runs it loads the report like this:

protected void ProcessReport()
  {            
      this.ReportParameters.Companies.Clear();

      foreach (ListItem item in this.lstCustomers.Items)
      {
          if (item.Selected)
              this.ReportParameters.Companies.Add(int.Parse(item.Value));
      }

      this.DataBinder.Unbind();

      if (this.DataBinder.BindingErrors.Count  > 0)
      {
          this.ErrorDisplay.ShowError(this.DataBinder.BindingErrors.ToHtml(), "Please correct the following");
          return;
      }

      IQueryable<EntryEntity> entries = this.Entry.GetTimeSheetByClient(this.ReportParameters);

      // We'll dynamically load the ReportView User Control View, bind it then render
      TimeSheetReport rep = this.LoadControl("~/Reports/TimeSheetReport.ascx") as TimeSheetReport;
      rep.BindData(entries, this.ReportParameters);

      this.RenderReport(rep);
}

then the actual binding in the report to the listview the TimeSheetReport control:

public void BindData(IQueryable query, object parameters)
{
    this.Parameters = parameters as TimesheetReportParameters;

    this.entryList= (query as IQueryable<EntryEntity>).ToList();

    this.lstReport.DataSource = this.entryList; 
    this.lstReport.DataBind();            
}

And finally the data is actually bound:

<asp:ListView runat="server" id="lstReport" ItemPlaceholderID="layoutContainer"  >
<LayoutTemplate>    
    <div id="layoutContainer" runat="server" />
</LayoutTemplate>
<ItemTemplate>    
    
    <%# this.RenderProjectGroupHeader(Container.DataItem as EntryEntity) %>
    <div class="itemcontainer">    
    <div class="itemheader"><%# Eval("Title") %>   </div>    
    <table width="90%" cellpadding="5">
    <tr>
    <td valign="top" style="width:120px;"><small><%# TimeUtils.ShortDateString((DateTime) Eval("TimeIn"),true) %><br />
    <%# TimeUtils.ShortDateString((DateTime) Eval("TimeOut"),true) %></small></td>
    <td valign="top"><%# Eval("Description") %></td>
    <td valign="top" style="width: 70px;"><%# TimeUtils.FractionalHoursToString( (decimal) Eval("TotalHours"), "{0}h {1}min" ) %></td>
    </tr>
    </table>        
    </div>
    <%# this.RenderProjectFooter(Container) %>    
</ItemTemplate>
</asp:ListView>

The main data displayed for each entry all comes from the Entry entity and that’s fine. However there RenderProjectGroupHeader() method figures out whether the group changed and based on this needs to access the Customer and Project information.

        protected string RenderProjectGroupHeader(EntryEntity  entry)
        {
            string output = "";

            // if the project name has changed render a group as a <div> header
            if (entry.ProjectEntity.ProjectName != lastProjectName)
            {
                lastProjectName = entry.ProjectEntity.ProjectName;
                
                // *** New project add hours for first time
                this.groupTotalHours = entry.TotalHours;

                string html = string.Format(
@"<div class='groupheader'>
    <div style='float:right'><small>{0}</small></div>
    {1}
</div>
", entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName);

                return html;
            }
            else
                this.groupTotalHours += entry.TotalHours;
            
            this.grandtotalHours += entry.TotalHours;

            return output;
        }

And here you can see the child entities being accessed.

Now the report works fine with this code. But if you actually look at this SQL generated for this report you’ll find that this query causes a flurry of activity against the server:

Profiler

It turns out that each access to the ProjectEntity and CustomerEntity object causes a separate query to the server (shown) for the first time a particular Project or Entity is referenced. This means for each project in the application there will be two additional queries hitting the server so if I show 20 projects I’ll hit the database 40 extra times. That’s lazy loading for you and it’s usually a problem in list based displays like this. It can be much worse if EVERY row you display requires one or more related entities.

Working around Lazy Loading in LINQ to SQL

LINQ to SQL loads related entities and entity sets using lazy loading. Oddly you can’t override this behavior either in the model – even though you can delay load individual properties (like large text, image or XML fields). So everything always lazy loads.

DataLoadOptions for Prefetching

There’s a mechanism that allows you to specify prefetching using the DataLoadOptions which looks like this when applied GetTimeSheetByClient():

public IQueryable<EntryEntity> GetTimeSheetByClient(TimesheetReportParameters parms)
{            
    DataLoadOptions options = new DataLoadOptions();
    options.LoadWith<EntryEntity>(c => c.ProjectEntity);
    options.LoadWith<ProjectEntity>(p => p.Customer);
    this.Context.LoadOptions = options;

    IQueryable<EntryEntity> result =
        from entry in Context.EntryEntities
        where entry.TimeIn >= parms.FromDate && 
              entry.TimeIn < parms.ToDate.Date.AddDays(1) &&
              entry.PunchedOut &&
              parms.Companies.Contains(entry.CustomerPk)
        orderby entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName, entry.TimeIn 
        select entry;

    if (parms.BillType == "Unbilled")
        result = result.Where(e => !e.Billed);
    else if (parms.BillType == "Billed")
        result = result.Where(e => e.Billed);
    
    return result;
}

This works and results in one large SQL statement that preloads the customer and project child entities.

However, this approach is problematic in a number of ways if you’re like me using a persistent data context. In my applications I scope a data context to a business object, so multiple operations may be performed against a single data context instance. If you have more than one operation that’s called and setting the DataLoad options it’s easy to clobber the settings on this object. A common problem when you have a persistent object like the DataContext and with L2S it’s difficult to get away from storing the context in some sort of persistent mechanism as it’s a connected ORM.

The fact that the options are tied to the DataContext rather than the query itself also makes it much more difficult to customize the query externally depending on the usage of the result. I often pass out an iQueryable and allow customization of the result set. Outside of the business layer the DataContext and therefore the options are not available and so it can’t be set externally.

Finally the DataOptions are tied to full related entities. If you just want to preload a property or two (like ProjectName and Company in our example here) you’re out of luck. DataOptions only deal with full entities.

Using Projection to Force Pre-Fetching

But there’s another way to write your queries differently to effectively force LINQ to SQL to prefetch by using projection in the Select clause of the LINQ statement. Basically the idea is to project the related entities or simple properties at the top level of the returned result to force the data to be denormalized and so becomes effectively preloaded.

So back to the example, if I want to load both the Project and Customer instances completely I can force the query to project into a type like this:

public class ReportEntryItem
{
    public EntryEntity Entry {get; set;}
    public CustomerEntity Customer {get;set;}
    public ProjectEntity Project { get; set; }
}

I’m using an explicit type declaration here rather than an anonymous type because the helper methods of the report rely on an entity instance to retrieve customer and project information and an Anonymous object would not allow access of properties beyond the declared query scope. The explicit object gives me a strongly typed object I can pass around the report UI methods without resorting to Reflection. If you don’t pass the returned result around and bind it immediately the explicit type is not required which is one less thing to declare.

With the new type in place, BindData then changes the query code to look like this:

List<EntryEntity> entryList;
public voidBindData(IQueryable query, objectparameters)
{
    this.Parameters = parameters asTimesheetReportParameters;

    var q = query asIQueryable<EntryEntity>;

    // denormalize the list so we minimize related queries
   var res = from entry in q
              select newReportEntryItem
              {
                  Entry = entry,
                  Project = entry.ProjectEntity,
                  Customer = entry.ProjectEntity.Customer
              };

    // in this case we want a List<T> so we can lazy load more data
   this.entryList = res.ToList();

    this.lstReport.DataSource = this.entryList;
}

Now when the ToList() executes I get a single denormalized and very large record for each row that includes the data for Entry, Customer and Project in the generated SQL:

exec sp_executesql N'SELECT [t0].[Pk], [t0].[CustomerPk], [t0].[ProjectPk], [t0].[InvoicePk], 
[t0].[UserPk], [t0].[Title], [t0].[Description], [t0].[TimeIn], [t0].[TimeOut], [t0].[PunchedOut], 
[t0].[Qty], [t0].[Rate], [t0].[TotalHours], [t0].[ItemTotal], [t0].[Taxable], [t0].[Billed], 
[t0].[Imported], [t0].[Xml], [t0].[tversion], [t3].[test], [t3].[Pk] AS [Pk2], [t3].[LastName], 
[t3].[FirstName], [t3].[Company], [t3].[Address], [t3].[City], [t3].[State], [t3].[Zip], 
[t3].[Country], [t3].[CountryId], [t3].[Phone], [t3].[Email], [t3].[Fax], [t3].[Notes], 
[t3].[Entered], [t3].[Updated], [t3].[LastOrder], [t3].[BillingRate], [t3].[Xml] AS [Xml2], 
[t3].[tversion] AS [tversion2], [t1].[Pk] AS [Pk3], [t1].[CustomerPk] AS [CustomerPk2], 
[t1].[ProjectName], [t1].[Entered] AS [Entered2], [t1].[StartDate], [t1].[EndDate], 
[t1].[Status], [t1].[tversion] AS [tversion3]
FROM [dbo].[Entries] AS [t0]
INNER JOIN [dbo].[Projects] AS [t1] ON [t1].[Pk] = [t0].[ProjectPk]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t2].[Pk], [t2].[LastName], [t2].[FirstName], [t2].[Company], [t2].[Address], [t2].[City], [t2].[State], [t2].[Zip], [t2].[Country], [t2].[CountryId], [t2].[Phone], [t2].[Email], [t2].[Fax], [t2].[Notes], [t2].[Entered], [t2].[Updated], [t2].[LastOrder], [t2].[BillingRate], [t2].[Xml], [t2].[tversion]
    FROM [dbo].[Customers] AS [t2]
    ) AS [t3] ON [t3].[Pk] = [t1].[CustomerPk]
WHERE (NOT ([t0].[Billed] = 1)) AND ([t0].[TimeIn] >= @p0) AND ([t0].[TimeIn] < @p1) AND ([t0].[PunchedOut] = 1) AND ([t0].[CustomerPk] IN (@p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18))
ORDER BY [t3].[Company], [t1].[ProjectName], [t0].[TimeIn]',N'@p0 datetime,@p1 datetime,@p2 int,@p3 int,@p4 int,@p5 int,@p6 int,@p7 int,@p8 int,@p9 int,@p10 int,@p11 int,@p12 int,@p13 int,@p14 int,@p15 int,@p16 int,@p17 int,@p18 int',@p0='2009-01-09 00:00:00',@p1='2009-10-10 00:00:00',@p2=25,@p3=2,@p4=40,@p5=41,@p6=42,@p7=15,@p8=26,@p9=45,@p10=44,@p11=16,@p12=3,@p13=24,@p14=23,@p15=4,@p16=43,@p17=39,@p18=1

Denormalized alright, but what this does is effectively provides the necessary data for all Entry, Project and Customer entities to be completely filled.

For the ListViewItem template this means that individual controls now bind to Entity.Field:

<div class="itemheader"><%# Eval("Entity.Title") %></div>

instead of just binding to the Title before. I can also directly access the project now without reloading:

incidentally the following also works now without another trip to the database:

To demonstrate in the example, the report header function now receives the report item as a parameter and gets a repItem.Entry from it. Otherwise there are no changes in that code from before:

protected string RenderProjectGroupHeader(ReportEntryItem  repItem)
{
    string output = "";
    EntryEntity entry = repItem.Entry;

    // if the project name has changed render a group as a <div> header
    if (entry.ProjectEntity.ProjectName != lastProjectName)
    {
        lastProjectName = entry.ProjectEntity.ProjectName;
        
        // *** New project add hours for first time
        this.groupTotalHours = entry.TotalHours;

        string html = string.Format(
@"<div class='groupheader'>
<div style='float:right'><small>{0}</small></div>
{1}
</div>
", entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName);

        return html;
    }
    else
        this.groupTotalHours += entry.TotalHours;
    
    this.grandtotalHours += entry.TotalHours;

    return output;
}

Note that I can still bind to entry.ProjectEntity.ProjectName and this works without another trip to the database because the Project entity was preloaded in the repItem.Project property. This the same instance that’s referenced in entry.ProjectEntity just as repItem.CustomerEntity is the same entry.ProjectEntity.CustomerEntity.

By including the Customer and Project entities as projected values in the query we’ve forced them to preload and we've reduced 40 queries to just a single (but large and redundant) query.

Watch what you Load

This example loads the full customer and project entities, but realistically we only need the Company and ProjectName fields from the respective related tables. If you know up front you only need a few of the fields in a table it’s more efficient to load just that data in the projection which has roughly the same effect. Again this is easier if you can use an anonymous type, otherwise the projected type instance has to reflect this layout.

The following is quite a bit more efficient than retrieving the full entities:

var res = from entry in q
 select new ReportEntryItem
 {
     Entry = entry,
     ProjectName = entry.ProjectEntity.ProjectName,
     Company = entry.ProjectEntity.CustomerEntity.Company
 };

because it reduces the amount of field data for each row of SQL result data significantly.

You have to be careful with this though – in the above scenario if I access entry.Entry.ProjectEntity.CustomerEntity.Address I will end up lazy loading both the Project and Customer records from the database. In this instance only the ProjectName and Company were retrieved, not the entire entities. IOW, if you pull just a few fields of related entities make sure you stick to using only those fields and leave the actual related entities alone.

To Lazy Load or Not to Lazy Load

Arguably LINQ to SQL’s lazy loading functionality behavior without override ability is not exactly intuitive, but it’s not terribly complex to force it to return pre-loaded data either. It does require some foresight as switching between lazy loaded and pre-loaded entities effectively requires some code changes to reflect the project type’s structure. It would be much nicer if there was syntax that would let you specify a prefetch path right on the query (something I really like about llblGen’s LINQ implementation).

Also keep in mind that lazy loading isn’t always bad so this isn’t something that should be done on every query. Denormalized data retrieves a lot of repetitive data so you’re often sacrificing data on the wire for fewer round trips to the database. In many situations – especially record level CRUD operations – lazy loading is usually a great way to go. Even small list based displays might not suffer severely from many requests to retrieve data vs. running a single query to retrieve the data you need. Also keep in mind that child items don’t necessarily load for EACH and every row, but only the first time a particular instance is loaded, whereas a denormalized query ALWAYS retrieves the related data for every row. So a list with a lot of repeated related data may not actually access the database as often as you might thing. It’s a tradeoff and experimentation is required to see what works best in each situation. But with a little bit of extra work you can make L2S do what you need it to.

The Voices of Reason

Jason Haley
October 12, 2009

# Interesting Finds: October 12, 2009

Interesting Finds: October 12, 2009

Johan Nilsson
October 12, 2009

# re: LINQ to SQL, Lazy Loading and Prefetching

Great article, very descriptive!

I was wondering, would a join also invoke loading of a whole object?
Let's see if I can give an example.

We have the tables companies and locations. Connected via a FK.

would the query:
return from location in locations join company on fk
where company.id == x;

load the same amount of data as:
return company.location

?

Might be a bad example, hopefully you'll understand. Is there a difference between join and accessing via a dot '.'

Cheers

Masoud
October 12, 2009

# re: LINQ to SQL, Lazy Loading and Prefetching

hi, nice article !

very usefull.

but can you explane more about Lazy Loding ?

Rick Strahl
October 12, 2009

# re: LINQ to SQL, Lazy Loading and Prefetching

@Masoud - Lazy loading is described at the beginning of the article. It's delayed loading of related data until you access the related entity or entity which causes a separate query to the data base at the time of the access.

ClipperHouse
October 12, 2009

# Solving the n 1 problem in Linq to SQL, using projection

Solving the n 1 problem in Linq to SQL, using projection

Matt Slay
October 22, 2009

# re: LINQ to SQL, Lazy Loading and Prefetching

Very good and insightful read. I can see cases where this will help in my app creation.

Thanks, Rick, for all the content you generate on these real world topics.

Dave Forst
November 11, 2009

# re: LINQ to SQL, Lazy Loading and Prefetching

Hi Rick,

I, like you, do not want to use DataLoadOptions because my business objects use a persistent data context. Projection seems to work great onto flat objects, but suppose you wanted to project into an object like this:

public class ReportEntryItem
{
    public EntryEntity Entry { get; set; }
    public List<CustomerEntity> Customers { get; set; }
    public List<ProjectEntity> Projects { get; set; }
}

When I use this kind of projection and profile against the database, LINQ to SQL still seems to be running multiple queries to generate the results.

Any thoughts?

Jeff Benson
March 02, 2010

# re: LINQ to SQL, Lazy Loading and Prefetching

Hi, Rick,

This article was very helpful - thank you!

You mention a Time Trakker sample application. When I tried to Google that, though, I found a page on your site with FoxPro links from 1996. Is there a way to obtain the LINQ to SQL version of your Time Trakker sample?

Thanks,

Jeff

2012 Planet alignment and Astronomy
April 25, 2010

# 2012 Planet alignment and Astronomy

This is a video containing some shocking information. At the end is a heavy religious message. We neither agree with the religious message nor reject it. We should try to distinguish between the information contained in it and the religious message, which may not be suitable for everyone. But we all need to watch this video and understand it. [ Eds.] Friends of Liberty: FEMA CONCENTRATION CAMPS: LOCATIONS AND EXECUTIVE ORDERS. dailypaul. com: The March 13th secret closed door meeting of the U. S. House of...

Photography West Palm Beach
May 23, 2010

# Photography West Palm Beach

Going to Phuket in December. Have yet to apply for leave, which is dangerous.

Mark Berry
February 02, 2011

# re: LINQ to SQL, Lazy Loading and Prefetching

Very helpful read, thanks. I'm new to LINQ, coming from a database-centric background where much of the business logic was in the DB. In that world, I would have created a View for the report rather than trying to control the joins etc. in the C# code. Is there any reason not to do that and then use LINQ to access that View?

I get especially nervous when I see the auto-generated code uses a LEFT OUTER JOIN SELECT 1 on Customers: is it really possible to have a Project without a Customer? And why only SELECT 1 Customer, as if there could be multiple Customers per Project? Maybe you don't have foreign keys defined, and it would generate different T-SQL if you did?

Rick Strahl's Weblog