LINQ to SQL by default loads related entities and entity sets by using Lazy Loading. Which means if you retrieve a list of entities in a query that references other entities these other entities are not actually loaded unless you physically access them. Lazy loading is quite useful in some scenarios – typically when you load up individual instances for display purposes in simple record based UIs, but it can be a real problem when displaying list based data that needs to have access to related data. In related list scenarios it’s quite easy to have a list that has 100 records displayed on a page to fire off 100 hundred child queries to retrieve the related data – clearly in most cases that’s quite undesirable.
Lets look at an example, from my Time Trakker sample application for the LINQ to SQL business layer.
In my Time Trakker app I have time entries that can be queried. An entry has a related project and customer which in turn can be referenced through the project:
So if I have an EntryEntity I can get a project with Entry.ProjectEntity and a customer with Entry.ProjectEntity.CustomerEntity. So far so good.
Now in my businesslayer I have a routine that basically allows me to query this model for a timesheet report by passing in a bunch of report parameters via a parameter object like so:
public IQueryable<EntryEntity> GetTimeSheetByClient(TimesheetReportParameters parms)
{
IQueryable<EntryEntity> result =
from entry in Context.EntryEntities
where entry.TimeIn >= parms.FromDate &&
entry.TimeIn < parms.ToDate.Date.AddDays(1) &&
entry.PunchedOut &&
parms.Companies.Contains(entry.CustomerPk)
orderby entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName, entry.TimeIn
select entry;
if (parms.BillType == "Unbilled")
result = result.Where(e => !e.Billed);
else if (parms.BillType == "Billed")
result = result.Where(e => e.Billed);
return result;
}
The parameter object has things like a date range, which type of entries to retrieve and so on which are applied against the mode in the query consecutively. This incremental building up of queries using Linq’s fluent language interface is one of my favorite LINQ features. The result of this query is then returned as an IQueryable<EntryEntity>.
When I retrieve this list of entries in the ASP.NET front end I end up databinding it and then binding to the data inside of a ListView control. The data used is mostly from the Entry Entity but there’s also some data displayed from the customer and project. Here’s what the html based report looks like:
Notice the headers which include the customer name and project name which come from related entities.
Now when the report runs it loads the report like this:
protected void ProcessReport()
{
this.ReportParameters.Companies.Clear();
foreach (ListItem item in this.lstCustomers.Items)
{
if (item.Selected)
this.ReportParameters.Companies.Add(int.Parse(item.Value));
}
this.DataBinder.Unbind();
if (this.DataBinder.BindingErrors.Count > 0)
{
this.ErrorDisplay.ShowError(this.DataBinder.BindingErrors.ToHtml(), "Please correct the following");
return;
}
IQueryable<EntryEntity> entries = this.Entry.GetTimeSheetByClient(this.ReportParameters);
// We'll dynamically load the ReportView User Control View, bind it then render
TimeSheetReport rep = this.LoadControl("~/Reports/TimeSheetReport.ascx") as TimeSheetReport;
rep.BindData(entries, this.ReportParameters);
this.RenderReport(rep);
}
then the actual binding in the report to the listview the TimeSheetReport control:
public void BindData(IQueryable query, object parameters)
{
this.Parameters = parameters as TimesheetReportParameters;
this.entryList= (query as IQueryable<EntryEntity>).ToList();
this.lstReport.DataSource = this.entryList;
this.lstReport.DataBind();
}
And finally the data is actually bound:
<asp:ListView runat="server" id="lstReport" ItemPlaceholderID="layoutContainer" >
<LayoutTemplate>
<div id="layoutContainer" runat="server" />
</LayoutTemplate>
<ItemTemplate>
<%# this.RenderProjectGroupHeader(Container.DataItem as EntryEntity) %>
<div class="itemcontainer">
<div class="itemheader"><%# Eval("Title") %> </div>
<table width="90%" cellpadding="5">
<tr>
<td valign="top" style="width:120px;"><small><%# TimeUtils.ShortDateString((DateTime) Eval("TimeIn"),true) %><br />
<%# TimeUtils.ShortDateString((DateTime) Eval("TimeOut"),true) %></small></td>
<td valign="top"><%# Eval("Description") %></td>
<td valign="top" style="width: 70px;"><%# TimeUtils.FractionalHoursToString( (decimal) Eval("TotalHours"), "{0}h {1}min" ) %></td>
</tr>
</table>
</div>
<%# this.RenderProjectFooter(Container) %>
</ItemTemplate>
</asp:ListView>
The main data displayed for each entry all comes from the Entry entity and that’s fine. However there RenderProjectGroupHeader() method figures out whether the group changed and based on this needs to access the Customer and Project information.
protected string RenderProjectGroupHeader(EntryEntity entry)
{
string output = "";
// if the project name has changed render a group as a <div> header
if (entry.ProjectEntity.ProjectName != lastProjectName)
{
lastProjectName = entry.ProjectEntity.ProjectName;
// *** New project add hours for first time
this.groupTotalHours = entry.TotalHours;
string html = string.Format(
@"<div class='groupheader'>
<div style='float:right'><small>{0}</small></div>
{1}
</div>
", entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName);
return html;
}
else
this.groupTotalHours += entry.TotalHours;
this.grandtotalHours += entry.TotalHours;
return output;
}
And here you can see the child entities being accessed.
Now the report works fine with this code. But if you actually look at this SQL generated for this report you’ll find that this query causes a flurry of activity against the server:
It turns out that each access to the ProjectEntity and CustomerEntity object causes a separate query to the server (shown) for the first time a particular Project or Entity is referenced. This means for each project in the application there will be two additional queries hitting the server so if I show 20 projects I’ll hit the database 40 extra times. That’s lazy loading for you and it’s usually a problem in list based displays like this. It can be much worse if EVERY row you display requires one or more related entities.
Working around Lazy Loading in LINQ to SQL
LINQ to SQL loads related entities and entity sets using lazy loading. Oddly you can’t override this behavior either in the model – even though you can delay load individual properties (like large text, image or XML fields). So everything always lazy loads.
DataLoadOptions for Prefetching
There’s a mechanism that allows you to specify prefetching using the DataLoadOptions which looks like this when applied GetTimeSheetByClient():
public IQueryable<EntryEntity> GetTimeSheetByClient(TimesheetReportParameters parms)
{
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<EntryEntity>(c => c.ProjectEntity);
options.LoadWith<ProjectEntity>(p => p.Customer);
this.Context.LoadOptions = options;
IQueryable<EntryEntity> result =
from entry in Context.EntryEntities
where entry.TimeIn >= parms.FromDate &&
entry.TimeIn < parms.ToDate.Date.AddDays(1) &&
entry.PunchedOut &&
parms.Companies.Contains(entry.CustomerPk)
orderby entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName, entry.TimeIn
select entry;
if (parms.BillType == "Unbilled")
result = result.Where(e => !e.Billed);
else if (parms.BillType == "Billed")
result = result.Where(e => e.Billed);
return result;
}
This works and results in one large SQL statement that preloads the customer and project child entities.
However, this approach is problematic in a number of ways if you’re like me using a persistent data context. In my applications I scope a data context to a business object, so multiple operations may be performed against a single data context instance. If you have more than one operation that’s called and setting the DataLoad options it’s easy to clobber the settings on this object. A common problem when you have a persistent object like the DataContext and with L2S it’s difficult to get away from storing the context in some sort of persistent mechanism as it’s a connected ORM.
The fact that the options are tied to the DataContext rather than the query itself also makes it much more difficult to customize the query externally depending on the usage of the result. I often pass out an iQueryable and allow customization of the result set. Outside of the business layer the DataContext and therefore the options are not available and so it can’t be set externally.
Finally the DataOptions are tied to full related entities. If you just want to preload a property or two (like ProjectName and Company in our example here) you’re out of luck. DataOptions only deal with full entities.
Using Projection to Force Pre-Fetching
But there’s another way to write your queries differently to effectively force LINQ to SQL to prefetch by using projection in the Select clause of the LINQ statement. Basically the idea is to project the related entities or simple properties at the top level of the returned result to force the data to be denormalized and so becomes effectively preloaded.
So back to the example, if I want to load both the Project and Customer instances completely I can force the query to project into a type like this:
public class ReportEntryItem
{
public EntryEntity Entry {get; set;}
public CustomerEntity Customer {get;set;}
public ProjectEntity Project { get; set; }
}
I’m using an explicit type declaration here rather than an anonymous type because the helper methods of the report rely on an entity instance to retrieve customer and project information and an Anonymous object would not allow access of properties beyond the declared query scope. The explicit object gives me a strongly typed object I can pass around the report UI methods without resorting to Reflection. If you don’t pass the returned result around and bind it immediately the explicit type is not required which is one less thing to declare.
With the new type in place, BindData then changes the query code to look like this:
List<EntryEntity> entryList;
public voidBindData(IQueryable query, objectparameters)
{
this.Parameters = parameters asTimesheetReportParameters;
var q = query asIQueryable<EntryEntity>;
// denormalize the list so we minimize related queries
var res = from entry in q
select newReportEntryItem
{
Entry = entry,
Project = entry.ProjectEntity,
Customer = entry.ProjectEntity.Customer
};
// in this case we want a List<T> so we can lazy load more data
this.entryList = res.ToList();
this.lstReport.DataSource = this.entryList;
}
Now when the ToList() executes I get a single denormalized and very large record for each row that includes the data for Entry, Customer and Project in the generated SQL:
exec sp_executesql N'SELECT [t0].[Pk], [t0].[CustomerPk], [t0].[ProjectPk], [t0].[InvoicePk],
[t0].[UserPk], [t0].[Title], [t0].[Description], [t0].[TimeIn], [t0].[TimeOut], [t0].[PunchedOut],
[t0].[Qty], [t0].[Rate], [t0].[TotalHours], [t0].[ItemTotal], [t0].[Taxable], [t0].[Billed],
[t0].[Imported], [t0].[Xml], [t0].[tversion], [t3].[test], [t3].[Pk] AS [Pk2], [t3].[LastName],
[t3].[FirstName], [t3].[Company], [t3].[Address], [t3].[City], [t3].[State], [t3].[Zip],
[t3].[Country], [t3].[CountryId], [t3].[Phone], [t3].[Email], [t3].[Fax], [t3].[Notes],
[t3].[Entered], [t3].[Updated], [t3].[LastOrder], [t3].[BillingRate], [t3].[Xml] AS [Xml2],
[t3].[tversion] AS [tversion2], [t1].[Pk] AS [Pk3], [t1].[CustomerPk] AS [CustomerPk2],
[t1].[ProjectName], [t1].[Entered] AS [Entered2], [t1].[StartDate], [t1].[EndDate],
[t1].[Status], [t1].[tversion] AS [tversion3]
FROM [dbo].[Entries] AS [t0]
INNER JOIN [dbo].[Projects] AS [t1] ON [t1].[Pk] = [t0].[ProjectPk]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t2].[Pk], [t2].[LastName], [t2].[FirstName], [t2].[Company], [t2].[Address], [t2].[City], [t2].[State], [t2].[Zip], [t2].[Country], [t2].[CountryId], [t2].[Phone], [t2].[Email], [t2].[Fax], [t2].[Notes], [t2].[Entered], [t2].[Updated], [t2].[LastOrder], [t2].[BillingRate], [t2].[Xml], [t2].[tversion]
FROM [dbo].[Customers] AS [t2]
) AS [t3] ON [t3].[Pk] = [t1].[CustomerPk]
WHERE (NOT ([t0].[Billed] = 1)) AND ([t0].[TimeIn] >= @p0) AND ([t0].[TimeIn] < @p1) AND ([t0].[PunchedOut] = 1) AND ([t0].[CustomerPk] IN (@p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18))
ORDER BY [t3].[Company], [t1].[ProjectName], [t0].[TimeIn]',N'@p0 datetime,@p1 datetime,@p2 int,@p3 int,@p4 int,@p5 int,@p6 int,@p7 int,@p8 int,@p9 int,@p10 int,@p11 int,@p12 int,@p13 int,@p14 int,@p15 int,@p16 int,@p17 int,@p18 int',@p0='2009-01-09 00:00:00',@p1='2009-10-10 00:00:00',@p2=25,@p3=2,@p4=40,@p5=41,@p6=42,@p7=15,@p8=26,@p9=45,@p10=44,@p11=16,@p12=3,@p13=24,@p14=23,@p15=4,@p16=43,@p17=39,@p18=1
Denormalized alright, but what this does is effectively provides the necessary data for all Entry, Project and Customer entities to be completely filled.
For the ListViewItem template this means that individual controls now bind to Entity.Field:
<div class="itemheader"><%# Eval("Entity.Title") %></div>
instead of just binding to the Title before. I can also directly access the project now without reloading:
<div class="itemheader"><%# Eval("Entity.Project.ProjectName") %></div>
incidentally the following also works now without another trip to the database:
<div class="itemheader"><%# Eval("Entity.Entry.ProjectEntity.ProjectName") %></div>
To demonstrate in the example, the report header function now receives the report item as a parameter and gets a repItem.Entry from it. Otherwise there are no changes in that code from before:
protected string RenderProjectGroupHeader(ReportEntryItem repItem)
{
string output = "";
EntryEntity entry = repItem.Entry;
// if the project name has changed render a group as a <div> header
if (entry.ProjectEntity.ProjectName != lastProjectName)
{
lastProjectName = entry.ProjectEntity.ProjectName;
// *** New project add hours for first time
this.groupTotalHours = entry.TotalHours;
string html = string.Format(
@"<div class='groupheader'>
<div style='float:right'><small>{0}</small></div>
{1}
</div>
", entry.ProjectEntity.Customer.Company, entry.ProjectEntity.ProjectName);
return html;
}
else
this.groupTotalHours += entry.TotalHours;
this.grandtotalHours += entry.TotalHours;
return output;
}
Note that I can still bind to entry.ProjectEntity.ProjectName and this works without another trip to the database because the Project entity was preloaded in the repItem.Project property. This the same instance that’s referenced in entry.ProjectEntity just as repItem.CustomerEntity is the same entry.ProjectEntity.CustomerEntity.
By including the Customer and Project entities as projected values in the query we’ve forced them to preload and we've reduced 40 queries to just a single (but large and redundant) query.
Watch what you Load
This example loads the full customer and project entities, but realistically we only need the Company and ProjectName fields from the respective related tables. If you know up front you only need a few of the fields in a table it’s more efficient to load just that data in the projection which has roughly the same effect. Again this is easier if you can use an anonymous type, otherwise the projected type instance has to reflect this layout.
The following is quite a bit more efficient than retrieving the full entities:
var res = from entry in q
select new ReportEntryItem
{
Entry = entry,
ProjectName = entry.ProjectEntity.ProjectName,
Company = entry.ProjectEntity.CustomerEntity.Company
};
because it reduces the amount of field data for each row of SQL result data significantly.
You have to be careful with this though – in the above scenario if I access entry.Entry.ProjectEntity.CustomerEntity.Address I will end up lazy loading both the Project and Customer records from the database. In this instance only the ProjectName and Company were retrieved, not the entire entities. IOW, if you pull just a few fields of related entities make sure you stick to using only those fields and leave the actual related entities alone.
To Lazy Load or Not to Lazy Load
Arguably LINQ to SQL’s lazy loading functionality behavior without override ability is not exactly intuitive, but it’s not terribly complex to force it to return pre-loaded data either. It does require some foresight as switching between lazy loaded and pre-loaded entities effectively requires some code changes to reflect the project type’s structure. It would be much nicer if there was syntax that would let you specify a prefetch path right on the query (something I really like about llblGen’s LINQ implementation).
Also keep in mind that lazy loading isn’t always bad so this isn’t something that should be done on every query. Denormalized data retrieves a lot of repetitive data so you’re often sacrificing data on the wire for fewer round trips to the database. In many situations – especially record level CRUD operations – lazy loading is usually a great way to go. Even small list based displays might not suffer severely from many requests to retrieve data vs. running a single query to retrieve the data you need. Also keep in mind that child items don’t necessarily load for EACH and every row, but only the first time a particular instance is loaded, whereas a denormalized query ALWAYS retrieves the related data for every row. So a list with a lot of repeated related data may not actually access the database as often as you might thing. It’s a tradeoff and experimentation is required to see what works best in each situation. But with a little bit of extra work you can make L2S do what you need it to.
Other Posts you might also like