Entity Framework Core 3.1 - Part 2

Entity Framework Core 3.1 - Part 2

Overview

In my previous post, we talked looked at an introduction into entity framework core 3.

Read and Write Data

As I mentioned in the earlier one, I don't really like writing posts that take longer than 10 minutes to read. It is just too much information. So let us continue with the next part. To learn more about entity framework mappings always refer to this amazing course on Pluralsight titled: Entity Framework Core 2: Mappings, by Julie Lerman

DB Interaction with EF Core Data Models

Ever wordered how entities are added and modified and magically updated in the database?

Entities's status in the DbContext are tracked by a variety of EntityEntry objects. This is an internal type that is used by the DbContext. Also called Change Tracking. This is an important concept and must be understood to run complex queries efficiently.

Read more about this in the docs.

Let us say you were adding a new object into the database. For example adding a new Student into the db.

context.Add(student) would tell the dbcontext that this student is a new object and must be added the db.

if you were to explore what was going to happen when you call context.SaveChanges() you would find that the statement to insert a record into the database is actually surrounded by a begin transaction and a commit.

Did you know you can actually hook into .NET Core's logging to view the queries that get executed behind the scenes when interacting a database.

Logging Queries

There are many ways to do this. You could configure logging at the DbContext directly or you could do this at an ASP.NET Core logging.

You have first, create a logger and then letting the DbContext know that it has to use this logger.

 1    public class UniversityContext: DbContext
 2    {
 3        ...
 4        public static readonly ILoggerFactory MyConsoleLoggerFactory
 5          = LoggerFactory.Create(builder => 
 6            {
 7                builder
 8                  .AddFilter(category, level) => 
 9                    category == DbLoggerCategory.Database.Command.Name && 
10                    level == LogLevel.Information)
11                  .AddConsole();
12            });
13        ...
14    }

This is most likely something that you will never do at the DbContext level, but just letting you know so that you could get some debugging information that you want to see if you have just begun working with EF in a console app or something.

In order to achieve this, you will need to reference a couple of Nuget packages:

  • Microsoft.Extensions.Logging
  • Microsoft.Extensions.Logging.Console for console logging.

Alright, so you have configured a LoggerFactory, now what? You haven't really asked the context to use it.

 1    public class UniversityContext: DbContext
 2    {
 3        ...
 4        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
 5        {
 6            optionsBuilder
 7              .UseLoggerFactory(MyConsoleLoggerFactory)
 8              .EnableSensitiveDataLogging()
 9              .UseSqlServer("...");
10        }
11        ...
12    }

Generally the Logger doesn't expose any of the incoming parameter values, so if you want to view the values, you could use the EnableSensitiveDataLogging method. This is not something recommended in production.

Bulk loads in EFCore

So far we have been talking about add a single entity. Ever wondered how EFCore would perform if you were to add several entities together? In order to test this you could, with the help of the logging setup earlier, try out a context.AddRange(student1, student2, student3, student4, student5).

This is something I only found out from Julie Lerman's pluralsight course.

The reason I typed 5 student objects is that EFCore's bulk load feature kicks in only where there are 4 or more entities to be updated at the same time. This arbitrary value was arrived at after some performance tests by EFCore Team apparently.

Now let us see, how the queries look like when running single SQL insert.
When viewing the query generated as a result of a single insert, it would look something like:

1    Executing DbCommand [Parameters:[@p0='' (....
2    SET NOCOUNT ON
3    INSERT INTO [MyTable] (Property1, Property2)
4    VALUES (@p0, p1);
5    SELECT [Id]
6     FROM [MyTable]
7     WHERE @@ROWCOUNT=1 AND [Id] = scope_identity();

Compare that log output to the one we get when running the AddRange I mentioned earlier.

1    SET NOCOUNT ON;
2    DECLARE @INSERTED0 TABLE ([Id] INT, [_Position] [int]);
3    MERGE [MyTable] USING (
4    VALUES (@p0, @p1, 0), (@p2, @p3, 1), (@p4, @p5, 2),...
5    AS ...

EF has created a table variable and prepared to do run a Merge of the table variable data into the actual table. This is how bulk operations are handled.

However, it is important to know that there are two way of adding entities to a context. dbContext.MyFancyEntities.Add(mfe1) or dbContext.Add(mfe1). Similar API exists for AddRange too. This can be beneficial when you want to add multiple entities of different types to the database in one go.

The default size of a batch operation is generally specific to the database provider. In case it goes beyond that limit, it generally divides the operation into batches and sends batches to be executed against the db.

Understanding EFCore Query Execution

I hope you have already used previous version of EntityFramework, so I am going to assume here that you already know that the ToList() method call on an entity or its predicate is what results in the actually query execution in entity framework. EFCore is no different.

What happenes behind the scene during this?

When the ToList() is invoked, EFCore transforms the query into SQL. This is then cached for that application instance to improve speed for later execution, however, I am not entirely sure how long it is kept in cache. Then the SQL is executed on the database and results received in tabular data form and then translates the returned dataset into entities. This process is apparently called materialisation as objects. While this happens, EFCore also adds tracking details for the entities to the DbContext instance.

LINQ Syntax: Two ways to express a query

LINQ Methods
1    dbContext.MyEntities.ToList();
2
3    
4    dbContext.MyEntities
5      .Where(me => me.property == "value")
6      .ToList();
LINQ Query syntax
1    (from me in dbContext.MyEntities
2     select me).ToList()
3
4    
5    (from me in dbContext.MyEntities
6      where me.property=="value"
7      select me).ToList()

Both are valid. The query syntax is slightly more verbose and more like an evolved version of SQL. The method syntax however, is more like a functional programming version, which I am more accustomed to. Both syntaxes are fine and works exactly the same way. Adhere to what your project conventions are.

Gotcha

That's all good. But what if you did something like:

1    foreach(var me in dbContext.MyEntities)
2    {
3        // some functionality here using me
4    }

If you have been paying attention, you already realised that we haven't made a call to ToList(). So you might be thinking, has the query even been executed?. It is really important to understand the difference between this enumeration execution vs the ToList() execution.

In the foreach above, for every enumerated item in the dbContext.MyEntities, a query is executed against the database. Let that sink in for a minute.

So if there are 100s of entities, then the queries are executed that many number of times against the database, the connection is kept open till the end of the foreach block and data is sent and received back and forth between application and database for every iteration!

What have you done?!

Just remember that when you use LINQ.

Filtering and Searching

Filtering syntax, the Where predicate and methods behave the same way from a user perspective, although behind the scenes they have been completely overhauled. There are several methods to execute SQL against the underlying database like the ToList() and most of them have a corresponding Async() version aptly named alongside it. There is also a very important method that you probably already know about but didn't use. Find(key) is a DbSet method that gets executed immediately and returns an element that matches the key, from the cache first, if it exists there, else goes to the database to retrieve it. This is the most efficient way to fetch an entity, if you need it and you know its unique identifier.

When fetching a record and updating it in the same DbContext, Entity Framework knows the state of the record, hence it only passes only the property that changed for that entity to update in the database. However, if you were fetching data in one DbContext and then later updating it in another DbContext instance, EFCore now does not have the information it needs to identify what in the object has actually change. This results in EFCore sending the query to update with all the properties of that record as parameters, so that the values, whichever has changed, can be updated in the database.

Performance and Change Tracking

Performance of an application is an important non-functional requirement. This can be a concern when developing applications using an ORM, after all an ORM, although simplifies application development, doesn't clearly show you what it is doing behind the scenes. This means, you feel like you lack control in how the query behaves. Change tracking is expensive. And hence sometimes, to gain more control, you have to disable this useful feature for performance. You can choose to disable change tracking on a per query basis, or at the context level itself.

To disable tracking at a query level:

1dbContext.MyEntities.AsNoTracking().FirstOrDefault();

To disable tracking at a DbContext level, you may choose to create one that has the following propery set in the constructor. You can read more about this in the docs.

1ChangeTracker.QueryTrackingBehaviour=QueryTrackingBehaviour.NoTracking;

QueryTrackingBehaviour is an Enum.

How would you insert related data?

1dbContext.MyEntities.Add(new MyEntity { 
2    property1 = 1, 
3    myRelatedEntity = new MyRelatedEntity { 
4        property1 = "blah" 
5        } 
6    });

What is Eager Loading?

Loading related data along with a particular type when fetching from the database.

Done using Include(me=> me.RelatedEntities) and sometimes combined with ThenInclude(mer=> mer.RelatedEntitysRelatedEntity)
Read more about Eager Loading in the docs.

Projections are a way to create anonymous types within a select predicate to either cast into a specific type or a dynamic.

What is Explicit Loading?

So you could explicitly retrieve data that is already in Memory. Read more about this too in the docs. The only advantage of using this is when you want to filter the related data in a query. I have never personally used this. And so I wouldn't delve into explaining it myself. I'd rather you read it from the docs.

It does have its limitation: can only load related entities from a single object, not from a collection. So you'd need to fetch myEntity and then use something like this:

1    dbContext.Entry(myEntity)
2            .Collection(me => me.MyRelatedEntities)
3            .Query()
4            .Where(mer => mer.Property == 9)
5            .ToList();

What is Lazy Loading?

This is when the related data is transparently loaded from the database when the navigation property is accessed. This is something that can cause extremely bad performance issues. Hence, it is turned off by default and must be enabled using:

  1. Some additional configuration
  2. An additional nuget package: Microsoft.EntityFrameworkCore.Proxies.
  3. Make all your navigation properties Virtual

You can read all about this in the docs. I really haven't had a project using EF Core that did Lazy Loading. So I am in no position to explain it here. But what I do know is the docs are great to learn from.

In the next post we'll look at how to update related data.

comments powered by Disqus