Imagine that we have an entity handled by Entity Framework which we want to transform into some kind of DTO. To be precise, let’s have the following entity:

public class StubEntity
{
	public ICollection<StubEntity> RecursiveCollection1 { get; set; }
	public ICollection<StubEntity> RecursiveCollection2 { get; set; }
	public StubEntity RecursiveEntity { get; set; }
	public int IntegerValue { get; set; }
}

The matching DTO looks like this:

public class StubEntityDto
{
	public IList<StubEntityDto> RecursiveCollection1 { get; set; }
	public IList<StubEntityDto> RecursiveCollection2 { get; set; }
	public IList<int> IntegerList { get; set; }
}

I omitted all attributes required for EF to avoid noise in the code.

We also have the following mapping from entity to DTO:

Mapping = x => new StubEntityDto
{
	RecursiveCollection1 =
		x.RecursiveEntity.RecursiveCollection1.SelectMany(a => a.RecursiveCollection1)
			.Select(a => new StubEntityDto
			{
				RecursiveCollection1 = a.RecursiveCollection1.Select(b => new StubEntityDto()).ToList()
			}).Select(a => new StubEntityDto
			{
				RecursiveCollection1 = x.RecursiveCollection1.Select(b => new StubEntityDto()).ToList()
			})
			.ToList(),

	RecursiveCollection2 =
		x.RecursiveEntity.RecursiveCollection2.SelectMany(a => a.RecursiveCollection2)
			.Select(a => new StubEntityDto())
			.ToList(),

	IntegerList = x.RecursiveCollection2.Select(y => y.IntegerValue).ToList()
};

If we take a StubEntity and map it to the DTO using this lambda, we will quickly run into a problem with lazy loading of the recursive collections. Since we know that we will need all of the data, we can try to generate the necessary includes for EF up front, so that all collections are loaded in one query. Sounds easy, right? Well, let’s begin.

First things first

Since we have the mapping as a lambda, we could try to analyze it and extract the includes directly from the code. This doesn’t sound like a tough thing, but in fact it can be very difficult to do right. Consider these two cases:

Mapping = x => new StubEntityDto
{
	RecursiveCollection1 = x.RecursiveCollection1.SelectMany(y => y.RecursiveCollection1).SelectMany(y => y.RecursiveCollection1).Select(y => new StubEntityDto()).ToList()
};

We first extract RecursiveCollection1 and then each SelectMany goes one level deeper, so we should have three includes: "RecursiveCollection1", "RecursiveCollection1.RecursiveCollection1", and "RecursiveCollection1.RecursiveCollection1.RecursiveCollection1". However, let’s take this mapping:

Mapping = x => new StubEntityDto
{
	RecursiveCollection1 = x.RecursiveCollection1.OrderBy(y => y.RecursiveCollection1).SelectMany(y => y.RecursiveCollection1).Select(y => new StubEntityDto()).ToList()
};

It looks like we are doing almost the same thing: we first extract RecursiveCollection1 and then access it again on every element. However, this time the first call is OrderBy instead of SelectMany, so it only reorders the top-level collection and does not go a level deeper. We should end up with just two includes: "RecursiveCollection1" and "RecursiveCollection1.RecursiveCollection1".

The problem here is that SelectMany and OrderBy look very similar. They both take a collection and a single lambda as arguments. However, the former performs a projection whereas the latter does not, and we can’t tell which function does what based only on their signatures. And this is only the simplest case; imagine analyzing aggregates, zips, concats, and all the other stuff. Of course, we can try to detect functions performing projections and handle them differently. Can we?

Mapping = x => new StubEntityDto
{
	RecursiveCollection1 = x.RecursiveCollection1.MyCustomMethod(y => y.RecursiveCollection1).SelectMany(y => y.RecursiveCollection1).Select(y => new StubEntityDto()).ToList()
};

Can you tell me now what MyCustomMethod does? Depending on its behavior we should get different includes. However, we can’t tell the difference based only on the mapping lambda, because the lambda does not include the implementation of MyCustomMethod; it merely calls it.
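The similarity described above can be checked directly on the expression tree. The following is a minimal sketch, not a definitive analysis: `Node`, `ShapeDemo`, `ArgumentShape`, and `AllLookAlike` are hypothetical names invented for this illustration, and the body of `MyCustomMethod` here is an arbitrary stand-in. It only shows that a projection, an ordering, and a custom call all appear as a method call taking a source and a lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, used only for this illustration
public class Node
{
	public ICollection<Node> Children { get; set; }
}

public static class ShapeDemo
{
	// Hypothetical stand-in for a custom method; the expression tree never sees this body
	public static IEnumerable<Node> MyCustomMethod(this IEnumerable<Node> source, Func<Node, ICollection<Node>> selector)
	{
		return source;
	}

	// Describes the outermost call in the lambda body by the node kinds of its arguments
	public static string ArgumentShape<TResult>(Expression<Func<Node, TResult>> mapping)
	{
		var call = (MethodCallExpression)mapping.Body;
		return string.Join(", ", call.Arguments.Select(a => a.NodeType));
	}

	// True when projection, ordering, and the custom call all have the same argument shape
	public static bool AllLookAlike()
	{
		var projection = ArgumentShape(x => x.Children.SelectMany(y => y.Children));
		var ordering = ArgumentShape(x => x.Children.OrderBy(y => y.Children));
		var custom = ArgumentShape(x => x.Children.MyCustomMethod(y => y.Children));
		return projection == ordering && ordering == custom;
	}
}
```

The only thing that differs between the three calls is the `MethodInfo` stored in the node, which is why an analyzer has to know each method by name to decide whether it goes a level deeper.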

Let’s try a different approach.

Calling code

The idea is simple: we can compile the lambda and call it on a fake entity acting as a proxy. This looks easy but requires the ability to create a proxy at runtime based on the type. In theory, this should be possible because EF does almost the same thing, so our entity should be prepared for that. However, let’s consider this mapping:

Mapping = x => new StubEntityDto
{
	IntegerList = new []{ x.RecursiveEntity.RecursiveEntity.IntegerValue }.ToList()
};

This mapping doesn’t need any collection includes, but simply running it on a fake entity is not trivial. Please note that we are reading RecursiveEntity.RecursiveEntity.IntegerValue, which means that the first RecursiveEntity must not be null, and the second RecursiveEntity must not be null either. But RecursiveEntity is a plain auto-property, not a virtual one, so we cannot intercept the call to its getter in a derived proxy and return a custom object; we need to set it during construction.
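The null reference problem is easy to reproduce. Below is a small sketch that compiles the mapping above and runs it on entity graphs of different depth. `ExecutionDemo` and `TryRun` are hypothetical names, and the entity and DTO are trimmed copies of the ones defined earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Trimmed copies of the entity and DTO from the beginning of the post
public class StubEntity
{
	public StubEntity RecursiveEntity { get; set; }
	public int IntegerValue { get; set; }
}

public class StubEntityDto
{
	public IList<int> IntegerList { get; set; }
}

public static class ExecutionDemo
{
	public static readonly Expression<Func<StubEntity, StubEntityDto>> Mapping = x => new StubEntityDto
	{
		IntegerList = new[] { x.RecursiveEntity.RecursiveEntity.IntegerValue }.ToList()
	};

	// Runs the compiled mapping; returns false when the entity graph is too shallow
	public static bool TryRun(StubEntity entity)
	{
		try
		{
			Mapping.Compile()(entity);
			return true;
		}
		catch (NullReferenceException)
		{
			return false;
		}
	}
}
```

Calling `ExecutionDemo.TryRun(new StubEntity())` returns false, because the first RecursiveEntity is null. The call succeeds only when the fake entity has been built two levels deep by hand, which is exactly the construction work the executing approach forces on us.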

In theory, we are able to set properties to some non-null, non-empty values during construction using fake objects. But what if some property refers to a sealed type? Creating such a graph of objects might be very cumbersome, even using some nice tricks. But there is another, even more challenging case:

Mapping = x => new StubEntityDto
{
	RecursiveCollection1 = DateTime.Now.Day == 7
		? x.RecursiveCollection1.SelectMany(y => y.RecursiveCollection1).SelectMany(y => y.RecursiveCollection1).Select(y => new StubEntityDto()).ToList()
		: x.RecursiveCollection1.OrderBy(y => y.RecursiveCollection1).SelectMany(y => y.RecursiveCollection1).Select(y => new StubEntityDto()).ToList()
};

We perform different operations depending on the date. If we try to generate includes by examining the lambda, this is not so difficult: we simply analyze both branches of the condition. However, if we try to execute the lambda, we need to be able to traverse the code along both paths. In this case this is doable (we can generate two lambdas and analyze them one by one), but consider this:

Mapping = x => new StubEntityDto
{
	RecursiveCollection1 = x.RecursiveCollection1.MyCustomMethod(y => y.RecursiveCollection1).SelectMany(y => y.RecursiveCollection1).Select(y => new StubEntityDto()).ToList()
};

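public static class StubEntityExtensions
{
	public static IEnumerable<StubEntity> MyCustomMethod(this IEnumerable<StubEntity> entities, Func<StubEntity, ICollection<StubEntity>> lambda)
	{
		// Projects one level deeper on the 7th day of the month, only reorders otherwise
		return DateTime.Now.Day == 7 ? entities.SelectMany(e => lambda(e)) : entities.OrderBy(lambda);
	}
}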

Our lambda contains only a call to MyCustomMethod, whose logic depends on the date. We can’t analyze this function using lambda expressions. In theory, we could decompile it and find all branches, but do we really need this?
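To make the contrast concrete, here is a minimal sketch of the analysis side for the conditional mapping shown earlier, assuming the built-in System.Linq.Expressions.ExpressionVisitor as the base class. `CalledMethodCollector` and `BranchDemo` are hypothetical names, and the entity is a trimmed copy. The visitor reaches both branches of the date condition without executing anything, whereas for a custom method it would only ever see the call itself, never its body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Trimmed copy of the entity from the beginning of the post
public class StubEntity
{
	public ICollection<StubEntity> RecursiveCollection1 { get; set; }
}

// Hypothetical helper: records the name of every method called anywhere in an expression
public class CalledMethodCollector : ExpressionVisitor
{
	public List<string> Methods { get; } = new List<string>();

	protected override Expression VisitMethodCall(MethodCallExpression node)
	{
		Methods.Add(node.Method.Name);
		return base.VisitMethodCall(node);
	}
}

public static class BranchDemo
{
	// Walks a date-dependent mapping and returns the methods found in both branches
	public static List<string> Collect()
	{
		Expression<Func<StubEntity, IEnumerable<StubEntity>>> mapping = x => DateTime.Now.Day == 7
			? x.RecursiveCollection1.SelectMany(y => y.RecursiveCollection1)
			: x.RecursiveCollection1.OrderBy(y => y.RecursiveCollection1);

		var collector = new CalledMethodCollector();
		collector.Visit(mapping);
		return collector.Methods;
	}
}
```

`BranchDemo.Collect()` reports both SelectMany and OrderBy regardless of the current date, because the tree is only inspected and the condition is never evaluated.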

Back to the basics

So we have the following approaches:

Analyzing lambda

  • We can’t tell the difference between ordering and projection (well, at least not in an easy way)
  • We can’t analyze external functions
  • We can track branching instructions
  • We don’t need to create dynamic types, proxies, and all the other black-magic stuff

Executing lambda

  • We need to create dynamic types
  • We need to be able to create instances with non-empty values (even for sealed types)
  • We can differentiate between projection and ordering
  • But we cannot easily analyze branching instructions
  • And there is a huge risk of a null reference somewhere in the middle

So it looks like we are doomed in the general case. But hey, this is just a simple mapping from entity to DTO; maybe we don’t need to consider all this fancy stuff because we are not going to use it in the first place? If we constrain ourselves to Select, SelectMany, and simple branching, we can do the following:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Assumes the built-in System.Linq.Expressions.ExpressionVisitor (.NET 4.0+) as the base class
class IncludeVisitor : ExpressionVisitor
{
	private readonly IDictionary<Expression, string> _mappings;
	private string _currentMapping;
	public IEnumerable<string> Includes { get; private set; }

	public IncludeVisitor()
	{
		_mappings = new Dictionary<Expression, string>();
		_currentMapping = "";
		Includes = new List<string>();
	}

	// No custom Visit override is needed: the base class already
	// traverses all parts of an expression.

	protected override Expression VisitLambda<T>(Expression<T> lambda)
	{
		// Every parameter of the lambda continues the path built so far
		foreach (var parameter in lambda.Parameters)
		{
			_mappings[parameter] = _currentMapping;
		}

		return base.VisitLambda(lambda);
	}

	protected override Expression VisitMember(MemberExpression m)
	{
		// Visit the inner expression first so that its path is already known
		var baseResult = base.VisitMember(m);

		string prefixMapping;

		// m.Expression is null for static members such as DateTime.Now
		if (m.Expression == null || !_mappings.TryGetValue(m.Expression, out prefixMapping))
		{
			prefixMapping = "";
		}

		_currentMapping = prefixMapping + $".{m.Member.Name}";
		_mappings[m] = _currentMapping;
		GenerateIncludeIfNeeded(m);

		return baseResult;
	}

	private void GenerateIncludeIfNeeded(MemberExpression memberExpression)
	{
		var type = memberExpression.Type;

		// Only collection navigation properties produce an include
		if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(ICollection<>))
		{
			Includes = Includes.Union(new[] { _currentMapping.Trim('.') });
		}
	}

	public void VisitExpression(Expression expressionToVisit)
	{
		Visit(expressionToVisit);
	}
}

The idea is: we keep a dictionary mapping expressions to their path prefixes. When we spot a lambda, we assign the current prefix to each of its parameters. When we spot a member access, we concatenate the prefix with the member name and check whether the member is a collection, in which case we record an include. Assuming that the lambda is not complex and that we analyze it from left to right, this should work in most cases. For all other situations we should simply write the includes down manually.
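To see the prefix bookkeeping at work, here is a condensed, self-contained variant of the same idea run against the chained SelectMany mapping from earlier. It is a sketch under assumptions: it uses the built-in System.Linq.Expressions.ExpressionVisitor, and `PathVisitor`, `IncludeDemo`, `Sample`, and `IncludesFor` are hypothetical names; the entity and DTO are trimmed copies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Trimmed copies of the entity and DTO from the beginning of the post
public class StubEntity
{
	public ICollection<StubEntity> RecursiveCollection1 { get; set; }
}

public class StubEntityDto
{
	public IList<StubEntityDto> RecursiveCollection1 { get; set; }
}

// Condensed variant of the visitor idea: tracks a path prefix per expression
public class PathVisitor : ExpressionVisitor
{
	private readonly Dictionary<Expression, string> _prefixes = new Dictionary<Expression, string>();
	private string _current = "";
	public List<string> Includes { get; } = new List<string>();

	protected override Expression VisitLambda<T>(Expression<T> node)
	{
		// Parameters continue the path built so far
		foreach (var parameter in node.Parameters) _prefixes[parameter] = _current;
		return base.VisitLambda(node);
	}

	protected override Expression VisitMember(MemberExpression node)
	{
		var result = base.VisitMember(node);
		string prefix;
		if (node.Expression == null || !_prefixes.TryGetValue(node.Expression, out prefix)) prefix = "";
		_current = prefix + "." + node.Member.Name;
		_prefixes[node] = _current;

		// Record the path once if the member is an ICollection<T>
		var path = _current.Trim('.');
		var type = node.Type;
		if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(ICollection<>) && !Includes.Contains(path))
			Includes.Add(path);
		return result;
	}
}

public static class IncludeDemo
{
	public static readonly Expression<Func<StubEntity, StubEntityDto>> Sample = x => new StubEntityDto
	{
		RecursiveCollection1 = x.RecursiveCollection1.SelectMany(y => y.RecursiveCollection1).SelectMany(y => y.RecursiveCollection1).Select(y => new StubEntityDto()).ToList()
	};

	public static List<string> IncludesFor(Expression<Func<StubEntity, StubEntityDto>> mapping)
	{
		var visitor = new PathVisitor();
		visitor.Visit(mapping);
		return visitor.Includes;
	}
}
```

`IncludeDemo.IncludesFor(IncludeDemo.Sample)` yields the three nested RecursiveCollection1 paths, one per level. Replacing the first SelectMany with OrderBy yields the same three paths, one level too deep, which is the limitation we accepted by restricting ourselves to Select and SelectMany.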

Summary

The idea was very simple, but the actual implementation is somewhat difficult. Of course, it all depends on your requirements. Sometimes it is better to cover 80% of the cases and not bother with the rest, for which even our most complicated implementation would not work anyway.