LINQ Query Syntax and Type Argument Inference

In LINQ, you can query in-memory data collections using Method Syntax (the IEnumerable extension methods) or Query Syntax (the SQL-like, declarative syntactic sugar on top of the underlying method calls).  I was using the query syntax to join a few in-memory collections and then project (like a SQL SELECT) the denormalized data into a new object that only contained a few pieces of information from each of the separate data collections.  The code could look similar to this (we’ll go with a shopping example in honor of the upcoming holiday season):

var shoppingLists =
    from person in people
    join wishlist in wishlists on person.Id equals wishlist.personId
    join store in stores on store.Id equals wishlist.storeId
    select new
    {
        PersonName = person.Name,
        StoreName = store.Name,
        NumItems = wishlist.Items.Count
    };

However, I ran into the following IntelliSense red squiggly error in the Visual Studio C# code editor on the second join clause:

The type arguments for method ‘IEnumerable<TResult> System.Linq.Enumerable.Join<TOuter,TInner,TKey,TResult>(this IEnumerable<TOuter>, IEnumerable<TInner>, Func<TOuter,TKey>, Func<TInner,TKey>, Func<TOuter,TInner,TResult>)’ cannot be inferred from the query.

The error might also end with a statement that says "Try specifying the type arguments explicitly."

Let’s Take a Closer Look…

The underlying problem is best understood by examining the method signature of the LINQ Join extension method and how the LINQ query syntax for the keyword join maps to that method call in the compiler.  The type signature for the extension method is as follows:

public static IEnumerable<TResult> Join<TOuter,TInner,TKey,TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter,TKey> outerKeySelector,
    Func<TInner,TKey> innerKeySelector,
    Func<TOuter,TInner,TResult> resultSelector)

Consistent with all extension methods, it is public static and has a this keyword in front of the first method parameter, allowing you to call the Join method as if it had been declared on the IEnumerable<T> interface itself, and therefore on any type that implements it.

The first two parameters are the two data collections being joined, and in the case of my LINQ query above, they would be the joined collection of people & wishlists and then the stores collection, respectively.

The next two method parameters take delegates or lambda expressions that select the key value from each collection participating in the join: the outer key selector first, then the inner key selector.  The key values of the items in the two collections will be compared to each other for equality.  By default, the comparison uses the default equality comparer for the key type, which relies on the type’s Equals and GetHashCode methods (including any overrides you wrote in your class definition).  Alternatively, there is another Join overload that takes one more parameter: an object that implements the IEqualityComparer<TKey> interface.  The key selectors from our LINQ query sample would be the two member access expressions in the "on store.Id equals wishlist.storeId" clause.
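To make the comparer overload concrete, here is a minimal sketch that joins on a string key with and without a custom comparer.  The data and property names (Name, City, StoreName, Item) are made up for illustration and are not part of the shopping query above; StringComparer.OrdinalIgnoreCase is a built-in IEqualityComparer<string>.  The snippet assumes C# 9 or later so it can run as top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical data: the same store name typed with inconsistent casing.
var stores = new[] { new { Name = "Toy Barn", City = "Austin" } };
var wishlists = new[]
{
    new { StoreName = "TOY BARN", Item = "robot" },
    new { StoreName = "toy barn", Item = "puzzle" }
};

// Default comparer: string keys are compared case-sensitively, so nothing matches.
var defaultMatches = stores.Join(
    wishlists,
    store => store.Name,
    wishlist => wishlist.StoreName,
    (store, wishlist) => wishlist.Item);

// Overload with a fifth argument: an IEqualityComparer<TKey> that ignores case.
var ignoreCaseMatches = stores.Join(
    wishlists,
    store => store.Name,
    wishlist => wishlist.StoreName,
    (store, wishlist) => wishlist.Item,
    StringComparer.OrdinalIgnoreCase);

Console.WriteLine(defaultMatches.Count());    // prints 0
Console.WriteLine(ignoreCaseMatches.Count()); // prints 2
```

Note that the query syntax has no place to pass a comparer, so this overload is only reachable through the method syntax.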

The last parameter is a delegate or lambda expression that takes in an item from each participating data collection and projects the items into a new form (like the Select LINQ query method and its SQL equivalent).  The two items being passed into the Func have passed the equality comparison of each item’s key value, exactly the same as how rows from two tables match up in an inner join statement in SQL.

As for the generic type parameters in this extension method definition, TOuter and TInner are the types from the two collections participating in the join.  TKey is the type returned by the key selectors; notice that in order to compare the equality of each key selected, they need to be of the same type.  In our example, TKey is likely int or long (or maybe Guid) since we’re dealing with Id properties of all of our objects.  TResult is the type returned by the projection Func<TOuter,TInner,TResult> resultSelector; in our case, we returned an anonymous type consisting of three properties (PersonName, StoreName, and NumItems).
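To see the four type parameters written out, here is a small sketch that calls Join twice: once letting the compiler infer everything, and once with the type arguments spelled out.  I’m using value tuples instead of anonymous types purely so that every type has a name that can be typed; the data is hypothetical, int keys are an assumption, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical data modeled as value tuples so each type can be named explicitly.
var people = new (int Id, string Name)[] { (1, "Alice"), (2, "Bob") };
var wishlists = new (int personId, int storeId)[] { (1, 10), (2, 20), (1, 20) };

// Inferred: the compiler works out TOuter, TInner, TKey, and TResult from the arguments.
var inferred = people.Join(
    wishlists,
    person => person.Id,
    wishlist => wishlist.personId,
    (person, wishlist) => $"{person.Name} -> store {wishlist.storeId}");

// Explicit: the same call with all four type arguments written out.
var explicitArgs = people.Join<(int Id, string Name), (int personId, int storeId), int, string>(
    wishlists,
    person => person.Id,
    wishlist => wishlist.personId,
    (person, wishlist) => $"{person.Name} -> store {wishlist.storeId}");

// prints "Alice -> store 10; Alice -> store 20; Bob -> store 20"
Console.WriteLine(string.Join("; ", inferred));
// prints "True"
Console.WriteLine(inferred.SequenceEqual(explicitArgs));
```

With anonymous types in the mix there is no name to write, which is why the query in this post has to rely on inference.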

Ok, So Where Did I Screw Up Then?

To answer this, let’s take a look one more time at my LINQ query syntax, how it would map to LINQ method syntax, and the Join method signature again all side-by-side:

var shoppingLists =
    from person in people
    join wishlist in wishlists on person.Id equals wishlist.personId
    join store in stores on store.Id equals wishlist.storeId
    select new
    {
        PersonName = person.Name,
        StoreName = store.Name,
        NumItems = wishlist.Items.Count
    };

// Equivalent LINQ method syntax
var shoppingLists2 = people
    .Join(wishlists,
        person => person.Id,
        wishlist => wishlist.personId,
        (person, wishlist) => new { person, wishlist })
    .Join(stores,
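        // Sits in the outerKeySelector slot, so it receives the { person, wishlist } pair, which has no Id.
        store => store.Id,
        // Sits in the innerKeySelector slot, so it receives a store, which has no wishlist.
        something => something.wishlist.storeId,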
        (obj, store) => new
        {
            PersonName = obj.person.Name,
            StoreName = store.Name,
            NumItems = obj.wishlist.Items.Count
        });

public static IEnumerable<TResult> Join<TOuter,TInner,TKey,TResult>(
    this IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter,TKey> outerKeySelector,
    Func<TInner,TKey> innerKeySelector,
    Func<TOuter,TInner,TResult> resultSelector)

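The issue is the two key selectors in the second Join call above (store => store.Id and something => something.wishlist.storeId); they need to be switched.  The compiler can’t infer the generic type parameters because our key selectors are out of order in the argument list when invoking the Join method: the outer collection at that point is the joined people-and-wishlists sequence, so its key selector has to be the one that reads wishlist.storeId, and the inner collection is stores, so its key selector has to be the one that reads store.Id.  If you wanted to be explicit about your generic type parameter assignments, then Join<TOuter, TInner, TKey, TResult> would be Join<Temp1, Store, int, Temp2>, where Temp1 and Temp2 represent the magical type definitions that the compiler creates underneath the covers when you use anonymous types in your C# code.  Notice the outerKeySelector needs to come before the innerKeySelector.  In other words, the "on store.Id equals wishlist.storeId" syntax is not commutative: the expression to the left of equals becomes the outerKeySelector and can only use range variables already in scope (person and wishlist), while the expression to the right becomes the innerKeySelector and can only use the newly joined range variable (store).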

One last thing: I just realized that the error(s) given in Visual Studio when you actually compile the code are different from the IntelliSense red squiggly error message.  It appears an actual build of the code is able to be a little more omniscient than the quick compile that IntelliSense does as you type.  The error (and intelligent suggestion) the compiler offers says something like this:

The name ‘store’ is not in scope on the left side of ‘equals’.  Consider swapping the expressions on either side of ‘equals’.
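Taking the compiler’s suggestion, a corrected version of the query might look like the sketch below.  The sample data is made up for illustration (only the property names come from the query above), I’m assuming int Ids and a List<string> for Items, and the snippet assumes C# 9 or later so it can run as top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; only the property names are taken from the original query.
var people = new[]
{
    new { Id = 1, Name = "Alice" },
    new { Id = 2, Name = "Bob" }
};
var stores = new[]
{
    new { Id = 10, Name = "Toy Barn" },
    new { Id = 20, Name = "Book Nook" }
};
var wishlists = new[]
{
    new { personId = 1, storeId = 10, Items = new List<string> { "robot", "puzzle" } },
    new { personId = 2, storeId = 20, Items = new List<string> { "novel" } }
};

var shoppingLists =
    from person in people
    join wishlist in wishlists on person.Id equals wishlist.personId
    // Swapped: wishlist (already in scope) is on the left, the newly joined store on the right.
    join store in stores on wishlist.storeId equals store.Id
    select new
    {
        PersonName = person.Name,
        StoreName = store.Name,
        NumItems = wishlist.Items.Count
    };

// prints "Alice | Toy Barn | 2" and then "Bob | Book Nook | 1"
foreach (var entry in shoppingLists)
{
    Console.WriteLine($"{entry.PersonName} | {entry.StoreName} | {entry.NumItems}");
}
```

The only change from the failing query is the order of the two expressions around equals in the second join clause.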

Additional Info

While reading the MSDN article on the LINQ Join method, I thought the "Remarks" section had some really great stuff about deferred execution, default equality comparison, differences from SelectMany, sort order preservation, and so on:

This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.

The default equality comparer, Default, is used to hash and compare keys.

A join refers to the operation of correlating the elements of two sources of information based on a common key. Join brings the two information sources and the keys by which they are matched together in one method call. This differs from the use of SelectMany, which requires more than one method call to perform the same operation.

Join preserves the order of the elements of outer, and for each of these elements, the order of the matching elements of inner.

In query expression syntax, a join (Visual C#) or Join (Visual Basic) clause translates to an invocation of Join.

In relational database terms, the Join method implements an inner equijoin. ‘Inner’ means that only elements that have a match in the other sequence are included in the results. An ‘equijoin’ is a join in which the keys are compared for equality. A left outer join operation has no dedicated standard query operator, but can be performed by using the GroupJoin method. See Join Operations.
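Since the remarks mention that a left outer join can be built from GroupJoin, here is a rough sketch of one common way to do it, pairing GroupJoin with DefaultIfEmpty and SelectMany.  The data is hypothetical and trimmed down to the people and wishlists from my example; this is just one possible shape, not something taken from the article, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical data: Bob has no wishlist, so an inner join would drop him.
var people = new[]
{
    new { Id = 1, Name = "Alice" },
    new { Id = 2, Name = "Bob" }
};
var wishlists = new[] { new { personId = 1, storeId = 10 } };

var leftJoin = people
    // GroupJoin pairs each person with the (possibly empty) sequence of matching wishlists.
    .GroupJoin(
        wishlists,
        person => person.Id,
        wishlist => wishlist.personId,
        (person, matches) => new { person, matches })
    // DefaultIfEmpty supplies a single null when there are no matches, so the person survives.
    .SelectMany(
        pair => pair.matches.DefaultIfEmpty(),
        (pair, wishlist) => new
        {
            PersonName = pair.person.Name,
            StoreId = wishlist?.storeId // null when the person has no wishlist
        });

// prints "Alice: 10" and then "Bob: none"
foreach (var row in leftJoin)
{
    var storeText = row.StoreId.HasValue ? row.StoreId.Value.ToString() : "none";
    Console.WriteLine($"{row.PersonName}: {storeText}");
}
```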

Thanks, I hope this proves helpful to someone, as it was certainly eye-opening to me.