Thursday, January 31, 2013

IEnumerable Still Misunderstood

With .NET 4.5 released, adopted, and leaking Task all over the code, I was shocked enough by a recent set of interviews to write about them: IEnumerable, the basis for LINQ, .NET 3.5's biggest feature, is still misunderstood.  Three out of three recent contract candidates equated IEnumerable with List and, when pressed, fell back to "Well, then Collection."  If you do not see what is wrong with this belief, read on.

The interview question that consistently exposes the mistaken belief that IEnumerable is a data structure follows:
We are building a distributed counter system and, to reduce the number of client messages, the system receives messages such as "INC 10 /exceptions/nullReferenceExceptions", which will INCrement by 10 the counters for "exceptions" and "nullReferenceExceptions".  Don't worry about the overall implementation; we just need the bit where the counter "path" is multiplexed (if needed: meaning that a single message turns into multiple messages).  How would you code the method to extract the parts of the path?  Let me start by providing the method signature:
string[] ExtractPathParts(string path);

Besides showing whether the candidate prefers for or foreach, the answer also shows whether the candidate understands IEnumerable and is willing to argue for changing the method signature.  Unfortunately, even with prompting, the candidates failed to understand (yes, this could be due to selection bias :/ ).

So what is IEnumerable?  IEnumerable is far from a data structure; it is an adapter that lets a data structure, or plain code, be consumed as if it were a data structure.  The latter was the bigger reason for the development of LINQ and its core bit, IEnumerable.  LINQ was the most significant early push in .NET Framework development toward bringing functional programming more to the fore.
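As a minimal sketch of the adapter idea, the consumer below accepts an array, a List, and an iterator method without knowing which one it was handed.  The names AdapterSketch, CounterNames, and Describe are made up for illustration and are not part of the interview question.

```csharp
using System;
using System.Collections.Generic;

public static class AdapterSketch
{
    // Code acting like a data structure: no collection is ever built.
    public static IEnumerable<string> CounterNames()
    {
        yield return "exceptions";
        yield return "nullReferenceExceptions";
    }

    // The consumer only knows that it can enumerate strings,
    // not where those strings come from.
    public static string Describe(IEnumerable<string> parts)
    {
        return string.Join(",", parts);
    }

    public static void Main()
    {
        var array = new[] { "exceptions", "nullReferenceExceptions" };
        var list = new List<string>(array);

        Console.WriteLine(Describe(array));          // an array seen as a sequence
        Console.WriteLine(Describe(list));           // a List seen as a sequence
        Console.WriteLine(Describe(CounterNames())); // code seen as a sequence
    }
}
```

All three calls print the same line, which is the point: List and arrays are data structures that happen to be enumerable, while IEnumerable itself is only the contract for walking a sequence.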
The following is an example where code, not an existing data structure, uses IEnumerable to act like a data structure:

public IEnumerable<string> ExtractPathParts(string path)
{
  if (!string.IsNullOrEmpty(path))
  {
    const char slash = '/';
    const char whack = '\\';

    var startIndex = 0; //<< index of the first character of the current part
    var pathLength = path.Length;
    for (var i = 0; i < pathLength; ++i)
    {
      var ch = path[i];
      if (ch == slash || ch == whack)
      {
        var subLength = i - startIndex;
        if (subLength > 0) //<< this also omits the empty part before a leading slash
        {
          yield return path.Substring(startIndex, subLength);
        }
        startIndex = i + 1; //<< +1 to omit the slash
      }
    }

    if (startIndex < pathLength) //<< the last part has no trailing slash
    {
      yield return path.Substring(startIndex);
    }
  }
}
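What yield return buys is deferred execution: the method body runs only as far as the caller enumerates.  The following is a small sketch of that behaviour, assuming a made-up iterator, Parts, that logs each step it takes; the DeferredSketch class and its Log list exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Records how far the iterator body has actually run.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<string> Parts()
    {
        Log.Add("producing exceptions");
        yield return "exceptions";
        Log.Add("producing nullReferenceExceptions");
        yield return "nullReferenceExceptions";
    }

    public static void Main()
    {
        var parts = Parts();          // nothing in the method body has run yet
        Console.WriteLine(Log.Count); // 0

        var first = parts.First();    // runs only up to the first yield return
        Console.WriteLine(first);     // exceptions
        Console.WriteLine(Log.Count); // 1
    }
}
```

A method returning string[] cannot behave this way; it has to find every part and allocate the array before the caller sees the first element.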

The above is hand-rolled and not as DRY as it could be, but it is focused on yield return, which is the important bit in understanding IEnumerable and its importance to LINQ.  An alternative implementation follows:

public string[] ExtractPathParts(string path)
{
  if (string.IsNullOrEmpty(path)) return new string[] {};
  return path.Split('/', '\\').Where(it => !string.IsNullOrEmpty(it)).ToArray();
}

The above uses LINQ, but is missing the point: Split allocates an array holding every part, and ToArray allocates at least one more, before the caller has looked at a single part.  LINQ was developed to allow developers to reduce memory allocations, trading them for CPU, which is abundantly available given the increase in processors per commodity server and per consumer desktop machine.  Physical memory cost is not what constrains consumers or systems engineers, and 64-bit processors make more memory addressable, so more available.  Neither of these is the reason why memory allocations should be avoided where possible.  The reason to avoid memory allocations is that .NET, like other garbage-collected memory management models, suffers a blocking point when garbage collection occurs.
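To make the allocation argument concrete, here is a hedged sketch of the multiplexing step from the interview question, written as a lazy pipeline.  SplitLazily and Multiplex are hypothetical names, and the message format is assumed from the "INC 10 ..." example; the only allocations are the strings themselves plus a few small iterator and delegate objects, with no intermediate array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultiplexSketch
{
    // Hypothetical lazy splitter; yields each non-empty part as it is found.
    public static IEnumerable<string> SplitLazily(string path)
    {
        if (string.IsNullOrEmpty(path)) yield break;

        var start = 0;
        for (var i = 0; i <= path.Length; ++i)
        {
            // Treat the end of the string like a separator to flush the last part.
            if (i == path.Length || path[i] == '/' || path[i] == '\\')
            {
                if (i > start) yield return path.Substring(start, i - start);
                start = i + 1;
            }
        }
    }

    // One incoming message becomes one message per counter; no array is built.
    public static IEnumerable<string> Multiplex(string verb, int amount, string path)
    {
        return SplitLazily(path).Select(part => verb + " " + amount + " " + part);
    }

    public static void Main()
    {
        foreach (var message in Multiplex("INC", 10, "/exceptions/nullReferenceExceptions"))
        {
            Console.WriteLine(message);
        }
        // INC 10 exceptions
        // INC 10 nullReferenceExceptions
    }
}
```

A caller that really needs an array can still append ToArray() at the very end, which keeps the decision to allocate with the consumer instead of baking it into the method signature.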

I hope that this article helps to make .NET Framework 3.5 understood more broadly.  If not, please pose questions.