Parallel.For Loops in .NET 4

I was having fun working on the Toughest Developer Puzzle Ever 2 when I came across a problem that asked for the coordinates (within a grid of numbers) of the upper-left corner of a 4x4 sub-grid with a magic sum of 34, meaning all four rows, all four columns, and both diagonals sum to 34.  Instead of trying to figure it out manually, I did what many programmers would do and wrote a quick and dirty program to accomplish the task for me.

I didn’t try to use any optimized search algorithm, as I knew that the problem space was very small and specific and that computers are really fast.  So I figured I would just write a brute-force algorithm that iterates through every row and column (O(mn) in Big O notation, or just O(n²) if the original grid is square), knowing it would return in milliseconds.

I quickly realized that no iteration depended on any previous iteration for its calculations, so I decided it might be a good time to try out the new Parallel.For loops in .NET Framework 4, which multi-thread the iterations of a loop.  Here’s the implementation I threw together:

using System;
using System.Linq;
using System.Threading.Tasks;

namespace Find34Grid
{
    public class Program
    {
        private static readonly int[][] Grid = new []
        {
            new [] {16,3,2,13,15,10,3,6,41,15,14,4,12,8,7,1,12},
            new [] {5,10,22,8,4,5,16,7,9,7,6,12,5,11,10,8,5},
            new [] {9,6,7,12,14,11,2,13,16,3,2,13,15,10,3,6,5},
            new [] {41,15,14,4,12,8,7,2,5,10,11,8,4,5,16,9,15},
            new [] {16,3,2,13,15,10,3,6,15,10,16,2,3,13,16,2,3},
            new [] {5,10,11,8,4,5,16,9,4,5,5,11,10,8,5,11,10},
            new [] {9,6,7,12,14,11,2,13,14,11,9,7,6,12,9,7,6},
            new [] {41,15,14,4,12,8,7,1,12,8,4,14,15,1,14,15,1},
            new [] {9,7,6,12,5,11,10,8,5,11,10,3,6,41,15,14,4},
            new [] {4,14,15,1,9,7,6,12,9,7,5,16,9,9,7,6,12},
            new [] {12,8,13,13,4,14,15,1,4,14,11,2,13,16,3,2,13}
        };

        private const int MagicSum = 34;

        public static void Main(string[] args)
        {
            Parallel.For(0, Grid.Length - 3, i =>
            {
                Parallel.For(0, Grid[i].Length - 3, j =>
                {
                    int[] sums = new []
                    {
                        Grid[i].Skip(j).Take(4).Sum(), // Row 1
                        Grid[i+1].Skip(j).Take(4).Sum(), // Row 2
                        Grid[i+2].Skip(j).Take(4).Sum(), // Row 3
                        Grid[i+3].Skip(j).Take(4).Sum(), // Row 4
                        new [] { Grid[i][j], Grid[i+1][j], Grid[i+2][j], Grid[i+3][j]}.Sum(), // Column 1
                        new [] { Grid[i][j+1], Grid[i+1][j+1], Grid[i+2][j+1], Grid[i+3][j+1]}.Sum(), // Column 2
                        new [] { Grid[i][j+2], Grid[i+1][j+2], Grid[i+2][j+2], Grid[i+3][j+2]}.Sum(), // Column 3
                        new [] { Grid[i][j+3], Grid[i+1][j+3], Grid[i+2][j+3], Grid[i+3][j+3]}.Sum(), // Column 4
                        new [] { Grid[i][j], Grid[i+1][j+1], Grid[i+2][j+2], Grid[i+3][j+3]}.Sum(), // Diagonal 1
                        new [] { Grid[i][j+3], Grid[i+1][j+2], Grid[i+2][j+1], Grid[i+3][j]}.Sum() // Diagonal 2
                    };

                    if (sums.All(x => x == MagicSum))
                    {
                        Console.WriteLine("y (row): {0}\nx (column): {1}", i, j);
                    }
                });
            });

            Console.ReadLine();
        }
    }
}

Nothing special, and it certainly did the job.  I didn’t notice a performance improvement in any way, but that was expected for such a small problem.  I don’t really expect a whole lot of feedback on such straightforward code, but if there is a cleaner or more efficient way to utilize Parallel.For or Parallel.ForEach, please don’t hesitate to let me know by commenting on this post.  Thanks!
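
For what it’s worth, one variation worth trying (just a sketch on my part, not benchmarked) is to parallelize only the outer loop and keep the inner loop sequential, since nested Parallel.For calls add task-scheduling overhead that such a tiny inner loop probably can’t amortize.  Computing the sums with plain loops instead of LINQ would also avoid allocating all of those temporary arrays.  Assuming the same Grid and MagicSum fields as above, it might look like this:

Parallel.For(0, Grid.Length - 3, i =>
{
    for (int j = 0; j <= Grid[i].Length - 4; j++)
    {
        bool magic = true;
        int diag1 = 0, diag2 = 0;

        for (int k = 0; k < 4 && magic; k++)
        {
            int rowSum = 0, colSum = 0;
            for (int m = 0; m < 4; m++)
            {
                rowSum += Grid[i + k][j + m]; // k-th row of the sub-grid
                colSum += Grid[i + m][j + k]; // k-th column of the sub-grid
            }

            if (rowSum != MagicSum || colSum != MagicSum)
            {
                magic = false; // bail out early on the first bad row/column
            }

            diag1 += Grid[i + k][j + k];     // top-left to bottom-right
            diag2 += Grid[i + k][j + 3 - k]; // top-right to bottom-left
        }

        if (magic && diag1 == MagicSum && diag2 == MagicSum)
        {
            Console.WriteLine("y (row): {0}\nx (column): {1}", i, j);
        }
    }
});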

Anagram Code Kata Part 5 – Domain Objects Over Language Primitives

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link, which points to the introductory post and contains an index of all posts in the series.

In this post, we will discuss some reasons why you might want to avoid using language primitives directly in place of domain objects.  Specifically, I have been using String variables and objects to represent words up to this point.  Esteban suggested I create a Word class for a few good reasons, which I’ll lay out for you.  Out of sheer luck and good timing, Dru Sellers of CodeBetter.com also wrote on this subject shortly afterward and confirmed Esteban’s reasoning.

Use String Class or Create Word Domain Object?

I mentioned in the previous post that Esteban had suggested creating a Word class instead of just passing strings around everywhere through my application.  One good reason is that we don’t own the String class, so further modification to our string-based implementation is difficult: the logic is scattered throughout the application instead of centralized under the responsibility and definition of one domain class.  It is much more difficult to refactor word-specific logic when its behaviors and attributes are spread throughout the code.  Even if your Word class never grows beyond a seemingly unnecessary wrapper of the String class, the code is more cohesive and ready for change should the need arise.

But more importantly, you designed the code thinking in a true object-oriented mindset.  You have to keep in mind that the String class is someone else’s implementation, and is hardly ever sufficient in and of itself as a domain object within your solution.  Think about it: the behaviors that a string object performs are so generic and multipurpose that you likely don’t need two-thirds of the class as defined (and there are likely a few behaviors you really need that aren’t there).  Of course nearly every application on earth makes use of the String class; but because of this fact, it has no meaning in and of itself within any given application.  You would have to look at how the strings are actually used within the code in order to understand their unique application within the app’s context.  That research task is much easier for everyone (including the original author of the code) if it’s all encapsulated within a dedicated domain object class.  True object orientation means describing in code form the properties, behaviors, and interactions/relationships of real-world objects within your problem domain.

Esteban gave me a great example to illustrate these points.  He said that you can always represent money as a decimal (and even when you use a domain object, it’s got to have a decimal language primitive underneath the covers).  However, what happens when you need to attach metadata to the amount (like currency denomination), or if you need to change decimal precision?  You would have to go through all of the code and make sure your use of decimal language primitives is modified uniformly in order to retain consistency.  Also, mathematical operations involving money are hardly ever the same as their counterparts involving standard decimals, because currency deals with discrete values to a certain decimal precision.  Typically, when behaviors and properties within our system begin to get complex, we are cognizant enough to create domain objects in order to bring them all under one class.  We definitely don’t want to over-architect features and interactions before we need them, but I think there is power in this principle of abstracting away language primitives and instead encapsulating their use within domain objects located in just one place in your codebase.  I believe it is one thing we can keep in mind to help guide us toward better object-oriented thinking, and away from language-oriented coding.
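
To make Esteban’s example concrete, here is a minimal sketch of what such a money object might look like (the class name, members, and two-decimal rounding rule are purely my own illustration, not anything he prescribed):

public class Money
{
    private readonly decimal amount;   // the language primitive, hidden underneath the covers
    private readonly string currency;  // metadata a bare decimal can't carry

    public Money(decimal amount, string currency)
    {
        // Currency deals in discrete values, so normalize precision
        // in one place rather than scattering rounding across the app.
        this.amount = Math.Round(amount, 2);
        this.currency = currency;
    }

    public Money Add(Money other)
    {
        if (currency != other.currency)
        {
            throw new InvalidOperationException("Cannot add amounts of different currencies.");
        }

        return new Money(amount + other.amount, currency);
    }

    public override string ToString()
    {
        return string.Format("{0} {1}", amount, currency);
    }
}

If the precision rule or currency handling ever changes, there is exactly one class to modify.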

As mentioned above, Esteban’s thoughts were confirmed nearly verbatim by Dru Sellers of CodeBetter.com in his blog post that he wrote just a few days after I had the conversation with Esteban.  A great coincidence no doubt, and worth a read; here’s the link:

Business Primitives (1/2) 

Word Class Implementation

So basically I created the following class implementation and then replaced string with a reference to this new class, Word:

public class Word
{
    private string wordStr = string.Empty;

    public Word(string wordStr)
    {
        this.wordStr = wordStr;
    }

    public override string ToString()
    {
        return wordStr;
    }

    public override bool Equals(object obj)
    {
        // Safe cast so comparing against null or a non-Word returns false
        // instead of throwing an InvalidCastException.
        Word other = obj as Word;
        return other != null && wordStr == other.wordStr;
    }

    public override int GetHashCode()
    {
        return base.GetHashCode();
    }

    public int GetCanonicalHashCode()
    {
        char[] letters = wordStr.ToCharArray();
        Array.Sort<char>(letters);
        return new string(letters).GetHashCode();
    }
}

I have defined an overridden implementation for Equals(object) (so that the test assertions and other IEnumerable.Contains() queries work) and GetHashCode() (solely to satisfy a compiler warning).  I also moved the GetCanonicalHashCode() method from AnagramGrouper, in order to better encapsulate it as a behavior a Word knows how to do innately.

One other change I had to make was to convert strings into Word objects in our NewlineFileParser, which I accomplished by using an IEnumerable.Select() call as shown below:

return File.ReadAllLines(filePath).Select<string, Word>(x => new Word(x));

Let’s Revisit AssertWasCalled One More Time

Trust me, I groaned with you, even as I wrote that heading text.  The last thing we need is for me to rehash the topic again and flip-flop my stance yet another time.  Yes, that’s right: I’ve changed my mind again.  First I couldn’t understand what utility asserting that methods were called would have under normal test scenarios.  Then I changed my mind, thinking perhaps using it would give my test specifications clearer intent about what I am asserting.  After an email conversation with David Tchepak, I think I’m now back to my original stance.  Here is what Dave said that had me reconsidering, and I think it’s pretty sound reasoning (emphasis added by me):

“But seeing as you asked for it, here goes. :)

“I try and avoid AssertWasCalled like the plague. Generally I don’t care that some method was called, I care that my class does the work it needs to. If that involves calling a dependency then great, but that is not my class’ reason for existence.

“I prefer your original approach of stubbing out everything in the setup and having that tested indirectly. One reason I prefer this is I find it makes it easier to refactor: the assertions in my test don’t change, the class still does the same thing. However I can add or change dependencies by changing some wiring in the setup, and then make sure I haven’t stuffed anything up as my assertions still pass. I prefer that my tests specify what I want, not how it does it. To me, AssertWasCalled reeks of over-specification. The one exception is where I hit the external boundaries of my code, so where I want to send an email or something without side effects that I can test. Then the core of the behaviour is the call itself, so I’m happy to assert on that then.”

And with that I’ll promise to never bring this up again…unless of course I get swayed by someone else. :)  In all seriousness, I think this is an interesting discussion and so if you have insight, please share via comment below.

Summary

Please leave your thoughts in the comments below in regard to creating domain objects over using language primitives (or, heaven forbid, the AssertWasCalled debate).  As far as this coding kata exercise goes, I need to take a high-level look at where it should go next.  Perhaps the next post will tie up loose ends and see how our code performs on large text files as input.  It may be that we will need to refactor our architecture to achieve better runtimes.  If not, maybe we can still discuss where we could have headed if it had been necessary.

Anagram Code Kata Part 4 – Will It Write My Code for Me?

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link, which points to the introductory post and contains an index of all posts in the series.

In this post, we work on implementing the other main dependency, IAnagramGrouper.  In doing so, you will discover that I have been a little naïve with this BDD methodology; apparently I expected a different experience than what I ran into doing the exercise.  In gathering feedback, I had a very interesting discussion about Object-Oriented Programming with Esteban, which I will summarize and open up for further discussion from all of you.  I’ll also show you one way to debug your code when running the MSpec tests via the console test runner.

New Thoughts on Mocks vs. Stubs

During Part 2 of this series, Tobias left a comment about using mocks and AssertWasCalled, rather than stubs and defined fake responses.  I didn’t understand how it would help, as it appeared they would accomplish roughly the same thing.  With stubs and fake responses, if you don’t get back the return value you are looking for, you can infer that the dependencies weren’t called, since you had previously hard-coded their return values.  With mocks and asserting that methods were called, you directly assert that the manager class did in fact call its dependencies.

When I looked more closely into using the AssertWasCalled method in Rhino.Mocks, I noticed that my strategy of using stubs and faked return values wasn’t very explicit in signifying that the ultimate responsibility of the manager class was to delegate to its dependencies.  I mean, the test context does rigorously stub out the dependencies, but the actual assertions weren’t clear about what we were testing.  Therefore, I decided to pop in a pair of AssertWasCalled statements (as found below), leaving the rest of the test the same as it was back in Part 2.

It should_result_in_list_of_anagram_sets_and_both_dependencies_should_have_been_called = () =>
{
    fileParser.AssertWasCalled(x => x.ExtractWordListFromFile(filePath));
    anagramGrouper.AssertWasCalled(x => x.FindAnagramSets(wordListFromFile));

    result.ShouldEqual(expected);
};

However, when I thought to simplify the test code by removing most of the stubbing setup, it appeared I had to leave most of the stub logic intact in order to test whether anagramGrouper.FindAnagramSets() was called with the parameter wordListFromFile, which was one of the stubbed fake responses.  The one dependency (the file parser) needed to pass its return value to the next dependency (the anagram grouper).  I could not think of a way to test that the manager class facilitated that handoff without the stubbing logic (fake responses) that I created.

It would be great if someone could enlighten me, since I’m such a newbie to mocking and stubbing.  Otherwise, I feel the intent of the test is now clearer with the AssertWasCalled checks that I added, even if that means I’m needlessly mixing stubbing and mocking.  But one last question: should I try to stick more closely to the rule of thumb of one all-encompassing assertion per test where possible?  Let me know your thoughts.

Implementing the AnagramGrouper Dependency

So let’s continue onward with some more actual design and code.  We are tackling the AnagramGrouper dependency, whose sole responsibility is to take in the parsed collection of words and find anagram sets.  Here is the test spec (one thing to note is that in some of my earlier tests for the AnagramsFinder manager class and IFileParser, I had used string arrays liberally; I decided to go with IList<string> collections instead):

public class AnagramGrouperSpecs
{
    [Subject(typeof(AnagramGrouper))]
    public class when_given_word_list_collection
    {
        static AnagramGrouper sut;
        static IEnumerable<IList<string>> result;
        static IEnumerable<string> wordList = new[] { "wordA", "wordB", "Aword", "wordC" };

        Establish context = () =>
        {
            sut = new AnagramGrouper();
        };

        Because of = () =>
        {
            result = sut.FindAnagramSets(wordList);
        };

        It should_result_in_anagram_sets_collection_of_length_1 = () =>
        {
            result.Count().ShouldEqual(1);
        };

        It should_contain_2_specific_words_in_the_anagram_set = () =>
        {
            result.First().ShouldContain("wordA", "Aword");
        };
    }
}

Here too I have basically two assertions (though split up differently than last time); should I try harder to avoid this as well?

When contemplating how to further break down this responsibility of finding anagram sets into subtasks, I figured we would need one piece of code that compares two words and answers whether they are anagrams of each other, and another piece of code that keeps track of the various anagram sets (perhaps a collection or dictionary of string arrays).  With what BDD has revealed to me thus far, I figured this meant we were going to see two more dependencies (and interfaces, and spec tests) added to the codebase.

So I started with the word comparison task, thinking I could perhaps leverage the String.GetHashCode() method to assign a meaningful value to the list of characters in the string.  However, string hash codes give importance to letter casing and order of characters, so a modified strategy would have to be utilized, even though GetHashCode() seems quite close to filling our need.

Now, I probably should have come up with this ingenious algorithm all by myself, but…it is what it is, and this is the best, most concise solution that I could put together to solve the problem.  With that said, I ended up doing some internet searching in order to proof-of-concept my GetHashCode() idea.  I came across pure genius in the form of Brian Mullen’s blog post entitled LINQ group by and GroupBy, specifically the second-to-last code snippet.  Just String.ToCharArray() the word, Array.Sort() the resultant array, then turn it back into a string and call GetHashCode().  (As a side note, we can worry about case-insensitivity later if desired by simply doing a String.ToLower() on the word before getting the hash code.)  Now anagrams will have the same hash code (Brian uses the term “canonical”, which I think is very fitting); it’s literally perfect in every way!
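
Here’s a quick sanity check of the idea (just a throwaway console snippet of my own, not from Brian’s post):

string first = "listen";
string second = "silent";

char[] firstLetters = first.ToCharArray();
char[] secondLetters = second.ToCharArray();
Array.Sort(firstLetters);
Array.Sort(secondLetters);

// Both words sort to "eilnst", so the canonical hash codes match.
Console.WriteLine(new string(firstLetters).GetHashCode() ==
                  new string(secondLetters).GetHashCode()); // True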

So now for the other dependency: I figure we can use a Dictionary<int, IList<string>>, where the key is the canonical hash code and the value is a string list collection representing an anagram set of words.  These two dependency implementations are so simple and straightforward that I have a hard time seeing a reason to continue breaking out full-fledged object dependencies with interfaces and Dependency Injection.  However, I will make a case for moving the canonical hash code logic outside this AnagramGrouper class, but that will come later.

So let’s write some code to pass our test/spec:

public class AnagramGrouper : IAnagramGrouper
{
    public IEnumerable<IList<string>> FindAnagramSets(IEnumerable<string> wordList)
    {
        Dictionary<int, IList<string>> results = new Dictionary<int, IList<string>>();

        foreach (string word in wordList)
        {
            int canonicalHashCode = GetCanonicalHashCode(word);

            if (results.ContainsKey(canonicalHashCode))
            {
                results[canonicalHashCode].Add(word);
            }
            else
            {
                results.Add(canonicalHashCode, new List<string>(new[] { word }));
            }
        }

        return results.Values;
    }

    public static int GetCanonicalHashCode(string word)
    {
        char[] letters = word.ToCharArray();
        Array.Sort<char>(letters);

        return new string(letters).GetHashCode();
    }
}

Maybe you noticed already, but there is a bug in my logic.  At the time, I didn’t see it and needed to debug the code while it ran.  Setting a debug breakpoint in your code doesn’t work like you’re probably used to, because we’re running the MSpec tests via its console test runner.  I had to do some searching to figure this one out, and that’s what we’ll cover next.

How to Debug MSpec Specifications and the Code They Test

After a short ride on the Google superhighway, I found a blog post containing a November 2009 comment from Aaron Jensen, author of MSpec, that gave a few possible solutions.  I opted to go with the Debugger.Break() solution, which I put at the very top of my FindAnagramSets method.  It is a tad awkward because it will show a Windows error dialog box saying the program encountered a problem and Windows is searching the web for a solution.  If you wait through that, it will give you the option to debug the program via a Visual Studio dialog, where you select the currently running instance of Visual Studio as your debugger.  You just remove the statement when you’re done (and you’ll really want to remember to remove it before it reaches production code).  It does the trick, though it is a bit obtrusive to your code base.
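
In case it helps, this is roughly what the placement looked like in my case (a sketch; you’ll also need a using System.Diagnostics statement at the top of the file):

public IEnumerable<IList<string>> FindAnagramSets(IEnumerable<string> wordList)
{
    Debugger.Break(); // temporary: halts here so you can attach Visual Studio as the debugger

    // ... rest of the method exactly as shown earlier ...
}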

If you haven’t already figured it out, the bug is that my FindAnagramSets method returns all word sets found in the dictionary.  A set of just one word is not really an anagram set, which is why our test specification, which is based on that assumption, failed.  The fix is to filter out word sets with fewer than two words before returning, like this:

// Return anagram sets with more than one word in it
return results.Values.Where(x => x.Count > 1);

That will make the code pass the tests now.  Remember to remove the debugger statement we added.

BDD, OOP…WTH?!

Just kidding.  But I do want to discuss a reaction I had to BDD after implementing this anagram-finding solution.  After I completed the code and finished the tests, I was feeling pretty great.  But then I started thinking about how I didn’t really go down the path that I felt BDD was originally trying to lead me down with AnagramGrouper, which was to create two more dependencies, just as I had done for the first managing class, AnagramsFinder.  I felt pretty frustrated, because I had been digging BDD up to this point.  I’m not quite sure what my original expectations were, but I might have thought BDD was magically going to write my algorithms for me as well.

I decided to discuss the whole thing with Esteban to get some perspective on the issue.  He told me he didn’t know a thing about BDD and how it’s supposed to work, but he figured BDD would lead me to a decoupled, cohesive code design; he couldn’t see how it would aid with algorithmic solutions to the problems at hand.  I’m not sure why I was so naïve, but it makes complete sense.  These methodologies are designed to keep you disciplined to some proven, guiding principles, but you still need to use your brain and creativity to actually solve the problem.  I would love some feedback and suggestions from you all on knowing when it is time to switch from BDD architecture-design mode to algorithm-design mode within the broken-down dependencies that handle the very specific sub-responsibilities.  Please let me know if you have found some guiding concept or question that you put to the test in order to lead you into the right coding mindset for a particular context (design or algorithm; or do I even have the right categories?).

Perhaps it’s just that this design process feels so new to me.  Maybe I have solely been in the coding, algorithm mindset this whole time throughout my career.  Perhaps it will become more natural with practice and experience.  I guess it just feels like I’m learning to drive a manual, stick-shift car or something, and I am a little rough with the clutch when transitioning between gears.

To take this discussion further, Esteban made a comment in our conversation about his personal thoughts on these methodologies (like BDD) and Object-Oriented Programming.  He has come to believe that we invent these methodologies like flavors of the week “because as programmers we don’t really get OOP.”  Perhaps OOP just comes more naturally to his thought process than to most people’s (or, he admits, it’s possible he could just be completely ignorant), but he feels that there are really just a few simple rules and guidelines to keep in mind at all times in order to keep your objects behaving like real-life objects and not acting like data constructs.  I encouraged him to give his theory some additional thought and make it a little more formal by writing it down.  He did, and the following is a link to the resultant blog post:

3 Simple Rules to Good Object Oriented Code

Please read through that post and comment on it.  His idea needs to be criticized and/or validated.  He admits he may have no clue what he’s talking about, but I said that’s why it needs to be put to the test by posting it in the open.  We both think it’s an interesting theory, but we’re unsure if it even holds any water.  Perhaps Esteban’s theory is too simplistic to be practical for many programmers.  Provide feedback and lots of it.

Summary

So I think I need to wrap this post up about now.  I think I have raised enough discussion points to write a book, but please do take the time to comment on what I’m doing wrong, what I’m doing right, or how to make the BDD mentality more flowing and natural.  I promised to talk about having a Word class as opposed to using primitive strings everywhere, and that will definitely be in the next post.  Also, I think our AnagramGrouper class may need some more testing and fleshing out of how it should handle spaces, punctuation, casing, and so on.

Thanks for sticking with me thus far.

Anagram Code Kata Part 3 – Use Common Sense

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link, which points to the introductory post and contains an index of all posts in the series.

In this post, we will discuss how far to take the idea of creating mockable contracts via interfaces for testing purposes and to keep responsibilities modular.  As I solicited feedback from commenters, I was getting hung up on applying this concept to every object and dependency.  We’ll talk about how I’m now thinking that overly strict adherence to the practice can drive you nuts and decrease productivity.  It is meant to help guide your design process along rather than hinder it.  We’ll finish this post off with one of our two dependencies implemented in an extremely simple manner.

IFileParser Dependency

When we left off last time, we had defined two separate responsibilities that AnagramsFinder was going to push down into its dependencies.  As I started to write a specification test for the newly created NewlineFileParser, I realized it would be difficult to stub based on a file path to a text file being the only parameter.  I began having thoughts that maybe I could break out another pair of dependencies for file parsing, namely obtaining a file stream (or file contents string) from a path, and then parsing the file contents into a list of multiple lines.  But then again, the first of the two dependencies would still depend on the physical file system, and I cannot get away from that unless I somehow mock up an in-memory file system.  This began to feel silly to me, as I was hoping to just use the built-in .NET APIs for file systems and string parsing.

After emailing Dave and Esteban asking for guidance, the revelation I received via these two fine gentlemen is that file parsing is not an essential purpose of this application or exercise (at least not yet; we won’t be focusing on performance unless it appears to be grossly deficient as we go along).  What we could do is write integration tests instead, where we have a set of simple test files sitting in a folder that we run our NewlineFileParser against to ensure it works as we expect (I’ll sketch what such a test might look like just after the implementation below).  But I don’t think I’m even going to do this at this point, as I really do want to get to the core of the application.  Plus, I feel certain that I can trust the implementation of File.ReadAllLines() from the .NET Framework.

Unless it needs to be changed further down the road, the following code implementation will likely be sufficient for the remainder of this coding exercise:

public class NewlineFileParser : IFileParser
{
    public IEnumerable<string> ExtractWordListFromFile(string filePath)
    {
        return File.ReadAllLines(filePath);
    }
}

I feel great about this because I haven’t gone insane trying to mock and test everything in sight, nor have I broken up every single responsibility into such small chunks that it becomes tedious.  Instead, I have separated the two core responsibilities of AnagramsFinder, which means I can change and test the two dependencies independently of each other and of their manager class.  Even though I haven’t written a ton of code yet, I feel the test specs have driven my code into a cohesive, decoupled design that should prove flexible if requirements change or if I encounter friction later on and need to rethink things.  A flexible architecture for the application is being fleshed out based on a “top-to-bottom” design view of the requirements.
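
As promised above, here is roughly what one of those integration tests might look like if I ever change my mind.  This is only a sketch of mine (the spec names and the temp-file approach are my own, and it assumes the usual using statements for Machine.Specifications, System.IO, and System.Linq):

public class NewlineFileParserIntegrationSpecs
{
    [Subject(typeof(NewlineFileParser))]
    public class when_parsing_a_real_file_with_one_word_per_line
    {
        static NewlineFileParser sut;
        static string filePath;
        static IEnumerable<string> result;

        Establish context = () =>
        {
            // Write a real file to disk so we exercise File.ReadAllLines() for real.
            filePath = Path.GetTempFileName();
            File.WriteAllLines(filePath, new[] { "wordA", "wordB" });
            sut = new NewlineFileParser();
        };

        Because of = () =>
        {
            result = sut.ExtractWordListFromFile(filePath);
        };

        It should_return_one_word_per_line = () =>
        {
            result.Count().ShouldEqual(2);
        };

        Cleanup after = () =>
        {
            File.Delete(filePath);
        };
    }
}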

Create Stubs or Assert Methods Were Called

Tobias Walter left a comment on my last post with some suggestions that I would like to get clarification on and discuss further with my growing audience of three.  He compared some of my implementation to Dave’s implementation in his Favour test driving logic over data post, but the comparison was lost on me.  It seemed Tobias was saying that I had gone too far in breaking up the responsibilities into such small dependencies.  I am unsure what his argument was, but as far as I could tell, both Dave and I broke up a managing class into two separate responsibilities so that the more crucial of the two could be tested independently.  Also, in doing so, we both created stubs that were given specific dummy responses to method calls using Rhino.Mocks.  One more thing I would like to say is that I broke the file parsing responsibility out into its own dependency more because of the need to isolate the anagram grouping than because file parsing is in itself a difficult task.  If Tobias or anyone else could clarify his argument, that would really help me understand the merits of his suggestions.

Another suggestion that Tobias brought up, and that I would like to petition feedback on, is to use the mocking feature of testing whether a method was called instead of creating a stub programmed with a dummy response.  I have to admit, I’m not sure when to use one approach over the other.  I guess I’ve felt in the past that testing whether a method was called seems silly, since it seems odd to have your test know the implementation and inner method calls of the subject under test.  But I am now realizing that my stubs appear to be doing the same thing, just with dummy input and output programmed in.  If you have an opinion or experience on this subject, please leave some feedback in the comments of this post.

What’s Next?

In my next post I hope to summarize this mocking debate brought up above regarding two differing methods of verifying my manager class is correctly managing its dependencies.  I also hope to begin driving out the design and implementation of the AnagramGrouper dependency via behavior specification tests.  Thanks and any feedback is greatly appreciated.

Comments

David
Hi Mike,

In terms of the stubbing vs. asserting a method was called, the team I'm on has found we rarely AssertWasCalled any more. Doing so explicitly couples our tests/specs to the implementation, whereas just stubbing calls in a SetUp seems easier to change without altering the fundamental assertions in our tests.

The only time we use AssertWasCalled is for void methods to a service or similar where there is no observable side-effect from a unit perspective.

YMMV. :)

Regards,
David
Tobias Walter
Hi Mike,

great you continued this series.

I think I messed things up and wrote incorrect things. In the meantime I managed to do this kata again and now I agree with you. Just forget my advice to mock it - this was wrong.

However, my coworkers and I question whether it's worth writing 30 lines of code to test a two-line method. I think for this kata it will be enough to do some acceptance tests here.

Anagram Code Kata Part 2 – Mocking and SRP

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link, which points to the introductory post and contains an index of all posts in the series.

In this post, we will rewrite our first specification/test from the previous post, where we didn’t feel very confident about the direction we were heading.  We got some great feedback and will now better focus on test driving logic and not just data; we will also be more careful in extracting the requirements from the problem statement and in giving our classes a Single Responsibility (the first of the SOLID principles of Object-Oriented Design).  As we test drive our design from a “top-down” perspective, we will encounter the need for dependencies not yet implemented while creating the higher-level classes.  Instead of halting our rhythm and progress, we will utilize a mocking framework to stub out non-existent implementations according to their interfaces.  This also encourages us to use Dependency Inversion (the last of the SOLID principles; same link as above), and we can prove the validity of our modules by testing them in complete isolation from their dependencies.

Don’t Focus on Data (Yet) and SRP (Single Responsibility Principle)

I had a feeling I was focusing too much on testing data, and I also knew it seemed too odd to be writing my very first test to depend on solving a specific instance of the problem (especially a test case with thousands of words in it), instead of working on solving the problem generally.  After some terrific feedback from David Tchepak in the comments of my last post, I now understand that wasn’t my only mistake.  I was also trying to give my AnagramsFinder class too many responsibilities (especially with just one ParseTextFile method), namely extracting the words out of the text file and then grouping the anagrams together.  Not only that, I was also returning a count of anagram sets when the problem statement specifically asked for output of the anagram sets themselves.  Even though this is a simple problem scenario, it can be foolish to make assumptions or simplifications like I did that are not part of the requirements.

Let us first focus on the responsibilities required of our AnagramsFinder class.  It needs to parse a list of words out of a text file given the file path and it also needs to group words together into sets that are anagrams of each other.  However, we just named two responsibilities; implementing both in the same class would make it less cohesive and therefore less maintainable.  This is because our class would have more than one responsibility that could change in the future, and because these responsibilities are in the same class, they could be considered coupled and one could be affected by modifications to the other.  All of this train of thought falls under the Single Responsibility Principle from the SOLID principles (see links at the beginning of the post).

To solve this predicament, we will make AnagramsFinder a managing class of dependencies that individually solve these separate concerns.  Each of these dependencies will adhere to the Single Responsibility principle, as does this higher abstraction manager class.  Its responsibility can be summarized as managing the inputs, outputs, and execution order of its dependencies.  We shall name the dependencies directly after each of their responsibilities, namely FileParser and AnagramGrouper.  However, I don’t want to go implement these dependencies and throw off the flow of fleshing out the design and logic of my manager class.  We are trying to design from a more “top-down” approach, instead of focusing on the lower level data concerns of the solution too early in the design process.  To accomplish completing the design of our manager class without actually implementing the dependencies, we will code to interfaces (namely IFileParser and IAnagramGrouper).

DIP (Dependency Inversion Principle)

The advantages of using interfaces are that we abstract out the actual implementation of the dependencies, allowing us to be more modular and less coupled between dependencies.  One class is not strongly tied to a specific implementation of its dependency, but rather the general contracts made available by the interface’s defined method signatures.  What this really means is that we can swap out actual implementations of the dependency without the consuming class being modified in the least bit.  This makes for truly maintainable code, especially when requirements change down the road after version one of the application is up and running.  This is what the Dependency Inversion Principle (again, from SOLID) is all about.

We can take this a step further by passing the necessary dependencies into our consuming class at construction.  This allows us to “inject” into the object any implementation of the interface we see fit; this technique is aptly named Dependency Injection (DI).  Our manager class doesn’t have to fret about any concrete details regarding its dependencies’ actual implementations, nor about any of the construction ceremony associated with “newing” up and initializing said dependencies.  We won’t be using any DI containers (as our dependency needs are very light in this exercise), but the use of the Dependency Inversion Principle does set us up nicely to use a mocking framework, so that we can finish fleshing out our tests without implementing any of the dependencies yet.

Mocking Dependencies

We would like to finish the design of our top level AnagramsFinder manager class, but not fully commit to how its dependencies will be implemented yet.  We have a general idea of how we want our consuming class to interact with its dependencies via interfaces.  Let’s go ahead and look at our interfaces we named earlier:

public interface IFileParser
{
    IEnumerable<string> ExtractWordListFromFile(string filePath);
}

public interface IAnagramGrouper
{
    IEnumerable<string[]> FindAnagramSets(IEnumerable<string> wordList);
}

Now, we could create concrete implementations of the interfaces to be used in our tests that return simple dummy results.  An example could look like this:

public class TestFileParser : IFileParser
{
    public IEnumerable<string> ExtractWordListFromFile(string filePath)
    {
        return new[] { "wordA", "wordB", "Aword", "wordC" };
    }
}

This is helpful because it keeps the one class we are interested in from being dependent on any real logic in its dependencies.  We want to be able to test our modules in isolation from any dependencies, so that no secondary or outside influences skew the test results.  We are basically providing dummy dependencies to our tests that have no logic that could cause side effects.

However, the software development community has provided tools called mocking frameworks that can create these dependency stubs for you.  This can save you from creating concrete interface implementations that have no use except in testing.  The mocking framework I will try out is Rhino.Mocks from Ayende.  To get set up, you merely need to download the latest build zip archive, extract the Rhino.Mocks.dll library into the Libraries folder we created in the last blog post, and add a reference to the DLL in our Visual Studio solution.  I will next show you how to use Rhino.Mocks as we rewrite our first specification test.

Rewrite of Our First Specification/Test

Armed with SRP, DIP, mocking, a more strict adherence to the actual problem statement, and a renewed focus on “favoring test driving logic over just testing data,” we now rewrite our first specification as follows:

[Subject(typeof(AnagramsFinder), "Finding Anagrams")]
public class when_given_text_file_with_word_on_each_line
{
    static AnagramsFinder sut;
    static IEnumerable<string[]> result;
    static string filePath = "dummy_file_path.txt";
    static IEnumerable<string> wordListFromFile = new[] { "wordA", "wordB", "Aword", "wordC" };
    static IEnumerable<string[]> expected = new[] { new[] { "anagram", "gramana" } };

    Establish context = () =>
    {
        var fileParser = MockRepository.GenerateStub<IFileParser>();
        fileParser.Stub(x => x.ExtractWordListFromFile(filePath)).Return(wordListFromFile);

        var anagramGrouper = MockRepository.GenerateStub<IAnagramGrouper>();
        anagramGrouper.Stub(x => x.FindAnagramSets(wordListFromFile)).Return(expected);

        sut = new AnagramsFinder(fileParser, anagramGrouper);
    };

    Because of = () =>
    {
        result = sut.ExtractAnagramSets(filePath);
    };

    It should_result_in_list_of_anagram_sets = () =>
    {
        result.ShouldEqual(expected);
    };
}

I should mention a few notes about the code.  First, we need to add a using Rhino.Mocks statement at the top of our specification class file.  Second, our AnagramsFinder instance variable named sut stands for “Subject Under Test,” a mannerism I picked up from David Tchepak, whom I’ve mentioned several times throughout this post series.

Our use of Rhino.Mocks is found where we call the MockRepository.GenerateStub method against an interface.  We then proceed to tell the stubbed object how to behave by specifying dummy return values when given methods of the object are called with given parameters.  The last line of our “context” setup then injects these two newly generated, stubbed dependencies into our manager class for testing.  It’s also interesting to note that none of my parameters and expected outputs really make much sense.  This is done on purpose to show that the data really doesn’t matter to this class, as we are not testing the logic of any data processing by the dependencies (remember: test in isolation).  It is true that this test isn’t testing any real logic at this point, and it may never evolve into testing any real logic either.  However, I think the important point is that it aided us in fleshing out the design, which we previously didn’t know how we were going to implement.  Not all tests are created equal in regard to validating our logic, but they all play a part in driving the design of our code.  Hopefully these statements are true, and I would love to hear feedback on this topic in the comments.

I also want to touch on the specification pattern of Behavior Driven Development and testing.  Specification tests are commonly set up as a workflow of establishing context, performing the behavior, and asserting on the results.  The MSpec framework is designed to encourage this test organization.

Red, Green, Refactor and MSpec Test Runner Output

To make our first test run, we need to create the basic skeleton of our AnagramsFinder class as outlined in our test.  This includes a constructor that takes our two dependencies and an ExtractAnagramSets method:

public class AnagramsFinder
{
    private IFileParser fileParser;
    private IAnagramGrouper anagramGrouper;

    public AnagramsFinder(IFileParser fileParser, IAnagramGrouper anagramGrouper)
    {
        this.fileParser = fileParser;
        this.anagramGrouper = anagramGrouper;
    }

    public IEnumerable<string[]> ExtractAnagramSets(string filePath)
    {
        throw new NotImplementedException();
    }
}

I am using the ConsoleRunner according to the method described in Rob Conery’s introductory MSpec and BDD post, including output to an HTML report.  After running our specifications, we get the following output:

Specs in AnagramCodeKata:

AnagramsFinder Finding Anagrams, when given text file with word on each line
» should result in list of anagram sets (FAIL)
System.NotImplementedException: The method or operation is not implemented.
   ... Stack Trace here ...

Contexts: 1, Specifications: 1
  0 passed, 1 failed

…and the HTML report:

[Screenshot: HTML report showing the failing spec]

A highly encouraged tenet of Test Driven Development is the practice of “Red, Green, Refactor.”  It denotes the evolution of the results and state of your tests.  You are encouraged to write the test first and then do the minimum work necessary to get the code base to compile and run.  You first run your test in a control state where you know it should fail.  Most test runners show failed tests in red, hence the name of the first step.  The next stage is to implement the code that makes the test pass and turns the output green, the color commonly indicating passing tests.  The Refactor stage is a time to pause and see if any code can be reorganized or simplified.  Then “rinse and repeat as necessary”, an applicable instruction from shampoo bottles.

To make our test pass, we implement real logic to coordinate the calls into our dependencies, like so:

public IEnumerable<string[]> ExtractAnagramSets(string filePath)
{
    var wordList = fileParser.ExtractWordListFromFile(filePath);
    return anagramGrouper.FindAnagramSets(wordList);
}

…and the successful output of our test runner:

Specs in AnagramCodeKata:

AnagramsFinder Finding Anagrams, when given text file with word on each line
» should result in list of anagram sets

Contexts: 1, Specifications: 1


[Screenshot: HTML report showing the passing spec]

Final Thoughts and Questions

So what do you think of the new test?  Are we better able to see the Single Responsibility of the AnagramsFinder manager class?  Are we more in line with solving the actual problem statement?  Your feedback would be much appreciated.

In the next post, we will begin work on test driving the design and implementation of the dependencies…unless of course I get feedback that I should be focusing on a different area first or that this last rewrite still needs more help.

Comments

Tobias Walter
Yes, this is a good start for a great idea (code kata done slowly).

Right now, I'm struggling with the same kata, and I have a problem looking at your AnagramsFinder and IAnagramGrouper. The result of both is the same, and the tests (apart from setting up a not-needed IFileParser for the AnagramsFinder) will be the same too.

I think this is because you're using the pipes-and-filters pattern (as opposed to David, who combines both results in "focus on test driving logic"). Testing just the filters should be sufficient here?

What you maybe can do is a behavioural test. So, instead of testing with stubs, use mock objects to test whether the methods were called.

What do you think about that?

p.s. it's hard to post a comment here because the word verification textbox is only accessible when selecting text with the mouse and scrolling down :(
David
Looks good Mike!

I wouldn't call the previous spec a mistake or "foolish" though. I've seen working on simple cases like Count advocated in lots of places for doing TDD; I've just always struggled with getting from those cases to the main responsibilities of the class.

Looking forward to the rest of the series.