Anagram Code Kata Part 4 – Will It Write My Code for Me?

This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles.  Feel free to visit the above link, which points to the introductory post and also contains an index of all posts in the series.

In this post, we work on implementing the other main dependency, IAnagramGrouper.  In doing so, you will discover that I have been a little naïve about this BDD methodology; apparently I expected a different experience than the one I actually had doing the exercise.  In gathering feedback, I had a very interesting discussion about Object-Oriented Programming with Esteban, which I will summarize and open up for further discussion from all of you.  I’ll also show you one way to debug your code when running the MSpec tests via the console test runner.

New Thoughts on Mocks vs. Stubs

During Part 2 of this series, Tobias left a comment about using mocks and AssertWasCalled, rather than stubs and defining fake responses.  I didn’t understand how it would help, as it appeared they would accomplish roughly the same thing.  With stubs and fake responses, if you don’t get back the return value you are looking for, you can infer that the dependencies weren’t called, since you had previously hard-coded their return values.  With mocks and AssertWasCalled, you directly assert that the manager class did in fact call its dependencies.

When I looked more closely into using the AssertWasCalled method in Rhino.Mocks, I noticed that my strategy of using stubs and faked return values wasn’t very explicit in signifying that the ultimate responsibility of the manager class is to delegate to its dependencies.  The test context does rigorously stub out the dependencies, but the actual assertions didn’t make clear what we were testing.  Therefore, I decided to pop in a pair of AssertWasCalled statements (as found below), leaving the rest of the test the same as it was back in Part 2.

It should_result_in_list_of_anagram_sets_and_both_dependencies_should_have_been_called = () =>
{
    fileParser.AssertWasCalled(x => x.ExtractWordListFromFile(filePath));
    anagramGrouper.AssertWasCalled(x => x.FindAnagramSets(wordListFromFile));

    result.ShouldEqual(expected);
};

However, when I tried to simplify the test code by removing most of the stubbing setup, it turned out I had to leave most of the stub logic intact in order to test whether anagramGrouper.FindAnagramSets() was called with the parameter wordListFromFile, which was one of the stubbed fake responses.  The one dependency (the file parser) needs to pass its return value to the next dependency (the anagrams finder), and I could not think of a way to test that the manager class facilitates that hand-off without the fake responses I created.

It would be great if someone could enlighten me, since I’m such a newbie to mocking and stubbing.  Otherwise, I feel the intent of the test is now clearer with the AssertWasCalled checks that I added, even if that means I’m needlessly mixing stubbing and mocking.  But one last question: should I hold more closely to the rule of thumb of one all-encompassing assertion per test where possible?  Let me know your thoughts.
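To make the discussion above concrete, here is roughly what the test context I could not simplify looks like.  This is a sketch from memory, not the exact code from Part 2 (the variable names mirror the ones used there, and the Rhino.Mocks calls are the standard GenerateStub/Stub/Return API):

```csharp
// Sketch of the test context (names mirror Part 2; details illustrative).
Establish context = () =>
{
    fileParser = MockRepository.GenerateStub<IFileParser>();
    anagramGrouper = MockRepository.GenerateStub<IAnagramGrouper>();

    // The stubbed return value is exactly what lets us verify the hand-off:
    // the manager must pass wordListFromFile on to the grouper.
    fileParser.Stub(x => x.ExtractWordListFromFile(filePath))
              .Return(wordListFromFile);
    anagramGrouper.Stub(x => x.FindAnagramSets(wordListFromFile))
                  .Return(expected);

    sut = new AnagramsFinder(fileParser, anagramGrouper);
};
```

Without the Return(wordListFromFile) stub, there is no known value to assert the second dependency was called with, which is why the stubbing stays.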

Implementing the AnagramGrouper Dependency

So let’s continue onward with some more actual design and code.  We are tackling the AnagramGrouper dependency, whose sole responsibility is to take in the parsed collection of words and find anagram sets.  Here is the test spec (one thing to note is that in some of my earlier tests for the AnagramsFinder manager class and IFileParser, I had used string arrays liberally; I decided to go with IList<string> collections instead):

public class AnagramGrouperSpecs
{
    [Subject(typeof(AnagramGrouper))]
    public class when_given_word_list_collection
    {
        static AnagramGrouper sut;
        static IEnumerable<IList<string>> result;
        static IEnumerable<string> wordList = new[] { "wordA", "wordB", "Aword", "wordC" };

        Establish context = () =>
        {
            sut = new AnagramGrouper();
        };

        Because of = () =>
        {
            result = sut.FindAnagramSets(wordList);
        };

        It should_result_in_anagram_sets_collection_of_length_1 = () =>
        {
            result.Count().ShouldEqual(1);
        };

        It should_contain_2_specific_words_in_the_anagram_set = () =>
        {
            result.First().ShouldContain("wordA", "Aword");
        };
    }
}

Here too I basically have two assertions (split up differently than last time); should I also try harder to avoid this?

When contemplating how to further break down this responsibility of finding anagram sets into subtasks, I figure we need one piece of code that compares two words and answers whether they are anagrams of each other, and another piece of code that keeps track of the various anagram sets (perhaps a collection or dictionary of string arrays).  With what BDD has revealed to me thus far, I figure this means we are going to see two more dependencies (and interfaces, and spec tests) added to the codebase.

So I start with the word comparison task, thinking I could perhaps leverage the String.GetHashCode() method to assign a meaningful value to the list of characters in the string.  However, string hash codes take letter casing and character order into account, so a modified strategy would have to be used, even though GetHashCode() seems quite close to filling our need.

Now I probably should have come up with this algorithm all by myself, but it is what it is, and this is the best, most concise solution I can put together.  I ended up doing some internet searching to proof-of-concept my GetHashCode() idea, and I came across pure genius in the form of Brian Mullen’s blog post entitled LINQ group by and GroupBy, specifically the second-to-last code snippet.  Just String.ToCharArray() the word, Array.Sort() the resultant array, turn it back into a string, and call GetHashCode() on that.  (As a side note, we can worry about case-insensitivity later if desired by simply calling String.ToLower() on the word before getting the hash code.)  Now anagrams will have the same hash code (Brian uses the term “canonical”, which I think is very fitting); it’s literally perfect in every way!

So now for the other dependency, I figure we could use a Dictionary<int, IList<string>>, where the key is the canonical hash code and the value is a string list representing an anagram set of words.  These two dependency implementations are so simple and straightforward that I have a hard time seeing a reason to continue breaking out full-fledged object dependencies with interfaces and Dependency Injection.  However, I will make a case for moving the canonical hash code logic outside this AnagramGrouper class, but that will come later.

So let’s write some code to pass our test/spec:

public class AnagramGrouper : IAnagramGrouper
{
    public IEnumerable<IList<string>> FindAnagramSets(IEnumerable<string> wordList)
    {
        Dictionary<int, IList<string>> results = new Dictionary<int, IList<string>>();

        foreach (string word in wordList)
        {
            int canonicalHashCode = GetCanonicalHashCode(word);

            if (results.ContainsKey(canonicalHashCode))
            {
                results[canonicalHashCode].Add(word);
            }
            else
            {
                results.Add(canonicalHashCode, new List<string>(new[] { word }));
            }
        }

        return results.Values;
    }

    public static int GetCanonicalHashCode(string word)
    {
        char[] letters = word.ToCharArray();
        Array.Sort<char>(letters);

        return new string(letters).GetHashCode();
    }
}

Maybe you noticed already, but there is a bug in my logic.  At the time, I didn’t see it and needed to debug the code while it was running.  Setting a breakpoint in your code doesn’t work like you’re probably used to, because we’re running the MSpec tests via its console test runner.  I had to do some searching to figure this one out, and that’s what we’ll cover next.

How to Debug MSpec Specifications and the Code They Test

After a short ride on the Google superhighway, I found a blog post containing a November 2009 comment from Aaron Jensen, author of MSpec, that gave a few possible solutions.  I opted to go with the Debugger.Break() solution, which I put at the very top of my FindAnagramSets method.  It is a tad awkward because it shows a Windows error dialog saying the program encountered a problem and Windows is searching the web for a solution.  If you wait through that, it gives you the option to debug the program via a Visual Studio dialog, where you select the currently running instance of Visual Studio as your debugger.  You just remove the statement when you’re done (and you’ll really want to remember to remove it from your production code).  It does the trick, though it is a bit obtrusive to your code base.
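In case it helps, the placement is as simple as it sounds.  Here is a minimal sketch (Debugger.Break() lives in System.Diagnostics; the method body is elided):

```csharp
using System.Diagnostics;

public IEnumerable<IList<string>> FindAnagramSets(IEnumerable<string> wordList)
{
    // Temporary: halts the console runner so you can attach Visual Studio.
    // Remember to delete this line when you are done debugging.
    Debugger.Break();

    // ... rest of the method ...
}
```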

If you haven’t already figured it out, the bug is that my FindAnagramSets method returns all sets found in the dictionary.  A set of just one word is not really an anagram set, which is why our test specification, which is based on that assumption, failed.  The fix is to filter out word sets with fewer than two words before returning, like this:

// Return anagram sets with more than one word in it
return results.Values.Where(x => x.Count > 1);

That will make the code pass the tests now.  Remember to remove the debugger statement we added.
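As an aside, the same canonical-form idea can be expressed with LINQ’s GroupBy, in the spirit of Brian’s post.  This is just a sketch of the alternative, not a drop-in replacement for the class above (it reuses the GetCanonicalHashCode method we already wrote):

```csharp
// Group words by their canonical (sorted-letter) hash code and keep
// only groups with more than one word, i.e. the true anagram sets.
IEnumerable<IList<string>> sets = wordList
    .GroupBy(word => GetCanonicalHashCode(word))
    .Where(group => group.Count() > 1)
    .Select(group => (IList<string>)group.ToList());
```

It collapses the dictionary bookkeeping and the filter fix into one expression, at the cost of being a little less obvious about what is going on.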

BDD, OOP…WTH?!

Just kidding.  But I do want to discuss a reaction I had to BDD after implementing this anagram finding solution.  After I completed the code and finished the tests, I was feeling pretty great.  But then I started thinking about how I didn’t really go down the path that I felt BDD was originally trying to lead me down with AnagramGrouper, which was to create two more dependencies, just as I had done for the first managing class AnagramsFinder.  That frustrated me, because I had been digging BDD up to this point.  I’m not quite sure what my original expectations were, but I might have thought BDD was magically going to write my algorithms for me as well.

I decided to discuss the whole thing with Esteban to get some perspective on the issue.  He told me he didn’t know a thing about BDD and how it’s supposed to work, but he figured BDD would lead me to decoupled, cohesive code design; he just couldn’t see how it would aid with algorithmic solutions to the problems at hand.  I’m not sure why I was so naïve, but it makes complete sense.  These methodologies are designed to keep you disciplined to some proven, guiding principles, but you still need to use your brain and creativity to actually solve the problem.  I would appreciate feedback and suggestions from you all on knowing when it is time to switch from BDD architecture design mode to algorithm design mode within the broken-down dependencies that handle very specific sub-responsibilities.  Please let me know if you have found some guiding concept or question that you put to the test in order to get into the right coding mindset for a particular context (design or algorithm; or do I even have the right categories?).

Perhaps it’s just that this design process feels so new to me.  Maybe I have solely been in the coding, algorithm mindset this whole time throughout my career.  Perhaps it will become more natural with practice and experience.  I guess it just feels like I’m learning to drive a manual, stick-shift car or something, and I am a little rough with the clutch when transitioning between gears.

To take this discussion further, Esteban made a comment in our conversation about his personal thoughts on these methodologies (like BDD) and Object-Oriented Programming.  He has come to believe that we invent these methodologies like flavors of the week “because as programmers we don’t really get OOP.”  Perhaps OOP just comes more naturally to his thought process than it does for most people (or, he admits, he could just be completely ignorant), but he feels there are really just a few simple rules and guidelines to keep in mind at all times in order to keep your objects behaving like real-life objects instead of acting like data constructs.  I encouraged him to give his theory some additional thought and make it a little more formal by writing it down.  He did, and the following is a link to the resultant blog post:

3 Simple Rules to Good Object Oriented Code

Please read through that post and comment on it.  His idea needs to be criticized and/or validated.  He admits he may have no clue what he’s talking about, but I said that’s why it needs to be put to the test by posting it in the open.  We both think it’s an interesting theory, but we’re unsure if it even holds any water.  Perhaps Esteban’s theory is too simplistic to be practical for many programmers.  Provide feedback and lots of it.

Summary

So I think I need to wrap this post up about now.  I think I have raised enough discussion points to write a book, but please do take the time to comment on what I’m doing wrong, what I’m doing right, or how to make the BDD mentality more flowing and natural.  I promised to talk about having a Word class as opposed to using primitive strings everywhere, and that will definitely be in the next post.  Also, I think our AnagramGrouper class may need some more testing and fleshing out of how it should handle spaces, punctuation, casing, and so on.

Thanks for sticking with me thus far.