Anagram Code Kata Part 1 – Getting Started
This post is part of a series on coding katas, BDD, MSpec, and SOLID principles. The series’ introductory post also contains an index of all the posts.
This post will provide resources that serve as an introduction to the MSpec BDD testing framework. We will also set up our Visual Studio solution, explain the Anagram Code Kata problem statement, and write our first specification and test.
Machine.Specifications (MSpec)
The author of this BDD testing framework is Aaron Jensen of CodeBetter.com. Probably the best introduction to the framework (which also lightly touches on general BDD principles) is Rob Conery’s post entitled Make BDD Your BFF. MSpec’s source is hosted in a Git repository on GitHub. Download the code and build it; the build places the libraries you need into a Bin folder. My MSpec libraries are a few months old, as I haven’t updated my source since my initial download. I have not set up TestDriven.NET and am not sure it will be necessary for this simple coding exercise. I am using NUnit as the testing framework underneath MSpec; the following article may be of interest if you would like to use a newer version of NUnit than the one MSpec targets:
Using Latest NUnit Version with MSpec
MSpec helps you organize your code into specifications, a core BDD concept. It is inspired by earlier BDD frameworks such as Ruby’s RSpec. The syntax and arrangement can look a little odd in a statically typed language like C#, but I think I like the direction it is heading. Again, the best way to get familiar with the syntax, set up the MSpec test runner, and configure HTML report output is to read Rob’s introductory post on BDD and MSpec.
Setting Up the Visual Studio Solution
Let’s begin by creating a C# console application named AnagramCodeKata targeting .NET Framework 3.5. The solution name (and solution folder) will be the same. For our dependencies on MSpec (and NUnit), I have created a physical file system folder named Libraries as a sibling of the project’s physical folder (at the same level as the solution file). That folder contains the following files from the Bin folder of the compiled MSpec source code:
- CommandLine.dll
- Machine.Specifications.ConsoleRunner.exe
- Machine.Specifications.dll
- Machine.Specifications.NUnit.dll
- Machine.Specifications.Reporting.dll
- nunit.framework.dll
Then, within Solution Explorer, we only need to reference three of these libraries: Machine.Specifications.dll, Machine.Specifications.NUnit.dll, and nunit.framework.dll. I have also created two folders within the project in Solution Explorer, named Reports and Specifications. The Reports folder will contain the HTML report generated each time we run our tests, and the Specifications folder will hold our specifications and tests. This is what our Solution Explorer looks like now:
Also notice the MSpecRunner button at the top, which I’ve created to execute a custom External Tool that points to Machine.Specifications.ConsoleRunner.exe and passes in the appropriate parameters. This helpful setup comes from Rob Conery’s introductory blog post mentioned above.
Anagram Problem Statement
So let’s look at the Anagram problem we are trying to solve, as posed by Dave Thomas in his CodeKata series. Here’s a quick summary of what we are trying to accomplish:
“The challenge is fairly simple: given a file containing one word per line, print out all the combinations of words that are anagrams; each line in the output contains all the words from the input that are anagrams of each other.”
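To make the problem statement concrete: two words are anagrams exactly when their letters, once sorted, are identical, so the sorted letters can serve as a grouping key. The C# sketch below is only one common approach and not necessarily where this series will end up; the AnagramGrouper class, the lower-casing of words, and the removal of duplicate lines are my own assumptions rather than anything the kata specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the kata or MSpec) that groups words
// whose sorted letters are identical.
public static class AnagramGrouper
{
    // Canonical key: lower-case the word and sort its letters,
    // so "Listen" and "silent" both map to "eilnst".
    public static string KeyFor(string word)
    {
        char[] letters = word.ToLowerInvariant().ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }

    // Returns only the groups with two or more words, i.e. the anagram sets,
    // in the order their first word appears in the input.
    public static List<List<string>> Group(IEnumerable<string> words)
    {
        return words
            .Select(w => w.Trim())
            .Where(w => w.Length > 0)
            .Distinct()                 // assumption: duplicate lines are ignored
            .GroupBy(w => KeyFor(w))
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Given kinship, pinkish, enlist, inlets, listen, silent, and boaster, Group returns two sets, {kinship, pinkish} and {enlist, inlets, listen, silent}; boaster has no partner in that input, so it is left out.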
Unfortunately, the wordlist.txt link on Dave Thomas’ blog post is broken, and when commenters asked for a fixed link, he pointed everyone to a website that aggregates links to several word lists from around the Web. Many of us readers wanted the exact word list he used so that we could compare our results to his. In his post, he states that there should be 2,530 sets of anagrams and 5,680 total words participating in those sets. The tough part is that we need the same word list file to reproduce those numbers. I searched the web and found a few lists, but only one that might be the same. It is from someone’s personal GitHub repository:
http://github.com/krist0ff/code_kata/tree/ba975c3a64f11a81db2f3716b40de046f1ca7ef4/kata6
If you’re visiting this blog post months or years after it was originally written, I obviously can’t guarantee this file will still be available. You can try emailing me or commenting here, and I can hopefully send it to you if I still have it. More importantly, as we progress with this series, we’ll find out whether it’s even the right word list.
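Once we have a candidate list, checking it against Dave’s figures only takes two numbers: how many sorted-letter keys are shared by two or more words (the anagram sets), and how many words share those keys. Here is a rough C# sketch; the AnagramTally name is hypothetical, and I don’t know whether Dave ignored case or skipped duplicate lines, so the case-folding and de-duplication here are assumptions that could make the totals differ from his.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: computes the two figures Dave Thomas quotes
// (number of anagram sets, number of words in those sets) without
// keeping the sets themselves in memory.
public static class AnagramTally
{
    public static void Count(IEnumerable<string> lines, out int setCount, out int wordCount)
    {
        // Maps each sorted-letter key to how many distinct words share it.
        var counts = new Dictionary<string, int>();
        var seen = new HashSet<string>();

        foreach (string raw in lines)
        {
            string word = raw.Trim();
            // Skip blank lines and exact duplicates so they cannot inflate the totals.
            if (word.Length == 0 || !seen.Add(word)) continue;

            char[] letters = word.ToLowerInvariant().ToCharArray();
            Array.Sort(letters);
            string key = new string(letters);

            int existing;
            counts.TryGetValue(key, out existing); // existing is 0 for a new key
            counts[key] = existing + 1;
        }

        setCount = 0;
        wordCount = 0;
        foreach (int wordsForKey in counts.Values)
        {
            // A key shared by two or more words is one anagram set.
            if (wordsForKey > 1)
            {
                setCount++;
                wordCount += wordsForKey;
            }
        }
    }
}
```

Calling something like AnagramTally.Count(File.ReadAllLines("wordlist.txt"), out sets, out words) should then give 2,530 and 5,680 if this is the same list and the counting rules match his.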
Lastly, I’d like to look at the objectives for the coding kata that Dave Thomas outlines:
“Apart from having some fun with words, this kata should make you think somewhat about algorithms. The simplest algorithms to find all the anagram combinations may take inordinate amounts of time to do the job. Working though alternatives should help bring the time down by orders of magnitude. To give you a possible point of comparison, I hacked a solution together in 25 lines of Ruby. It runs on the word list from my web site in 1.5s on a 1GHz PPC. It’s also an interesting exercise in testing: can you write unit tests to verify that your code is working correctly before setting it to work on the full dictionary.”
I don’t think we’ll focus much on runtime, but we’ll certainly revisit our algorithm if the code takes an unacceptably long time to finish. To complete this section, let’s add wordlist.txt to our solution at the root of our project node.
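For a sense of what Dave means by the simplest algorithms taking inordinate amounts of time, here is a deliberately naive C# sketch of my own (NaiveAnagrams is a hypothetical name): it compares every word against every later word. With n words that is on the order of n²/2 comparisons, so a list of, say, 45,000 words could need up to roughly a billion letter-count checks, whereas reducing each word to a sorted-letter key needs only one pass over the list.

```csharp
using System.Collections.Generic;

// Deliberately naive, quadratic sketch (hypothetical, for comparison only).
public static class NaiveAnagrams
{
    // Two words are anagrams if they use the same letters the same number of times.
    public static bool AreAnagrams(string a, string b)
    {
        if (a.Length != b.Length) return false;

        var counts = new Dictionary<char, int>();
        foreach (char c in a)
        {
            int n;
            counts.TryGetValue(c, out n);
            counts[c] = n + 1;
        }
        foreach (char c in b)
        {
            int n;
            // A letter in b that a lacks (or has already used up) rules it out.
            if (!counts.TryGetValue(c, out n) || n == 0) return false;
            counts[c] = n - 1;
        }
        return true;
    }

    // Every unclaimed word is compared with every later word: O(n^2) pairs.
    public static List<List<string>> Group(IList<string> words)
    {
        var claimed = new bool[words.Count];
        var sets = new List<List<string>>();

        for (int i = 0; i < words.Count; i++)
        {
            if (claimed[i]) continue;
            var matches = new List<string> { words[i] };

            for (int j = i + 1; j < words.Count; j++)
            {
                if (!claimed[j] && AreAnagrams(words[i], words[j]))
                {
                    matches.Add(words[j]);
                    claimed[j] = true;
                }
            }
            if (matches.Count > 1) sets.Add(matches);
        }
        return sets;
    }
}
```

Most pairs bail out early on the length check, but it is the sheer number of pairs that makes this approach slow on a full dictionary.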
How About Some Actual Code?!
So let’s start by creating our first specification. I have created a new C# file named AnagramsFinderSpecs.cs under the Specifications folder in our solution. I attempted to write our first test, but I already feel like I’m not in line with “favoring test driving logic over just testing data.”
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Machine.Specifications;

namespace AnagramCodeKata.Specifications
{
    [Subject(typeof(AnagramsFinder), "Finding Anagrams")]
    public class when_given_text_file_with_word_on_each_line
    {
        static int result;
        static AnagramsFinder sut;

        // Arrange: create the system under test.
        Establish context = () =>
        {
            sut = new AnagramsFinder();
        };

        // Act: parse the word list added at the root of the project.
        Because of = () =>
        {
            result = sut.ParseTextFile("wordlist.txt");
        };

        // Assert: Dave Thomas reports 2,530 anagram sets for his list.
        It should_result_in_count_of_2530_anagram_sets = () =>
        {
            result.ShouldEqual(2530);
        };
    }
}
```
If you can’t make out the funky structure and syntax, again refer to Rob Conery’s introductory post mentioned at the top. He does a great job of explaining what’s going on here.
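The spec won’t compile until an AnagramsFinder class exists. In test-first fashion, the smallest next step might be a stub that compiles but deliberately fails, so we see the spec go red before writing any real logic. This is a sketch under that assumption; only the class and method names come from the spec, and the exception message is my own.

```csharp
using System;

namespace AnagramCodeKata
{
    // Minimal stub so the spec compiles. Throwing keeps the spec red until
    // real logic replaces it (a hypothetical first step, not a solution).
    public class AnagramsFinder
    {
        public int ParseTextFile(string path)
        {
            throw new NotImplementedException("Anagram counting is not written yet.");
        }
    }
}
```

With this in place, I’d expect the MSpec runner to report the specification as failing, because the exception surfaces from the Because.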
Also, please remember that I have no clue what I’m doing; I’m just trying to copy people who do. I am hoping to be corrected by those who have a better idea of how to approach all of this. This is a good stopping point, since I want feedback and I’m already feeling uncertain.
Seriously, please guide me in the right direction or validate where I’m headed, because even I’m not sure where that is yet. I will say this: I’m pretty sure I shouldn’t be reading in wordlist.txt yet, and that this might be where I need to start using a mocking framework.
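One way to keep the spec away from the real wordlist.txt, even before reaching for a mocking framework, is to separate opening the file from parsing its contents. In this hypothetical sketch, ParseTextFile (the name the spec already uses) opens the file and delegates to a new CountAnagramSets(TextReader) method, so a spec could pass a StringReader holding a handful of words. The CountAnagramSets name and the sorted-letter grouping inside it are my assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace AnagramCodeKata
{
    public class AnagramsFinder
    {
        // Entry point used by the existing spec: opens the file and delegates.
        public int ParseTextFile(string path)
        {
            using (var reader = new StreamReader(path))
            {
                return CountAnagramSets(reader);
            }
        }

        // Hypothetical seam: a spec can pass a StringReader with a few words
        // instead of touching the real word list on disk.
        public int CountAnagramSets(TextReader reader)
        {
            return ReadWords(reader)
                .Distinct()
                .GroupBy(w => SortedLetters(w))
                .Count(g => g.Count() > 1);
        }

        // Yields trimmed, non-blank lines one at a time.
        static IEnumerable<string> ReadWords(TextReader reader)
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                line = line.Trim();
                if (line.Length > 0) yield return line;
            }
        }

        // Canonical anagram key: lower-cased letters in sorted order.
        static string SortedLetters(string word)
        {
            char[] letters = word.ToLowerInvariant().ToCharArray();
            Array.Sort(letters);
            return new string(letters);
        }
    }
}
```

A spec’s Because could then call sut.CountAnagramSets(new StringReader("rots\nsort\nknits\nstink\nalone")) and expect 2.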
Stay tuned, there’s more to come. I just can’t tell what it will be about yet. It’ll still be fun though, I promise.
Like you, I'm a bit wary of that first spec you've written (that doesn't mean it's wrong obviously – I'm new to this too :)).
You're right that you are testing data – loading the wordlist.txt file and checking for the right number of anagram combinations. By the way, this is probably a good example of an integration test for when you are done! :)
Whenever I have a hard problem like this that I am having trouble solving, I try to cheat. If you keep this class simple and push the tricky bits lower down the hierarchy, that may help you get started.
Here's how I would update the current spec to help implement your test (this won't be a spoiler because I'd change the test. I'll go into that later).
In this case I can think of two responsibilities: getting the words from our file, and then counting the number of anagrams found in those words. So maybe your spec's context could stub out an IFileReader or similar, which will return an IEnumerable<string> (filled with some fake data) when asked for the lines in wordlist.txt. You could then give these words to an IAnagramCounter, which will return 2530.
Your It and Because can remain the same; only the context (the Establish) will change, which is what I love about this style of writing specs.
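Here is a minimal C# sketch of the split David describes. The IFileReader and IAnagramCounter names come from his comment; the GetLines and CountAnagramSets members, the constructor injection, and the hand-rolled StubFileReader and StubAnagramCounter classes are hypothetical stand-ins for what a mocking framework such as Moq or Rhino Mocks would otherwise generate.

```csharp
using System.Collections.Generic;

namespace AnagramCodeKata
{
    // Interface names from David's comment; member names are hypothetical.
    public interface IFileReader
    {
        IEnumerable<string> GetLines(string path);
    }

    public interface IAnagramCounter
    {
        int CountAnagramSets(IEnumerable<string> words);
    }

    // AnagramsFinder only coordinates: read the words, then hand them to the counter.
    public class AnagramsFinder
    {
        readonly IFileReader fileReader;
        readonly IAnagramCounter counter;

        public AnagramsFinder(IFileReader fileReader, IAnagramCounter counter)
        {
            this.fileReader = fileReader;
            this.counter = counter;
        }

        public int ParseTextFile(string path)
        {
            return counter.CountAnagramSets(fileReader.GetLines(path));
        }
    }

    // Hand-rolled stub: returns canned lines and records the requested path.
    public class StubFileReader : IFileReader
    {
        readonly IEnumerable<string> lines;
        public string RequestedPath { get; private set; }

        public StubFileReader(IEnumerable<string> lines) { this.lines = lines; }

        public IEnumerable<string> GetLines(string path)
        {
            RequestedPath = path;
            return lines;
        }
    }

    // Hand-rolled stub: returns a canned count and records the words it received.
    public class StubAnagramCounter : IAnagramCounter
    {
        readonly int result;
        public IEnumerable<string> ReceivedWords { get; private set; }

        public StubAnagramCounter(int result) { this.result = result; }

        public int CountAnagramSets(IEnumerable<string> words)
        {
            ReceivedWords = words;
            return result;
        }
    }
}
```

With this shape, the spec's Establish would build the finder from a StubFileReader holding a few fake words and a StubAnagramCounter set to 2530, while the Because and It stay exactly as they are.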
Of course, this isn't specifying what your AnagramsFinder should actually do… the kata's aim is to print all the anagram combinations to the screen. So you may be better off trying this approach of pushing down logic, but rewriting the spec to match the exact problem you are trying to solve.
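Following that suggestion, a spec could target the kata's real output instead of a count. Here is a hypothetical C# sketch of a printer that writes one line per anagram set to any TextWriter, so Program can pass Console.Out while a spec passes a StringWriter and compares the text. The AnagramPrinter name is made up, and sorting the words within each line and the lines themselves is my own assumption, made only so the expected output is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical printer for the kata's actual goal: one output line per anagram set.
public static class AnagramPrinter
{
    // Builds the output lines; words within a line and the lines themselves are
    // sorted ordinally (an assumption) so a spec can compare against fixed text.
    public static List<string> FormatLines(IEnumerable<string> words)
    {
        return words
            .Distinct()
            .GroupBy(w => SortedLetters(w))
            .Where(g => g.Count() > 1)
            .Select(g => string.Join(" ", g.OrderBy(w => w, StringComparer.Ordinal).ToArray()))
            .OrderBy(line => line, StringComparer.Ordinal)
            .ToList();
    }

    // Writing to a TextWriter lets Program pass Console.Out and a spec pass a StringWriter.
    public static void Print(IEnumerable<string> words, TextWriter output)
    {
        foreach (string line in FormatLines(words))
        {
            output.WriteLine(line);
        }
    }

    // Canonical anagram key: lower-cased letters in sorted order.
    static string SortedLetters(string word)
    {
        char[] letters = word.ToLowerInvariant().ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }
}
```

For the input stink, sort, alone, rots, knits, FormatLines returns "knits stink" followed by "rots sort", and alone is omitted because it has no anagram partner.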
I'm not sure if this will help or lead you down a dead end, but it is the approach I would take.
Hope it helps!
David