Anagram Code Kata Part 5 – Domain Objects Over Language Primitives
This post is part of a series on coding Kata, BDD, MSpec, and SOLID principles. Feel free to visit the above link which points to the introductory post, also containing an index of all posts in the series.
In this post, we will discuss some reasons why you might want to avoid using language primitives directly in place of domain objects. Specifically, I have been using String variables and objects to represent words up to this point. Esteban suggested I create a Word class for a few good reasons, which I’ll lay out for you. Out of sheer luck and good timing, Dru Sellers of CodeBetter.com also wrote on this subject shortly afterward and confirmed Esteban’s reasoning.
Use String Class or Create Word Domain Object?
I mentioned in the previous post that Esteban had suggested creating a Word class instead of just passing around strings everywhere through my application. One good reason is that we don’t own the String class and so further modification to our string-based implementation is difficult because the logic is scattered throughout the application, instead of centralized under the responsibility and definition of one domain class. It is much more difficult to refactor the word-specific logic with its behaviors and attributes spread throughout the code. Even if your Word class doesn’t grow any more beyond a seemingly unnecessary wrapper of the String class, the code is more cohesive and ready for change should the need arise.
But more importantly, you designed the code thinking in a true object-oriented mindset. You have to keep in mind that the String class is someone else’s implementation, and is hardly ever sufficient in and of itself as a domain object within your solution. Think about it: the behaviors that a string object performs are so generic and multipurpose that you likely don’t need two-thirds of the class as defined (and there are likely a few behaviors you really need that aren’t there). Of course nearly every application on earth makes use of the String class; but because of this fact, it has no meaning in and of itself within any given application. You would have to look at how the strings are actually used within the code in order to understand their specific role within the app’s context. That research is much easier for everyone (including the original author of the code) if it’s all encapsulated within a dedicated domain object class. True object-orientation means describing in code form the properties, behaviors, and interactions/relationships of real world objects within your problem domain.
Esteban gave me a great example to illustrate these points. He said that you can always represent money as a decimal (and even when you use a domain object, it’s got to have a decimal language primitive underneath the covers). However, what happens when you need to attach metadata to the amount (like currency denomination), or if you need to change decimal precision? You would have to go through all of the code and make sure every use of the decimal primitive is modified uniformly in order to retain consistency. Also, mathematical operations involving money are hardly ever the same as their counterparts involving standard decimals, because currency deals with discrete values to a certain decimal precision. Typically when the behaviors and properties within our system begin to get complex, we are cognizant enough to create domain objects in order to bring it all under one class. We definitely don’t want to over-architect features and interactions before we need them, but I think there is power in this principle of abstracting away language primitives and instead encapsulating their use within domain objects located in just one place in your codebase. I believe it is one thing we can keep in mind to help guide us to better object-oriented thinking, and avoid language-oriented coding.
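To make the money example a little more concrete, here is a minimal sketch of what such a domain object might look like. The Money class, its member names, and the two-decimal rounding policy are all assumptions made for illustration; they are not code from Esteban or from this series.

```csharp
using System;
using System.Globalization;

// Hypothetical Money value object (illustrative names and rounding policy).
public sealed class Money : IEquatable<Money>
{
    private readonly decimal amount;   // the primitive stays hidden inside the domain object
    private readonly string currency;  // metadata that a bare decimal cannot carry

    public Money(decimal amount, string currency)
    {
        if (currency == null) throw new ArgumentNullException("currency");
        // Precision is decided in exactly one place (two decimal places assumed here).
        this.amount = Math.Round(amount, 2, MidpointRounding.ToEven);
        this.currency = currency;
    }

    public decimal Amount { get { return amount; } }
    public string Currency { get { return currency; } }

    // Money arithmetic is not plain decimal arithmetic: currencies must match.
    public Money Add(Money other)
    {
        if (other == null) throw new ArgumentNullException("other");
        if (currency != other.currency)
            throw new InvalidOperationException("Cannot add amounts in different currencies.");
        return new Money(amount + other.amount, currency);
    }

    public bool Equals(Money other)
    {
        return other != null && amount == other.amount && currency == other.currency;
    }

    public override bool Equals(object obj) { return Equals(obj as Money); }

    public override int GetHashCode() { return amount.GetHashCode() ^ currency.GetHashCode(); }

    public override string ToString()
    {
        return amount.ToString("0.00", CultureInfo.InvariantCulture) + " " + currency;
    }
}
```

With this in place, new Money(1.10m, "USD").Add(new Money(2.25m, "USD")) yields a Money that prints as "3.35 USD", while adding a "EUR" amount to it throws. If the precision rule ever changes, the constructor is the only place that needs to be touched.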
As mentioned above, Esteban’s thoughts were confirmed nearly verbatim by Dru Sellers of CodeBetter.com in his blog post that he wrote just a few days after I had the conversation with Esteban. A great coincidence no doubt, and worth a read; here’s the link:
Word Class Implementation
So basically I created the following class implementation and then replaced string with a reference to this new class, Word:
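using System;

public class Word
{
    private string wordStr = string.Empty;

    public Word(string wordStr)
    {
        this.wordStr = wordStr;
    }

    public override string ToString()
    {
        return wordStr;
    }

    // Value equality, so test assertions and Contains() queries compare the text.
    public override bool Equals(object obj)
    {
        Word other = obj as Word;  // null or a non-Word compares unequal instead of throwing
        return other != null && wordStr == other.wordStr;
    }

    // Present only to satisfy the compiler warning raised when Equals is overridden.
    public override int GetHashCode()
    {
        return base.GetHashCode();
    }

    // Anagrams share the same sorted letters, and therefore the same canonical hash.
    public int GetCanonicalHashCode()
    {
        char[] letters = wordStr.ToCharArray();
        Array.Sort<char>(letters);
        return new string(letters).GetHashCode();
    }
}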
I have defined an overridden implementation for Equals(object) (so that the test assertions and other IEnumerable.Contains() queries work) and GetHashCode() (solely to satisfy a compiler warning). I also moved the GetCanonicalHashCode() method from AnagramGrouper, in order to better encapsulate it as a behavior a Word knows how to do innately.
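One caveat is worth sketching here. Because Equals compares by value while GetHashCode falls through to the base (reference-based) implementation, two equal Word instances are not guaranteed to hash alike, so hash-based collections such as HashSet or Dictionary may fail to treat them as the same word. The variant below is only a sketch of one way to keep the two in agreement; the readonly field, the null handling in the constructor, and the IEquatable implementation are my assumptions rather than part of the kata code.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: a Word whose GetHashCode agrees with its value-based Equals.
public class Word : IEquatable<Word>
{
    private readonly string wordStr;

    public Word(string wordStr)
    {
        // Assumption: treat a null input as the empty word.
        this.wordStr = wordStr ?? string.Empty;
    }

    public override string ToString() { return wordStr; }

    public bool Equals(Word other)
    {
        return other != null && wordStr == other.wordStr;
    }

    public override bool Equals(object obj) { return Equals(obj as Word); }

    // Equal words now produce equal hash codes, as hash-based collections require.
    public override int GetHashCode() { return wordStr.GetHashCode(); }

    // Unchanged in spirit: anagrams sort to the same letters and share this hash.
    public int GetCanonicalHashCode()
    {
        char[] letters = wordStr.ToCharArray();
        Array.Sort<char>(letters);
        return new string(letters).GetHashCode();
    }
}
```

With this variant, a HashSet<Word> containing new Word("listen") reports that it contains a second, separately constructed new Word("listen"), and "listen" and "silent" still share a canonical hash code.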
One other change I had to make was to convert strings into Word objects in our NewlineFileParser, which I accomplished by using an IEnumerable.Select() call as shown below:
return File.ReadAllLines(filePath).Select<string, Word>(x => new Word(x));
Let’s Revisit AssertWasCalled One More Time
Trust me, I am groaning with you even as I write that heading. The last thing we need is for me to rehash the topic and flip-flop on my stance yet again. Yes, that’s right: I’ve changed my mind again. At first I couldn’t see what value asserting that a method was called would have under normal test scenarios. Then I decided that using it might make the intent of my test specifications clearer. After an email conversation with David Tchepak, I think I’m now back to my original stance. Here is what Dave said that had me reconsidering, and I think it’s pretty sound reasoning (emphasis added by me):
…
“But seeing as you asked for it, here goes. :)
“I try and avoid AssertWasCalled like the plague. Generally I don’t care that some method was called, I care that my class does the work it needs to. If that involves calling a dependency then great, but that is not my class’ reason for existence.
“I prefer your original approach of stubbing out everything in the setup and having that tested indirectly. One reason I prefer this is I find it makes it easier to refactor: the assertions in my test don’t change, the class still does the same thing. However I can add or change dependencies by changing some wiring in the setup, and then make sure I haven’t stuffed anything up as my assertions still pass. I prefer that my tests specify what I want, not how it does it. To me, AssertWasCalled reeks of over-specification. The one exception is where I hit the external boundaries of my code, so where I want to send an email or something without side effects that I can test. Then the core of the behaviour is the call itself, so I’m happy to assert on that then.”
And with that I’ll promise to never bring this up again…unless of course I get swayed by someone else. :) In all seriousness, I think this is an interesting discussion and so if you have insight, please share via comment below.
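For readers who want to see Dave’s distinction in code, here is a small sketch that uses a hand-rolled stub instead of a mocking library. Every type name here (IWordSource, AnagramFinder, StubWordSource) is a hypothetical stand-in and not one of the classes from this series.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical dependency that supplies words to the class under test.
public interface IWordSource
{
    IEnumerable<string> ReadWords();
}

// Hypothetical class under test: groups words that are anagrams of each other.
public class AnagramFinder
{
    private readonly IWordSource source;

    public AnagramFinder(IWordSource source) { this.source = source; }

    public IEnumerable<string[]> FindGroups()
    {
        return source.ReadWords()
            .GroupBy(w => new string(w.OrderBy(c => c).ToArray())) // key = sorted letters
            .Where(g => g.Count() > 1)                              // keep real anagram sets
            .Select(g => g.ToArray())
            .ToList();
    }
}

// Hand-rolled stub: returns canned words and also records how often it was called.
public class StubWordSource : IWordSource
{
    private readonly string[] words;

    public int CallCount { get; private set; }

    public StubWordSource(params string[] words) { this.words = words; }

    public IEnumerable<string> ReadWords()
    {
        CallCount++;
        return words;
    }
}
```

A test in the style Dave prefers stubs the source with "listen", "silent", and "apple", then asserts that FindGroups() returns a single group containing "listen" and "silent". An AssertWasCalled-style test would instead assert that CallCount equals 1, which breaks as soon as the finder obtains its words some other way, even though its observable behavior is unchanged.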
Summary
Please leave your thoughts in the comments below in regard to creating domain objects over using language primitives (or heaven forbid, the AssertWasCalled debate). As far as this coding kata exercise goes, I need to take a high-level look at where it should go next. Perhaps the next post will tie up loose ends and see how our code performs on large text files as input. It may be that we will need to refactor our architecture to achieve better runtimes. If not, maybe we can still discuss where we could have headed if it had been necessary.