Friday, April 20, 2012

Refactoring: Improving the Design of Existing Code, by Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts

Book cover It was long overdue for me to read a technical book and I've decided to go for a classic from 1999 about refactoring, written by software development icons as Martin Fowler and Kent Beck. As such, it is not a surprise that Refactoring: Improving the Design of Existing Code feels a little dated. However, not as much as I had expected. You see, the book is trying to familiarize the user with the idea of refactoring, something programmers of these days don't need. In 1999, though, that was a breakthrough concept and it needed not only explained, but lobbied. At the same time, the issues they describe regarding the process of refactoring, starting from the mechanics to the obstacles, feel as recent as today. Who didn't try to convince their managers to allow them a bit of refactoring time in order to improve the quality and readability of code, only to be met with the always pleasant "And what improvement would the client see?" or "are there ANY risks involved?" ?

The refactoring book starts by explaining what refactoring means, from the noun, which means the individual move, like Extract Method, to the verb, which represents the process of improving the readability and quality of the code base without changing functionality. To the defense of the managerial point of view, somewhere at the end of the book, authors submit that big refactoring cycles are usually a recipe for disaster, instead preaching for small, testable refactorings on the areas you are working on: clean the code before you add functionality. Refactoring is also promoting software testing. One cannot be confident they did not introduce bugs when they refactor if the functionality is not covered by automated or at least manual tests. One of the most important tenets of the book is that you write code for other programmers (or for yourself), not for the computer. Development speed comes from quickly grasping the intention and implementation when reading, maintaining and changing a bit of code. Refactoring is the process that improves the readability of code. Machines go faster no matter how you write the code, as long as it works.

The book is first describing and advocating refactoring, then presenting the various refactoring moves, in a sort of structured way, akin to the software patterns that Martin Fowler also attempted to catalog, then having a few chapter written by the other authors, with their own view of things. It can be used as a reference, I guess, even if Fowler's site does a better job at that. Also, it is an interesting read, even if, overall, it felt to me like a rehearsal of my own ideas on the subject. Many of the refactorings in the catalog are now automated in IDEs, but the more complex ones have not only the mechanics explained, but the reasons for why they should be used and where. That structured way of describing them might feel like repeating the obvious, but I bet if asked you couldn't come up with a conscious description of the place a specific refactoring should be used. Also, while reading those specific bits, I kept fantasizing about an automated tool that could suggest refactorings, maybe using FxCop or something like that.

Things I've marked down from the book, in the order I wrote them down in:
  • Refactoring versus Optimization - Optimizing the performance or improving some functionality should not be mixed up with the refactoring of code, which aims to improve readability of code while preserving the initial functionality. Mixing them up is pitting the two essential stages of development one against the other.
  • Methods should use their data of their own object - one of the telltales of need to refactor is when methods from an object use data from another object. It smells like the method should be moved in the responsibility of that other object.
  • When it is easy to refactor, choose a simple design - Of course the opposite is true, as well: when you know it will be hard to refactor a piece of code, try to design it first. If not, it is better to not add unnecessary complexity. This is in line with the KISS concept.
  • Split your application into self encapsulated parts - One of the ways to simplify refactoring is to separate your application into bits that you can manage separately. If you didn't design your application like that, try to first split it, then refactor.
  • Whenever you need to write a comment, consider extracting a method with a meaningful name - or renaming methods to be more expressive.
  • Consider polymorphism when seeing a switch statement - Now that is an interesting topic in itself. Why would polymorphism help here? How could it be simpler to understand than a switch/case statement? The idea behind this is that if you have a switch somewhere, you might have it somewhere else as well. Instead of taking decisions inside each method, it is better to split that behaviour in separate classes, each describing the particular value that the switch would have operated on.
  • Test before refactoring - this would have been drilled in your head already, but if not, the book will do that to you. In order to not add faults to the program with the refactoring, make sure you have tests for the existing functionality, tests that should pass after the refactoring process, as well.
  • The Quantity pattern - Review the Quantity pattern in order to improve readability and encapsulate simple common actions performed on specific types of units.
  • Split conditionals into methods - in other words try to simplify your conditional blocks to if conditionMethod() then ifMethod() else elseMethod(). It might seem a sure way to get to a fragmented code base, with small methods everywhere, but the idea is sound. A condition, after all, is an intention. Encapsulate it into a well named method and it will be very clear what the programmer intended. Maybe the same method will be used in other places as well, and then, using polymorphism, one can get rid of the conditional altogether.
  • Use Null objects - an interesting concept that I haven't even considered before. It is easy to recognize the need for a Null object when there are a lot of checks for null. if x==null then something() else x.somethingElse() would be turned into a simple x.something() if instead of null, x would be an object that represents empty, but still has attached behavior. An interesting side effect of this is that often the Null object can be made an immutable singleton.
  • Code inside Assertions always executes - This is a gotcha I found interesting. Imagine the following code: Assert.IsTrue(SomeCondition()) Even if the Assert object is designed to not execute anything in Release mode, only compiled in Debug, the method SomeCondition() will execute all the time. One option is to use an extra condition: Assert.IsTrue(Assert.On&&SomeCondition()) or, in C#, try to send an expression: Assert.IsTrue(()=>SomeCondition())
  • Careful when replacing method parameters with parameter object in parallel processing scenarios - Which nowadays means always. Anyways, the idea is that old libraries designed for parallel processing used large value parameter lists. One might be inclined to Introduce Parameter Object, but that introduces a reference object that might lead to locking issues. Just another gotcha.
  • Separate Modifier from Query - This is a useful convention to remember. A method should either get some information (query) or change some data (modifier), not both. It makes the intention clear.

That's about it. I have wet dreams of cleaning up the code base I am working on right now, maybe in a pair programming way (also a suggestion in the book and a situation when pair programming really seems a great opportunity), but I don't have the time. Maybe this summary of the book will inspire others who have it.


Andrei Rinea said...

Excellent article!

Regarding "Separate Modifier from Query" I have found it 'in the wild' under the name CQRS.

Siderite said...

Thank you, Andrei! The names of the refactorings are clearly not used out there. Even the simple automated ones that ReSharper does have different names in Fowler's book and site. That doesn't stop them from being instantly recognizable when explained.