Saturday, February 25, 2017

Refactoring conditionals

Intro


I was thinking about what to discuss about computers next and I realized that I hardly ever read anything about the flow of an application. I mean, sure, when you learn to code the first thing you hear is that a program is like a set of instructions, not unlike a cooking recipe or instructions from a wife to her hapless husband when she's sending him to the market. You know, stuff like "go to the market and buy 10 eggs. No, make it 20. If they don't have eggs get salami." Of course, her software developer husband returns with 20 salamis "They didn't have eggs", he reasons. Yet a program is increasingly not a simple set of instructions neatly following each other.

So I wrote like a 5 page treatise on program control flow, mentioning Turing and the Benedict Cumberbatch movie, child labor and farm work, asking if it is possible to turn any program, with its parallelism and entire event driven complexity to a Turing machine for better debugging. It was boring, so I removed it all. Instead, I will talk about conditional statements and how to refactor them, when it is needed.

A conditional statement is one of the basic statements of programming, it is a decision that affects what the program will execute next. And we love telling computers what to do, right? Anyway, here are some ways if/then/else or switch statements are used, some bad, some good, and how to fix whatever problems we find.

Team Arrow


First of all, the arrow antipattern. It's when you have if blocks in if blocks until your code looks like an arrow pointing right:
if (data.isValid()) {
  if (data.items&&data.items.length) {
    var item=data.items[0];
    if (item) {
      if (item.isActive()) {
        console.log('Oh, great. An active item. hurray.');
      } else {
        throw "Item not active! Fatal terror!";
      }
    }
  }
}
This can simply be avoided by putting all the code in a method and inverting the if branches, like this:
if (!data.isValid()) return;
if (!data.items||!data.items.length) return;
var item=data.items[0];
if (!item) return;
if (!item.isActive()) {
  throw "Item not active! Fatal terror!";
}
console.log('Oh, great. An active item. hurray.');
See? No more arrow. And the debugging is so much easier.

There is a sister pattern of The Arrow called Speedy. OK, that's a Green Arrow joke, I have no idea how it is called, but basically, since a bunch of imbricated if blocks can be translated into a single if with a lot of conditions, the same code might have looked like this:
if (data.isValid()&&data.items&&data.items.length&&data.items[0]) {
  var item=data.items[0];
  if (!item.isActive()) {
    throw "Item not active! Fatal terror!";
  }
  console.log('Oh, great. An active item. hurray.');
}
While this doesn't look like an arrow, it is in no way a better code. In fact it is worse, since the person debugging this will have to manually check each condition to see which one failed when a bug occurred. Just remember that if it doesn't look like an arrow, just its shaft, that's worse. OK, so now I named it: The Shaft antipattern. You first heard it here!

There is also a cousin of these two pesky antipatterns, let's call it Black Shaft! OK, no more naming. Just take a look at this:
if (person&&person.department&&person.department.manager&&person.department.manager.phoneNumber) {
  call(person.department.manager.phoneNumber);
}
I can already hear a purist shouting at their monitor something like "That's because of the irresponsible use of null values in all programming languages!". Well, null is here to stay, so deal with it. The other problem is that you often don't see a better solution to something like this. You have a hierarchy of objects and any of them might be null and you are not in a position where you would cede control to another piece of code based on which object you refer to. I mean, one could refactor things like this:
if (person) {
  person.callDepartmentManager();
}
...
function callDepartmentManager() {
   if (this.department) {
      this.department.callManager();
   }
}
, which would certainly solve things, but adds a lot of extra code. In C# 6 you can do this:
var phoneNumber = person?.department?.manager?.phoneNumber;
if (phoneNumber) {
  call(phoneNumber);
}
This is great for .NET developers, but it also shows that rather than convince people to use better code practices, Microsoft decided this is a common enough problem it needed to be addressed through features in the programming language itself.

To be fair, I don't have a generic solution for this. Just be careful to use this only when you actually need it and handle any null values with grace, rather than just ignore them. Perhaps it is not the best place to call the manager in a piece of code that only has a reference to a person. Perhaps a person that doesn't seem to be in any department is a bigger problem than the fact you can't find their manager's phone number.

The Omnipresent Switch


Another smelly code example is the omnipresent switch. You see code like this:
switch(type) {
  case Types.person:
    walk();
    break;
  case Types.car:
    run();
    break;
  case Types.plane:
    fly();
    break;
}
This isn't so bad, unless it appears in a lot of places in your code. If that type variable is checked again and again and again to see which way the program should behave, then you probably can apply the Replace Conditional with Polymorphism refactoring method.



Or, in simple English, group all the code per type, then only decide in one place which of them you want to execute. Polymorphism might work, but also some careful rearranging of your code. If you think of your code like you would a story, then this is the equivalent of the annoying "meanwhile, at the Bat Cave" switch. No, I want to see what happens at the Beaver's Bend, don't fucking jump to another unrelated segment! Just try to mentally filter all switch statements and replace them with a comic book bubble written in violently zigzagging font: "Meanwhile...".

A similar thing is when you have a bool or enum parameter in a method, completely changing the behavior of that method. Maybe you should use two different methods. I mean, stuff like:
function doWork(iFeelLikeIt) {
  if (iFeelLikeIt) {
    work();
  } else {
    fuckIt();
  }
}
happens every day in life, no need to see it in code.

Optimizing in the wrong place


Let's take a more serious example:
function stats(arr,method) {
  if (!arr||!arr.length) return;
  arr.sort();
  switch (method) {
    case Methods.min:
      return arr[0];
    case Methods.max:
      return arr[arr.length-1];
    case Methods.median:
      if (arr.length%2==0) {
        return (arr[arr.length/2-1]+arr[arr.length/2])/2;
      } else {
        return arr[Math.ceiling(arr.length/2)];
      }
    case Methods.mode:
      var counts={};
      var max=-1;
      var result=-1;
      arr.forEach(function(v) {
        var count=(counts[v]||0)+1;
        if (count>max) {
          result=v;
          max=count;
        }
        counts[v]=count;
      });
      return result;
    case Methods.average:
      var sum=0;
      arr.forEach(function(v) { sum+=v; });
      return sum/arr.length;
  }
}

OK, it's still a silly example, but relatively less silly. It computes various statistical formulas from an array of values. At first, it seems like a good idea. You sort the array that works for three out of five methods, then you write the code for each, which is greatly simplified by working with a sorted array. Yet for the last two, being sorted does nothing and both of them have loops through the array. Sorting the array would definitely loop through the array as well. So, let's move the decision earlier:
function min(arr) {
  if (!arr||!arr.length) return;
  return Math.min.apply(null,arr);
}

function max(arr) {
  if (!arr||!arr.length) return;
  return Math.max.apply(null,arr);
}

function median(arr) {
  if (!arr||!arr.length) return;
  arr.sort();
  var half=Math.ceiling(arr.length/2);
  if (arr.length%2==0) {
     return (arr[half-1]+arr[half])/2;
  } else {
     return arr[half];
  }
}

function mode(arr) {
  if (!arr||!arr.length) return;
  var counts={};
  var max=-1;
  var result=-1;
  arr.forEach(function(v) {
    var count=(counts[v]||0)+1;
    if (count>max) {
      result=v;
      max=count;
    }
    counts[v]=count;
  });
  return result;
}

function average(arr) {
  if (!arr||!arr.length) return;
  return arr.reduce(function (p, c) {
      return p + c;
    }) / arr.length;
}

As you can see, I only use sorting in the median function - and it can be argued that I could do it better without sorting. The names of the functions now reflect their functionalities. The min and max functions take advantage of the native min/max functions of Javascript and other than the check for a valid array, they are one liners. More than this, it was natural to use various ways to organize my code for each method; it would have felt weird, at least for me, to use forEach and reduce and sort and for loops in the same method, even if each was in its own switch case block. Moreover, now I can find the min, max, mode or median of an array of strings, for example, while an average would make no sense, or I can refactor each function as I see fit, without caring about the functionality of the others.

Yet, you smugly point out, each method uses the same code to check for the validity of the array. Didn't you preach about DRY a blog post ago? True. One might turn that into a function, so that there is only one point of change. That's fair. I concede the point. However don't make the mistake of confusing repeating a need with repeating code. In each of the functions there is a need to check for the validity of the input data. Repeating the code for it is not only good, it's required. But good catch, reader! I wouldn't have thought about it myself.

But, you might argue, the original function was called stats. What if a manager comes and says he wants a function that calculates all statistical values for an array? Then the initial sort might make sense, but the switch doesn't. Instead, this might lead to another antipattern: using a complex function only for a small part of its execution. Something like this:
var stats=getStats(arr);
var middle=(stats.min+stats.max)/2;
In this case, we only need the minimum and maximum of an array in order to get the "middle" value, and the code looks very elegant, yet in the background it computes all the five values, a waste of resources. Is this more readable? Yes. And in some cases it is preferred to do it like that when you don't care about performance. So this is both a pattern and an antipattern, depending on what is more important to your application. It is possible (and even often encountered) to optimize too much.

The X-ifs


A mutant form of the if statement is the ternary operator. My personal preference is to use it whenever a single condition determines one value or another. I prefer if/then/else statements to actual code execution. So I like this:
function boolToNumber(b) {
  return b?1:0;
}

function exec(arr) {
  if (arr.length%2==0) {
     split(arr,arr.length/2);
  } else {
     arr.push(newValue());
  }
}
but I don't approve of this:
function exec(arr) {
  arr.length%2
    ? arr.push(newValue()) 
    : split(arr,arr.length/2);
}

var a;
if (x==1) {
  a=2;
} else {
  a=6;
}

var a=x==1
    ? 2
    : (y==2?5:6);
The idea is that the code needs to be readable, so I prefer to read it like this. It is not a "principle" to write code as above - as I said it's a personal preference, but do think of the other people trying to make heads and tails of what you wrote.

We are many, you are but one


There is a class of multiple decision flow that is hard to immediately refactor. I've talked about if statements that do the entire work in one of their blocks and of switch statements that can be easily split into methods. However there is the case where you want to do things based on the values of multiple variables, something like this:
if (x==1) {
  if (y==1) {
   console.log('bottom-right');
  } else {
   console.log('top-right');
  }
} else {
  if (y==1) {
   console.log('bottom-left');
  } else {
   console.log('top-left');
  }
}
There are several ways of handling this. One is to, again, try to move the decision on a higher level. Example:
if (x==1) {
  logRight(y);
} else {
  logLeft(y);
}
Of course, this particular case can be fixed through computation, like this:
var h=y==1?'right':'left';
var v=x==1?'bottom':'top';
console.log(v+'-'+h);
Assuming it was not so simple, though, we can choose to reduce the choice to a single decision:
switch(x+','+y) {
  case '0,0': console.log('top-left'); break;
  case '0,1': console.log('bottom-left'); break;
  case '1,0': console.log('top-right'); break;
  case '1,1': console.log('bottom-right'); break;
}



The Lazy Event


Another more complicated issue regarding conditional statements is when they are not actually encoding a decision, but testing for a change. Something like:
if (current!=prev) {
  clearData();
  var data=computeData(current);
  setData(data);
  prev=current;
}
This is a perfectly valid piece of code and in many situations is what is required. However, one must pay attention to the place where the decision gets taken as compared with the place the value changed. Isn't that more like an event handler, something that should be designed differently, architecture wise? Why keep a previous value and react to the change only when I get into this piece of code and not react to the change of the value immediately? Fire an event that the value is changed and subscribe to the event via a piece of code that refreshes the data. One giveaway for this is that in the code above there is no actual use of the prev value other than to compare it and set it.

Generalizations


As a general rule, try to take the decisions that are codified by if and switch statements as early as possible. The code must be readable to humans, sometimes in detriment of performance, if it is not essential to the functionality of your program. Avoid decision statements within other decision statements (arrow ifs, ternary operator in a ternary operator, imbricated switch and if statements). Split large pieces of code into small, easy to understand and properly named, methods (essentially creating a lower level than your conditional statement, thus relatively taking it higher in the code hierarchy).

What's next


I know this is a lower level programming blog post, but not everyone reading my blog is a senior dev - I mean, I hope so, I don't want to sound completely stupid. I am planning some new stuff, related to my new work project, but it might take some time until I understand it myself. Meanwhile, I am running out of ideas for my 100 days of writing about code challenge, so suggestions are welcome. And thank you for reading so far :)

0 comments: