Sunday, December 04, 2016

Dependency Injection, Inversion of Control, Testability and other nice things

I am mentally preparing for giving a talk about dependency injection and inversion of control and how are they important, so I intend to clarify my thoughts on the blog first. This has been spurred by seeing how so many talented and even experienced programmers don't really understand the concepts and why they should use them. I also intend to briefly explore these concepts in the context of programming languages other than C#.

And yes, I know I've started an ASP.Net MVC exploration series and stopped midway, and I truly intend to continue it, it's just that this is more urgent.

Head on intro


So, instead of going to the definitions, let me give you some examples, instead.
public class MyClass {
  public IEnumerable<string> GetData() {
    var provider=new StringDataProvider();
    var data=provider.GetStringsNewerThan(DateTime.Now-TimeSpan.FromHours(1));
    return data;
  }
}
In this piece of code I create a class that has a method that gets some text. That's why I use a StringDataProvider, because I want to be provided with string data. I named my class so that it describes as best as possible what it intends to do, yet that descriptiveness is getting lost up the chain when my method is called just GetData. It is called so because it is the data that I need in the context of MyClass, which may not care, for example, that it is in string format. Maybe MyClass just displays enumerations of objects. Another issue with this is that it hides the date and time parameter that I pass in the method. I am getting string data, but not all of it, just for the last hour. Functionally, this will work fine: task complete, you can move to the next. Yet it has some nagging issues.

Dependency Injection


Let me show you the same piece of code, written with dependency injection in mind:
public class MyClass {
  private IDataProvider _dataProvider;
  private IDateTimeProvider _dateTimeProvider;

  public void MyClass(IDataProvider dataProvider, IDateTimeProvider dateTimeProvider) {
    this._dataProvider=dataProvider;
    this._dateTimeProvider=dateTimeProvider;
  }

  public IEnumerable<string> GetData() {
    var oneHourBefore=_dateTimeProvider.Now-TimeSpan.FromHours(1);
    var data=_dataProvider.GetDataNewerThan(oneHourBefore);
    return data;
  }
}
A lot more code, but it solves several issues while introducing so many benefits that I wonder why people don't code like this from the get go.

Let's analyse this for a bit. First of all I introduce a constructor to MyClass, one that accepts and caches two parameters. They are not class types, but interfaces, which declare the intention for any class implementing them. The method then does the same thing as in the original example, using the providers it cached. Now, when I write the code of the class I don't actually need to have any provider implementation. I just declare what I need and worry about it later. I also don't need to inject real providers, I can mock them so that I can test my class as standalone. Note that the previous implementation of the class would have returned different data based on the system time and I had no way to control that behavior. The best benefit, for me, is that now the class is really descriptive. It almost reads like English: "Hi, folks, I am a class that needs someone to give me some data and the time of day and I will give you some processed data in return!". The rule of thumb is that for each method, external factors that may influence its behavior must be abstracted away. In our case if the date time provider provides the same time and the data provider the same data, the effect of the method is always the same.

Note that the interface I used was not IStringDataProvider, but IDataProvider. I don't really care, in my class, that the data is a bunch of strings. There is something called the Single Responsibility Principle, which says that a class or a method or some sort of unit of computation should try to only have one responsibility. If you change that code, it should only affect one area. Now, real life is a little different and classes do many things in many directions, yet they can implement any number of interfaces. The interfaces themselves can declare only one responsibility, which is why this is so nice. I don't actually have to have a class that is only a data provider, but in the context of my class, I only need that part and I am clearly declaring my intent in the code.

This here is called dependency injection, which is a fancy expression for saying "my code receives all third party instances as parameters". It is also in line with the Single Responsibility Principle, as now your class doesn't have to carry the responsibility of knowing how to instantiate the classes it needs. It makes the code more modular, easier to test, more legible and more maintainable.

But there is a problem. While before I was using something like new MyClass().GetData(), now I have to push the instantiation of the providers somewhere up the stream and do maybe something like this:
var dataProvider=new StringDataProvider();
var dateTimeProvider=new DateTimeProvider();
var myClass=new MyClass(dataProvider,dateTimeProvider);
myClass.GetData();
The apparent gains were all for naught! I just pushed the same ugly code somewhere else. But here is where Inversion of Control comes in. What if you never need to instantiate anything again? What it you never actually had to write any new Something() code?

Inversion of Control


Inversion of Control actually takes over the responsibility of creating instances from you. With it, you might get this code instead:
public interface IMyClass {
  IEnumerable<string> GetData();
}

public class MyClass:IMyClass {
  private IDataProvider _dataProvider;
  private IDateTimeProvider _dateTimeProvider;

  public void MyClass(IDataProvider dataProvider, IDateTimeProvider dateTimeProvider) {
    this._dataProvider=dataProvider;
    this._dateTimeProvider=dateTimeProvider;
  }

  public IEnumerable<string> GetData() {
    var oneHourBefore=_dateTimeProvider.Now-TimeSpan.FromHours(1);
    var data=_dataProvider.GetDataNewerThan(oneHourBefore);
    return data;
  }
}
Note that I created an interface for MyClass to implement, one that declares my GetData method. Now, to use it, I could write something like this:
var myClass=Dependency.Get<IMyClass>();
myClass.GetData();

Wow! What happened here? I just used a magical class called Dependency that gets me an instance of IMyClass. And I really don't care how it does it. It can discover implementations by itself or maybe I am manually binding interfaces to implementations when the application starts (for example Dependency.Bind<IMyClass,MyClass>();). When it needs to create a new MyClass it automatically sees that it needs two other interfaces as parameters, so it gets implementations for those first and continues up the chain. It is called a dependency chain and the container will go through it all to simply "Get" you what you need. There are many inversion of control frameworks out there, but the concept is so simple that one can make their own easily.

And I get another benefit: if I want to display some other type of data, all I have to do is instruct the dependency container that I want another implementation for the interface. I can even think about versioning: take a class that I know does the job and compare it with a new implementation of the same interface. I can tell it to use different versions based on the client used. And all of this in exactly one place: the dependency container bindings. You may want to plug different implementations provided by third parties and all they have to care about is respecting the contract in your interface.



Solution structure


This way of writing code forces some changes in the structure of your projects. If all you have is written in a single project, you don't care, but if you want to split your work in several libraries, you have to take into account that interfaces need to be referenced by almost everything, including third party modules that you want to plug. That means the interfaces need their own library. Yet in order to declare the interfaces, you need access to all the data objects that their members need, so your Interfaces project needs to reference all the projects with data objects in them. And that means that your logic will be separated from your data objects in order to avoid circular dependencies. The only project that will probably need to go deeper will be the unit and integration test project.

Bottom line: in order to implement this painlessly, you need an Entities library, containing data objects, then an Interfaces library, containing the interfaces you need and, maybe, the dependency container mechanism, if you don't put it in yet another library. All the logic needs to be in other projects. And that brings us to a nice side effect: the only connection between logic modules is done via abstractions like interfaces and simple data containers. You can now substitute one library with another without actually caring about the rest. The unit tests will work just the same, the application will function just the same and functionality can be both encapsulated and programatically described.

There is a drawback to this. Whenever you need to see how some method is implemented and you navigate to definition, you will often reach the interface declaration, which tells you nothing. You then need to find classes that implement the interface or to search for uses of the interface method to find implementations. Even so, I would say that this is an IDE problem, not a dependency injection issue.

Other points of view


Now, the intro above describes what I understand by dependency injection and inversion of control. The official definition of Dependency Injection claims it is a subset of Inversion of Control, not a separate thing.

For example, Martin Fowler says that when he and his fellow software pattern creators thought of it, they called it Inversion of Control, but they decided that it was too broad a term, so they moved to calling it Dependency Injection. That seems strange to me, since I can describe situations where dependencies are injected, or at least passed around, but they are manually instantiated, or situations where the creation of instances is out of the control of the developer, but no dependencies are passed around. He seems to see both as one thing. On the other hand, the pattern where dependencies are injected by constructor, property setters or weird implementation of yet another set of interfaces (which he calls Dependency Injection) is different from Service Locator, where you specifically ask for a type of service.

Wikipedia says that Dependency Injection is a software pattern which implements Inversion of Control to resolve dependencies, while it calls Inversion of Control a design principle (so, not a pattern?) in which custom-written portions of a computer program receive the flow of control from a generic framework. It even goes so far as to say Dependency Injection is a specific type of Inversion of Control. Anyway, the pages there seem to follow the general definitions that Martin Fowler does, which pits Dependency Injection versus Service Locator.

On StackOverflow a very well viewed answer sees dependency injection as "giving an object its instance variables". I tend to agree. I also liked another answer below that said "DI is very much like the classic avoiding of hardcoded constants in the code." It makes one think of a variable as an abstraction for values of a certain type. Same page holds another interesting view: "Dependency Injection and dependency Injection Containers are different things: Dependency Injection is a method for writing better code, a DI Container is a tool to help injecting dependencies. You don't need a container to do dependency injection. However a container can help you."

Another StackOverflow question has tons of answers explaining how Dependency Injection is a particular case of Inversion of Control. They all seem to have read Fowler before answering, though.

A CodeProject article explains how Dependency Injection is just a flavor of Inversion of Control, others being Service Locator, Events, Delegates, etc.

Composition over inheritance, convention over configuration


An interesting side effect of this drastic decoupling of code is that it promotes composition over inheritance. Let's face it: inheritance was supposed to solve all of humanity's problems and it failed. You either have an endless chain of classes inheriting from each other from which you usually use only one or two or you get misguided attempts to allow inheritance from multiple sources which complicates understanding of what does what. Instead interfaces have become more widespread, as declarations of intent, while composition has provided more of what inheritance started off as promising. And what is dependency injection if not a sort of composition? In the intro example we compose a date time provider and a data provider into a time aware data provider, all the time while the actors in this composition need to know nothing else than the contracts each part must abide by. Do that same thing with other implementations and you get a different result. I will go as far as to say that inheritance defines what classes are, while composition defines what classes do, which is what matters in the end.

Another interesting effect is the wider adoption of convention over configuration. For example you can find the default implementation of an interface as the class that implements it and has the same name minus the preceding "I". Rather than explicitly tell the framework that we want to use the Manager class each time someone needs an IManager implementation, it can figure it out for itself by naming alone. This would never work if the responsibility of getting class instances resided with each method using them.

Real life examples


Simple Injector


If you look on the Internet, one of the first dependency injection frameworks you find for .Net is Simple Injector, which works on every flavor of .Net including Mono and Core. It's as easy to use as installing the NuGet package and doing something like this:
// 1. Create a new Simple Injector container
var container = new Container();

// 2. Configure the container (register)
container.Register<IUserRepository, SqlUserRepository>(Lifestyle.Transient);
container.Register<ILogger, MailLogger>(Lifestyle.Singleton);

// 3. Optionally verify the container's configuration.
container.Verify();

// 4. Get the implementation by type
IUserService service = container.GetInstance<IUserService>();

ASP.Net Core


ASP.Net Core has dependency injection built in. You configure your bindings in ConfigureServices:
public void ConfigureServices(IServiceCollection svcs)
{
  svcs.AddSingleton(_config);
 
  if (_env.IsDevelopment())
  {
    svcs.AddTransient<IMailService, LoggingMailService>();
  }
  else
  {
    svcs.AddTransient<IMailService, MailService>();
  }
 
  svcs.AddDbContext<WilderContext>(ServiceLifetime.Scoped);
 
  // ...
}
then you use any of the registered classes and interfaces as constructor parameters for controllers or even using them as method parameters (see FromServicesAttribute)

Managed Extensibility Framework


MEF is a big beast of a framework, but it can simplify a lot of work you would have to do to glue things together, especially in extensibility scenarios. Typically one would use attributes to declare which interface something "exports" and then use other attributes to "import" implementations in properties and values. All you need to do is put them in the same place. Something like this:
[Export(typeof(ICalculator))]
class SimpleCalculator : ICalculator {
  //...
}

class Program {

  [Import(typeof(ICalculator))]
  public ICalculator calculator;

  // do something with calculator
}
Of course, in order for this to work seamlessly you need stuff like this, as well:
private Program()
{
    //An aggregate catalog that combines multiple catalogs
    var catalog = new AggregateCatalog();
    //Adds all the parts found in the same assembly as the Program class
    catalog.Catalogs.Add(new AssemblyCatalog(typeof(Program).Assembly));
    catalog.Catalogs.Add(new DirectoryCatalog("C:\\Users\\SomeUser\\Documents\\Visual Studio 2010\\Projects\\SimpleCalculator3\\SimpleCalculator3\\Extensions"));


    //Create the CompositionContainer with the parts in the catalog
    _container = new CompositionContainer(catalog);

    //Fill the imports of this object
    try
    {
        this._container.ComposeParts(this);
    }
    catch (CompositionException compositionException)
    {
        Console.WriteLine(compositionException.ToString());
    }
}

Dependency Injection in other languages


Admit it, C# is great, but it is not by far the most used computer language. That place is reserved, at least for now, for Javascript. Not only is it untyped and dynamic, but Javascript isn't even a class inheritance language. It uses the so called prototype inheritance, which uses an instance of an object attached to a type to provide default values for the instance of said type. I know, it sounds confusing and it is, but what is important is that it has no concept of interfaces or reflection. So while it is trivial to create a dictionary of instances (or functions that create instances) of objects which you could then use to get what you need by using a string key (something like var manager=Dependency.Get('IManager');, for example) it is difficult to imagine how one could go through the entire chain of dependencies to create objects that need other objects.

And yet this is done, by AngularJs, RequireJs or any number of modern Javascript frameworks. The secret? Using regular expressions to determine the parameters needed for a constructor function after turning it to string. It's complicated and beyond the scope of this blog post, but take a look at this StackOverflow question and its answers to understand how it's done.

Let me show you an example from AngularJs:
angular.module('myModule', [])
  .directive('directiveName', ['depService', function(depService) {
    // ...
  }])
In this case the key/type of the service is explicit using an array notation that says "this is the list of parameters that the dependency injector needs to give to the function", but this might be have been written just as the function:
angular.module('myModule', [])
  .directive('directiveName', function(depService) {
    // ...
  })
In this case Angular would use the regular expression approach on the function string.


What about other languages? Java is very much like C# and the concepts there are similar. Even if all are flavors of C, C++ is very different, yet Dependency Injection can be achieved. I am not a C++ developer, so I can't tell you much about that, but take a look at this StackOverflow question and answers; it is claimed that there is no one method, but many that can be used to do dependency injection in C++.

In fact, the only languages I can think of that can't do dependency injection are silly ones like SQL. Since you cannot (reasonably) define your own types or pass functions along, the concept makes no sense. Even so, one can imagine creating dummy stored procedures that other stored procedures would use in order to be tested. There is no reason why you wouldn't use dependency injection if the language allows for it.

Testability


I mentioned briefly unit testing. Dependency Injection works hand in hand with automated testing. Given that the practice creates modules of software that give reproducible results for the same inputs and account for all the inputs, testing becomes a breeze. Let me give you some examples using Moq, a mocking library for .Net:
var dateTimeMock=new Mock<IDateTimeProvider>();
dateTimeMock
  .Setup(m=>m.Now)
  .Returns(new DateTime(2016,12,03));

var dataMock=new Mock<IDataProvider>();
dataMock
  .Setup(m=>m.GetDataNewerThan(It.IsAny<DateTime>()))
  .Returns(new[] { "test","data" });

var testClass=new MyClass(dateTimeMock.Object, dataMock.Object);

var result=testClass.GetData();
AssertDeepEqual(result,new[] { "test","data" });

First of all, I take care of all dependencies. I create a "mock" for each of them and I "set up" the methods or property setters/getters that interest me. I don't really need to set up the date time mock for Now, since the data from the data provider is always the same no matter the parameter, but it's there for you to see how it's done. Second, I instantiate the class I want to test using the Object property of my mocks, which returns an object that implements the type given as a generic parameter in the constructor. Third I assert that the side effects of my call are the ones I expect. The mocks need to be as dumb as possible. If you feel you need to write code to define your mocks you are probably doing something wrong.

The type of the tests, for people who are not familiar with this concept, is usually a fully positive one - that is give full valid data and expect the correct result - followed by many negative ones, where the correct data is made incorrect in all possible ways and it is tested that the method fails. If there are many types of combinations of data that would be considered valid, you need a test for as many of them.

Note that the test is instantiating the test class directly, using the constructor. We are not testing the injector here, but the actual class.

Conclusions


What I appreciate most with Dependency Injection is that it forces you to write code that has clear boundaries defined by interfaces. Once this is achieved, you can go write your own stuff and not care about what other people do with theirs. You can test your modules without even caring if the rest of the project even exists. It allows to refactor code in steps and with a lot more confidence since you are covered by unit tests.

While some people work on fire-and-forget projects, like small games or utilities, and they don't care about maintainability, one of the most touted reasons for using unit tests and dependency injection, these practices bring so many other benefits that are almost impossible to get otherwise.

The entire point of this is reducing the complexity of dependencies, which include not only the modules in your application, but also the support frame for them, like people working on them. While some managers might not see the wisdom of reducing friction between software components, surely they can see the positive value of reducing friction between people.

There was one other topic that I wanted to touch, but it is both vast and I have not enough experience with it, however it feels very attractive to me: refactoring old code in order to use dependency injection. Best practices, how to make it safe enough and fast enough to make managers approve it and so on. Perhaps another post later on. I was thinking of a combination of static analysis and automated methods, like replacing all usages of "new" with a single point of instantiation, warning about static methods and properties, automatically replacing known bad practices like DateTime.Now and so on. It might be interesting, right?

I hope I wasn't too confusing and I appreciate any feedback you have. I will be working on a presentation file with similar content, so any help will go into doing a better job explaining it to others.

0 comments: