Wednesday, August 31, 2016

Calamity (Reckoners #3), by Brandon Sanderson

I read the book in a day. Just like the other two in the series (Steelheart and Firefight), I was caught up in the rhythm of the characters and the overwhelming positivity of the protagonist. Perhaps strangely, I kind of missed descriptions in this one, as both locations and characters were left to the imagination and everything was action and dialogue.

Brandon Sanderson ends (why!!?? Whyy?!!?) the Reckoners series with this third book called Calamity. You absolutely have to read the other two books to understand anything about it. In fact, Reckoners feels more like a single story released in three installments than a true trilogy. In it, the team has to reckon (heh heh heh) with their leader going rogue and have a decision to make: either bring him back or kill him. All this while fighting off various Epics in different stages of madness.

I can't really say anything about the story without spoiling the hell out of it. I loved the first two books and I loved this one. Indeed, I have yet to find a Brandon Sanderson book that I don't like. If you are into superhero stories, the Reckoners series has a refreshingly original plot, a wonderful main character and true debate about what heroism really is. I recommend it highly!

Tuesday, August 30, 2016

An elegant alternative to a for 1..n when you don't actually need the value

Sometimes you need to repeat a code block a number of times and the solution is often a for block.
for (var i=0; i<n; i++)...
This is a complex line to write and most importantly obscures the intent of the code. Wouldn't it be better to have some kind of construct that says "repeat N times" and be intuitively easy to understand? Well there is one:
while (n-->0) ...
No, it isn't some C# construct that you have not heard of before, it's a while loop that checks on the value of n, then decrements it. But it looks great! It almost reads as "while n moves to 0". I liked it and I thought I should share.
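Here is a minimal sketch of the idiom in use, assuming a hypothetical helper called Repeat (my name for illustration, not something from the framework). One thing to keep in mind: the loop consumes its counter, so it only works when you don't need n afterwards or, as here, when you are decrementing a copy.

```csharp
using System;

public static class RepeatDemo
{
    // Hypothetical helper: runs the action exactly n times (zero times when n <= 0)
    // and returns how many times it ran.
    public static int Repeat(int n, Action action)
    {
        var runs = 0;
        // n is passed by value, so the caller's variable is left untouched.
        while (n-- > 0)
        {
            action();
            runs++;
        }
        return runs;
    }
}
```

Calling RepeatDemo.Repeat(3, () => Console.WriteLine("hello")) writes the line three times.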

Monday, August 29, 2016

The Mirror Thief, by Martin Seay

The Mirror Thief is a really interesting book. It is well written, original in ideas and Martin Seay has his own unique writing style. It is also a very deceptive book, always changing shape, misleading the reader about what they are actually reading.

The book starts in Vegas, with a black former military policeman named Curtis arriving in search of a certain Stanley, in order to give him a message from a friend. The story feels like a detective-noir, but immediately there are things that just don't belong. The apparent mystical talents of Stanley, the very detailed descriptions of the surroundings, using rarely met but very specific words. At the time I thought I was going to read some sort of mystical noir thing akin to Cast a Deadly Spell.

But then the plot switches to the story of Stanley when he was a kid, hustling people on the street and doing various other bad things, his only anchor a book called The Mirror Thief, a poetry book that describes the adventures of a certain Vettor Crivano in Venice at the end of the 16th century. While he doesn't really understand what it is about, he feels that the unnaturally meandering book hides some sort of universal secret. As a reader, you start to doubt what you have read until now. After all, aren't you reading a confusing meandering book with a lot of unexplainable details? Stanley has traversed the United States in order to find the author of the book and ask him to reveal the mystical secret that would give him the power he craves.

And just when you think you figured it out, the book reveals a third story, that of Crivano himself, but not in poetry and apparently unrelated to the book about him. While he is an alchemist and physician, his best skill appears to be fighting, and he only uses it effectively at the end. All main characters: Curtis, Stanley and Crivano feel absurdly human and flawed. Curtis carries a gun everywhere and he doesn't get to use it once, while being disarmed multiple times. He doesn't get what's going on up until the very end. Crivano is also deceived several times, another pawn in the big game of life. Paradoxically Stanley is the one most in control of his life, mostly by rejecting everything society considers normal or even moral and choosing his every step.

To me, the book was most of all about perception. The reader is confusedly pinballed from perspective to perspective, even with each of them painstakingly detailed. While reading the book you learn new words, old words, history of three different times and places and intimately get to know each character. When you get tricked, you are just following characters that get tricked, disappointed or set up themselves. The three stories are really completely unrelated, at most red herrings when mentioned in the others, and offer little closure. It is all about understanding there are other ways of looking at the world.

Bottom line: Martin Seay is often accosted by readers begging for explanations of what they have read. You cannot read the book and not feel it is a good book, but actually enjoying it is a different thing altogether. At the end you start thinking about the book, about the world, about yourself, wondering if you didn't just read the whole thing wrong and whether maybe you should start over.


Exciting position in a random game of chess

I have been playing a little with the Houdini chess engine and Chess Arena. I limit the ELO of the engine to a set value and then I try to beat it. It's not like playing a human being, but saves me the humiliation of being totally thrashed by another person :) Plus I didn't have Internet. In this case the ELO was set to 1400.

One of the games I played turned out to be extremely interesting (and short). Black tried the Elephant Gambit, to which I replied clumsily, but then it made two horrible mistakes and I saw the correct continuation. What I found really interesting is how the computer reacted. What I thought would lead to some sort of piece advantage after a king and rook fork turned out to be a mating situation. The only solution for the computer was to sacrifice the queen; trying to save her would have resulted in mates or even worse situations.
I will publish the PGN here, so you can explore the variations, but before that I want to point out another greatly interesting move. As the Black queen attempts to escape, the next move is a king-queen-rook fork, yet after the king moves White does not capture the queen but does a seemingly random move: 12. Qd4. Why is that? I leave you to check it out for yourselves!

1. e4 e5 2. Nf3 d5 {The Elephant Gambit} 3. Nxe5 Qf6 4.
d4 dxe4 5. Nc3 Bb4 6.
Bd2 Qf5 {A really bad move giving a 3 point advantage to
White.} 7. Nd5 Bd6 8. c3 {My turn for a bad move,
anything went there, Bc4, g4, but I did this, going back to 1 pawn
advantage.} Bxe5 9. dxe5 Qxe5 {Disastrous move, but interesting. Check out the continuations.} (9.
.. Qd7 {This would have been the only decent move.}) 10. Bf4 Qxf4 {Amazingly, the only move here is to sacrifice the
queen for either bishop or knight. Attempts to save the queen lead to
mate!} (10. .. Qf5 {queen attempts to escape.} 11. Nxc7+ {leads to mate in
1, no matter where the king goes.} Kf8 (11. .. Ke7 12. Qd6#) 12. Qd8#) (10.
.. Qe6 {queen attempts to live just a bit longer.} 11. Nxc7+ {Chessgasm!
and Black's pain is only beginning.} Ke7 12. Qd4 {This is the most
interesting move here and by far the best by Houdini's calculations.} Nf6
(12. .. Nc6 13. Qc5+ Kd8 14. O-O-O+ Nd4 15. Rxd4+ Bd7 16. Qf8+ Qe8 17.
Qxe8#) (12. .. Qc6 13. Bb5 Qf6 14. Qb4+ Kd8 15. Qf8#) (12. .. Qf6 13. Qc5+
Kd7 14. O-O-O+ Qd4 15. Rxd4#) 13. Qc5+ Kd8 14. Nxe6+ Bxe6 15. Qc7+ Ke8 16.
Qxb7 Bd5 17. Qc8+ Ke7 18. Qxh8) 11. Nxf4 Nc6 {White wins
easily from +11 points} 1-0

Monday, August 22, 2016

Gamification is a trap word. It is only inversely connected to games.

My mind has been wandering around the concept of gamification for a few weeks now. In short, it's the idea of turning a task into a game to increase motivation towards completing it. And while there is no doubt that it works - just check all the stupid games that people play obsessively in order to gain some useless points - it was a day in the park that made it clear how wrong the term is in connecting point systems to games.

Health, energy, willpower as metrics. Where is the happiness?

I was walking with some friends and I saw two kids playing. One was shooting a ball with his feet and the other was riding a bicycle. The purpose of their ad-hoc game was for the guy with the ball to hit the kid on the bike. They were going at it again and again, squealing in joy, and it hit me: they invented a game. And while all games have a goal, not all of them require points. Moreover, the important thing is not the points themselves, but who controls the point system. That was my epiphany: they invented their own game, with its own goal and point system, but they were controlling the way points were awarded and ultimately how much they mattered. The purpose of the game was actually to challenge the players, to gently explore their limitations and try to push boundaries a little bit further out. It wasn't about losing or winning, it was about learning and becoming.

In fact, another word for point system is currency. We all know how money relates to motivation and happiness, so how come we got conned into believing that turning something into a game means showing flashy animations filled with positive emotion that award you arbitrary sums of arbitrary types of points? I've tried some of these things myself. It feels great at times to go up the (arbitrary) ranks or levels or whatever, while golden chests and diamonds and untold riches are given to me. But soon enough the feeling of emptiness overcomes the fictional rewards. I am not challenging myself, instead I am doing something repetitive and boring. That and the fact that most of these games are traps to make you spend actual cash or more valuable currency to buy theirs. You see, the game of the developers is getting more money. And they call it working, not playing.

I was reading this book today and in it a character says that money is the greatest con: it is only good for making more money. Anything that can be bought can be stolen. And it made a lot of sense to me. When you play gamified platforms like the ones I am describing, the goal of the game quickly changes in your mind. You start to ignore pointless (pardon the pun) details, like storyline, character development, dialogues, the rush of becoming better at something, the skills one acquires, even the fun of playing. Instead you start chasing stars, credits, points, jewels, levels, etc. You can then transfer those points, maybe convert them into something or get more by converting money into them. How about going around and tricking other players into giving you their points? And suddenly, you are playing a different game, the one called work. Nobody can steal the skills you acquire from you, but they can always steal your title or your badge or your trophy or the money you made.

What I am saying is that games have a goal that defines them. Turning that goal into a metric irrevocably perverts the game. Even sports like football, that start off as a way of proving your team is better than the other team and incidentally improving your physical fitness, turn into ugly deformed versions of themselves where the bottom line is getting money from distribution rights, where goals can be bought or stolen by influencing a referee, for example.

I remember this funny story about a porn game that had a very educational goal: make girls reach orgasm. In order to translate this into a computer program, the developers had several measurements of pleasure - indicated in the right side of the screen as colored bars - which all had to go over a threshold in order to make the woman cum. What do you think happened? Players ignored the moaning image of a naked female and instead focused solely on the bars. Focus on the metric and you ignore the actual goal.

To summarize: a game requires you to define your limits, acknowledge them, then try to break them. While measuring is an important part of defining limits, the point is in breaking them, not in acquiring tokens that somehow prove it to other people. If you want to "gamify" work, then the answer is to do your tasks better and harder and to do it for yourself, because you like who you become. When you do it in order to make more money, that's work, and to win that game you only need to trick your employers or your customers into believing that you are doing what is required. And it's only play when you enjoy who you are when doing it.

As an aside, I know people that are treating making money like a game. For them making more and more money is a good thing, it challenges them, it makes them feel good about themselves. They can be OK people that sometimes just screw you over if they feel the goal of their game is achieved better by it. These people never gamified work, they were playing a game from the very start. They love doing it.

Stay true to the goal! That is the game.

Friday, August 12, 2016

Lab Girl, by Hope Jahren

Lab Girl should have been the kind of book I like: a deeply personal autobiography. Hope Jahren also writes well, and in 14 chapters goes through about 20 years of her life, from the moment she decided she would be a scientist to the moment when she was actually accepted as a full professor by academia. She talks about her Norwegian family education, about the tough mother that never gave her the kind of love she yearned for, she talks about misogyny in science, about deep feelings for her friends, she talks about her bipolar disorder and her pregnancy. Between chapters she interposes a short story about plants, mostly trees, as metaphors for personal growth. And she is an introvert who works and is best friends with a guy who is even more of an introvert than she is. What is not to like?

And the truth is that I did like the book, yet I couldn't empathize with her "character". Each chapter is almost self-contained, there is no continuity, and instead of feeling one with the writer I was getting the impression that she overthinks stuff and everything I read is a memory of a memory of a thought. I also felt there was little science in a book written by someone who loves science, although objectively there is plenty of stuff to rummage through. Perhaps I am not a plant person.

The bottom line is that I was expecting someone autopsying their daily life, not paper wrapping disjointed events that marked their life in general. As it usually is with expectations, I felt a bit disappointed when the author had other plans with her book. It does talk about deep feelings, but I was more interested in the actual events than the internal projection of them. However if you are the kind of person who likes the emotional lens on life, you will probably like the book more than I did.

Finding the intersection of two large sorted arrays

I am going to discuss in this post an interview question that pops up from time to time. The solution that is usually presented as best is the same, regardless of the inputs. I believe this to be a mistake. Let me explore this with you.

The problem


The problem is simple: given two sorted arrays of very large size, find the most efficient way to compute their intersection (the list of common items in both).

The solution that is given as correct is described here (you will have to excuse its Javiness), for example. The person who provided the answer made a great effort to list various solutions and their O complexity, and the answer inspires confidence, as coming from one who knows what they are talking about. But how correct is it? Another blog post describing the problem and hinting at some extra information that might influence the result is here.

Implementation


Let's start with some code:
var rnd = new Random();
var n = 10000000; // 1e+7, so that the generated values still fit in an int
int[] arr1, arr2;
generateArrays(rnd, n, out arr1, out arr2);
var sw = new Stopwatch();
sw.Start();
var count = intersect(arr1, arr2).Count();
sw.Stop();
Console.WriteLine($"{count} intersections in {sw.ElapsedMilliseconds}ms");
Here I am creating two arrays of size n, using a generateArrays method, then I am counting the number of intersections and displaying the time elapsed. In the intersect method I will also count the number of comparisons, so that we avoid for now the complexities of Big O notation (pardon the pun).

As for the generateArrays method, I will use a simple incremented value to make sure the values are sorted, but also randomly generated:
private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    int s1 = 0;
    int s2 = 0;
    for (var i = 0; i < n; i++)
    {
        s1 += rnd.Next(1, 100);
        arr1[i] = s1;
        s2 += rnd.Next(1, 100);
        arr2[i] = s2;
    }
}

Note that n is 1e+7, so that the values fit into an integer. If you try a larger value it will overflow and result in negative values, so the array would not be sorted.
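A rough back-of-the-envelope check of that claim, as a sketch: rnd.Next(1, 100) returns values from 1 to 99, so the average increment is about 50 and the last element lands somewhere around 50*n. The helper names below are hypothetical and only there to illustrate the arithmetic.

```csharp
using System;

public static class OverflowDemo
{
    // Rough estimate of the last generated value: n increments averaging 50 each.
    public static long ExpectedLastValue(long n) => n * 50;

    // True when the estimated last value still fits in a 32-bit signed integer.
    public static bool FitsInInt(long n) => ExpectedLastValue(n) <= int.MaxValue;

    // Integer addition in an unchecked context wraps around silently,
    // which is how a running sum can suddenly turn negative.
    public static int WrapAdd(int a, int b) => unchecked(a + b);
}
```

For n = 1e+7 the estimate is 5e+8, comfortably under int.MaxValue (about 2.1e+9); for n = 1e+8 it is 5e+9, which does not fit.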

Time to explore ways of intersecting the arrays. Let's start with the recommended implementation:
private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1<arr1.Length && p2<arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch(v1.CompareTo(v2))
        {
            case -1:
                p1++;
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2++;
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}

Note that I am not counting the comparisons of the two pointers p1 and p2 with the Length of the arrays, which can be optimized by caching the length. They cost just as much as comparing the array values, yet we discount them in the name of calculating a fictitious growth rate complexity. I am going to do that in the future as well. The optimization of the code itself is not part of the post.

Running the code I get the following output:
19797934 comparisons
199292 intersections in 832ms

The number of comparisons is directly proportional to the value of n, approximately 2n. That is because we look for all the values in both arrays. If we populate the values with odd and even numbers, for example, so no intersections, the number of comparisons will be almost exactly 2n.
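The odds and evens case can be sketched like this. The generator is my assumption of what such a generateArrays variant might look like (the post does not show it), and CountComparisons is a stripped-down copy of the two-pointer walk that only returns the counter.

```csharp
using System;

public static class OddsEvensDemo
{
    // Assumed variant of generateArrays: odd numbers in arr1, even numbers in arr2,
    // so the two arrays never share a value.
    public static void GenerateOddsEvens(int n, out int[] arr1, out int[] arr2)
    {
        arr1 = new int[n];
        arr2 = new int[n];
        for (var i = 0; i < n; i++)
        {
            arr1[i] = 2 * i + 1; // 1, 3, 5, ...
            arr2[i] = 2 * i + 2; // 2, 4, 6, ...
        }
    }

    // Same two-pointer walk as the standard intersect, but it only counts
    // the value comparisons instead of yielding the common items.
    public static int CountComparisons(int[] arr1, int[] arr2)
    {
        int p1 = 0, p2 = 0, comparisons = 0;
        while (p1 < arr1.Length && p2 < arr2.Length)
        {
            comparisons++;
            if (arr1[p1] < arr2[p2]) p1++;
            else if (arr1[p1] > arr2[p2]) p2++;
            else { p1++; p2++; }
        }
        return comparisons;
    }
}
```

With this particular layout and n = 1000 the walk makes 1999 comparisons, one short of 2n, because the loop stops as soon as the first array is exhausted.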

Experiments


Now let me change the intersect method, make it more general:
private static IEnumerable<int> intersect(int[] arr1, int[] arr2)
{
    var p1 = 0;
    var p2 = 0;
    var comparisons = 0;
    while (p1 < arr1.Length && p2 < arr2.Length)
    {
        var v1 = arr1[p1];
        var v2 = arr2[p2];
        comparisons++;
        switch (v1.CompareTo(v2))
        {
            case -1:
                p1 = findIndex(arr1, v2, p1, ref comparisons);
                break;
            case 0:
                p1++;
                p2++;
                yield return v1;
                break;
            case 1:
                p2 = findIndex(arr2, v1, p2, ref comparisons);
                break;
        }

    }
    Console.WriteLine($"{comparisons} comparisons");
}

private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    p++;
    while (p < arr.Length)
    {
        comparisons++;
        if (arr[p] >= v) break;
        p++;
    }
    return p;
}
Here I've replaced the increment of the pointers with a findIndex method that keeps incrementing the value of the pointer until the end of the array is reached or a value larger than or equal to the one we are searching for is found. The functionality of the method remains the same, since the same effect would have been achieved by the main loop. But now we are free to try to tweak the findIndex method to obtain better results. But before we do that, I am going to P-hack the shit out of this science and generate the arrays differently.

Here is a method of generating two arrays that are different because all of the elements of the first are smaller than those of the second. At the very end we put a single element that is equal, for the fun of it.
private static void generateArrays(Random rnd, int n, out int[] arr1, out int[] arr2)
{
    arr1 = new int[n];
    arr2 = new int[n];
    for (var i = 0; i < n - 1; i++)
    {
        arr1[i] = i;
        arr2[i] = i + n;
    }
    arr1[n - 1] = n * 3;
    arr2[n - 1] = n * 3;
}

This is the worst case scenario for the algorithm and the value of comparisons is promptly 2n. But what if we used binary search (which the StackOverflow answer dismissed as having O(n*log n) complexity instead of O(n))? Well, then... the output becomes
49 comparisons
1 intersections in 67ms
Here is the code for the findIndex method that would do that:
private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var start = p + 1;
    var end = arr.Length - 1;
    if (start > end) return start;
    while (true)
    {
        var mid = (start + end) / 2;
        var val = arr[mid];
        if (mid == start)
        {
            comparisons++;
            return val < v ? mid + 1 : mid;
        }
        comparisons++;
        switch (val.CompareTo(v))
        {
            case -1:
                start = mid + 1;
                break;
            case 0:
                return mid;
            case 1:
                end = mid - 1;
                break;
        }
    }
}

49 comparisons is smack on the value of 2*log2(n). Yeah, sure, the data we used was doctored, so let's return to the randomly generated one. In that case, the number of comparisons grows horribly:
304091112 comparisons
199712 intersections in 5095ms
which is larger than n*log2(n).

Why does that happen? Because in the randomly generated data the binary search finds its worst case scenario: trying to find the first value. It divides the problem efficiently, but it still has to go through all the data to reach the first element. Surely we can't use this for a general scenario, even if it is fantastic for one specific case. And here is my qualm with the O notation: without specifying the type of input, the solution is just probabilistically the best. Is it?

Let's compare the results so far. We have three ways of generating data: randomly with increments from 1 to 100, odds and evens, small and large values. Then we have two ways of computing the next index to compare: linear and binary search. The approximate numbers of comparisons are as follows:
                 Random          OddsEvens      SmallLarge
Linear           2n              2n             2n
Binary search    3/2*n*log(n)    2*n*log(n)     2*log(n)

Alternatives


Can we create a hybrid findIndex that would have the best of both worlds? I will certainly try. Here is one possible solution:
private static int findIndex(int[] arr, int v, int p, ref int comparisons)
{
    var inc = 1;
    while (true)
    {
        if (p + inc >= arr.Length) inc = 1;
        if (p + inc >= arr.Length) return arr.Length;
        comparisons++;
        switch(arr[p+inc].CompareTo(v))
        {
            case -1:
                p += inc;
                inc *= 2;
                break;
            case 0:
                return p + inc;
            case 1:
                if (inc == 1) return p + inc;
                inc /= 2;
                break;
        }
    }
}

What am I doing here? If I find the value, I return the index; if the value is smaller, not only do I advance the index, but I also increase the speed of the next advance; if the value is larger, then I slow down until I get to 1 again. Warning: I do not claim that this is the optimal algorithm, this is just something that was annoying me and I had to explore it.

OK. Let's see some results. I will decrease the value of n even more, to a million. Then I will generate the values with random increases of up to 10, 100 and 1000. Let's see all of it in action! This time the numbers are the actual counts of comparisons (in millions):
                      Random10    Random100    Random1000    OddsEvens    SmallLarge
Linear                2           2            2             2            2
Binary search         30          30           30            40           0.00004
Accelerated search    3.4         3.9          3.9           4            0.0002

So for the general cases, the increase in comparisons is at most twice, while for specific cases the decrease can be four orders of magnitude!

Conclusions


Because I had all of this in my head, I made a fool of myself at a job interview. I couldn't reason all of the things I wrote here in a few minutes and so I had to clear my head by composing this long monstrosity.

Is the best solution the one in O(n)? Most of the time. The algorithm is simple, no hidden comparisons, one can understand why it would be universally touted as a good solution. But it's not the best in every case. I have demonstrated here that I can minimize the extra comparisons in standard scenarios and get immense improvements for specific inputs, like arrays that have chunks of elements smaller than the next value in the other array. I would also risk saying that this findIndex version is adaptive to the conditions at hand, with improbable scenarios as worst cases. It works reasonably well for normally distributed arrays, it does wonders for "chunky" arrays (this includes the case when one array is much smaller than the other) and thus is a contender for some kinds of uses.

What I wanted to explore and now express is that finding the upper growth rate of an algorithm is just part of the story. Sometimes the best implementation fails for not adapting to the real input data. I will say this, though, for the default algorithm: it works with IEnumerables, since it never needs to jump forward over some elements. This intuitively gives me reason to believe that it could be optimized using the array/list indexing. Here it is, in IEnumerable fashion:
private static IEnumerable<int> intersect(IEnumerable<int> arr1, IEnumerable<int> arr2)
{
    var e1 = arr1.GetEnumerator();
    var e2 = arr2.GetEnumerator();
    var loop = e1.MoveNext() && e2.MoveNext();
    while (loop)
    {
        var v1 = e1.Current;
        var v2 = e2.Current;
        switch (v1.CompareTo(v2))
        {
            case -1:
                loop = e1.MoveNext();
                break;
            case 0:
                loop = e1.MoveNext() && e2.MoveNext();
                yield return v1;
                break;
            case 1:
                loop = e2.MoveNext();
                break;
        }

    }
}

Extra work


The source code for a project that tests my various ideas can be found on GitHub. There you can find the following algorithms:
  • Standard - the O(m+n) one described above
  • Reverse - same, but starting from the end of the arrays
  • Binary Search - looks for values in the other array using binary search. Complexity O(m*log(n))
  • Smart Choice - when m*log(n)<m+n, it uses the binary search, otherwise the standard one
  • Accelerating - the one that speeds up when looking for values
  • Divide et Impera - recursive algorithm that splits arrays by choosing the middle value of one and binary searching it in the other. Due to the complexity of the recursiveness, it can't be taken seriously, but sometimes gives surprising results
  • Middle out - it takes the middle value of one array and binary searches it in the other, then uses Standard and Reverse on the resulting arrays
  • Pair search - I had high hopes for this, as it looks two positions in front instead of one. Really good for some cases, though generally it is a bit more than Standard
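To make the Smart Choice rule from the list above concrete, here is a minimal sketch. It is not the code from the GitHub project; the class and method names are my own, and it assumes strictly increasing arrays (no duplicates), which is what the generators in this post produce. It relies on Array.BinarySearch returning the bitwise complement of the insertion point when the value is missing.

```csharp
using System;
using System.Collections.Generic;

public static class SmartChoiceDemo
{
    // Standard two-pointer walk, O(m+n).
    public static List<int> Standard(int[] a, int[] b)
    {
        var result = new List<int>();
        int p1 = 0, p2 = 0;
        while (p1 < a.Length && p2 < b.Length)
        {
            if (a[p1] < b[p2]) p1++;
            else if (a[p1] > b[p2]) p2++;
            else { result.Add(a[p1]); p1++; p2++; }
        }
        return result;
    }

    // Looks up each value of the smaller array in the larger one, O(m*log(n)).
    // Assumes both arrays are strictly increasing.
    public static List<int> BinarySearchIntersect(int[] small, int[] large)
    {
        var result = new List<int>();
        var from = 0; // everything before this index is already known to be too small
        foreach (var v in small)
        {
            if (from >= large.Length) break;
            var idx = Array.BinarySearch(large, from, large.Length - from, v);
            if (idx >= 0) { result.Add(v); from = idx + 1; }
            else from = ~idx; // complement = index of the first larger element
        }
        return result;
    }

    // Picks binary search when m*log2(n) < m+n, the standard walk otherwise.
    public static List<int> SmartChoice(int[] a, int[] b)
    {
        var small = a.Length <= b.Length ? a : b;
        var large = a.Length <= b.Length ? b : a;
        double m = small.Length, n = large.Length;
        return m * Math.Log(n, 2) < m + n
            ? BinarySearchIntersect(small, large)
            : Standard(a, b);
    }
}
```

The threshold only compares comparison counts, so it says nothing about cache behavior or the cost of the calls themselves; treat it as a heuristic.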

The testing tool takes all algorithms and runs them on randomly generated arrays:
  1. Lengths m and n are chosen randomly from 1 to 1e+6
  2. A random number s of up to 100 "spikes" is chosen
  3. m and n are split into s+1 equal parts
  4. For each spike a random integer range is chosen and filled with random integer values
  5. At the end, the rest of the list is filled with any random values

Results


For a really small first array, the Binary Search is king. For equal size arrays, usually the Standard algorithm wins. However there are plenty of cases when Divide et Impera and Pair Search win - usually not by much. Sometimes it happens that Accelerating Search is better than Standard, but Pair Search wins! I still have the nagging feeling that Pair Search can be improved. I feel it in my gut! However I have too many other things to do to dwell on this.

Maybe one of you can find the solution! Your mission, should you choose to accept it, is to find a better algorithm for intersecting sorted arrays than the boring standard one.

Thursday, August 11, 2016

Finding simultaneously the minimum and maximum, optimized

While reading the book Introduction to Algorithms, Third Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, I found a little gem about simultaneously finding the minimum and maximum value in an array in 3*n/2 comparisons instead of the usual 2n. The trick is to take two numbers at a time, compare them with each other and only then compare the smaller one with the minimum and the larger one with the maximum.

So instead of:
var min=int.MaxValue;
var max=int.MinValue;
for (var i=0; i<arr.Length; i++) {
   var val=arr[i];
   if (val>max) max=val;
   if (val<min) min=val;
}
you can use this:
var min=int.MaxValue;
var max=int.MinValue;
for (var i=0; i<arr.Length-1; i+=2) {
   var v1=arr[i];
   var v2=arr[i+1];
   if (v1>v2) {
      if (v1>max) max=v1;
      if (v2<min) min=v2;
   } else {
      if (v2>max) max=v2;
      if (v1<min) min=v1;
   }
}
if (arr.Length%2==1) {
   var v=arr[arr.Length-1];
   if (v>max) max=v;
   if (v<min) min=v;
}

In the first case, we take all n values and compare them with the min and max values respectively, so n times 2. In the second example we take every two values (so n/2 times), compare them with each other (1 comparison) and then we compare the smaller value with min and the larger with max (another 2 comparisons), with a combined number of comparisons of n/2 times 3 (plus 2 extra ones if the number of items in the array is odd).
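To check the arithmetic, here is the pairwise version again with a counter bolted on. It is only a sketch: the method name and the comparisons out parameter are mine, added for illustration.

```csharp
using System;

public static class MinMaxDemo
{
    // Pairwise min/max that also reports how many value comparisons it made.
    public static void MinMaxPairs(int[] arr, out int min, out int max, out int comparisons)
    {
        min = int.MaxValue;
        max = int.MinValue;
        comparisons = 0;
        for (var i = 0; i < arr.Length - 1; i += 2)
        {
            var v1 = arr[i];
            var v2 = arr[i + 1];
            comparisons += 3; // v1 vs v2, one against max, one against min
            if (v1 > v2)
            {
                if (v1 > max) max = v1;
                if (v2 < min) min = v2;
            }
            else
            {
                if (v2 > max) max = v2;
                if (v1 < min) min = v1;
            }
        }
        if (arr.Length % 2 == 1)
        {
            // Leftover element of an odd-length array: two more comparisons.
            var v = arr[arr.Length - 1];
            comparisons += 2;
            if (v > max) max = v;
            if (v < min) min = v;
        }
    }
}
```

For 1000 items the counter ends at 1500, and for 1001 items at 1502, against 2000 and 2002 for the naive loop.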

Update: Here is a variant for an IEnumerable<int>, the equivalent of a foreach:
var enumerator = enumerable.GetEnumerator();
var min = int.MaxValue;
var max = int.MinValue;
while (enumerator.MoveNext()) {
    var v1 = enumerator.Current;
    var v2 = enumerator.MoveNext() ? enumerator.Current : v1;
    if (v1 > v2)
    {
        if (v1 > max) max = v1;
        if (v2 < min) min = v2;
    }
    else
    {
        if (v2 > max) max = v2;
        if (v1 < min) min = v1;
    }
}

Saturday, August 06, 2016

Array.ConvertAll - the day I found out about it was the day I learned I could not use it anymore

I found these cool websites where you can solve software challenges. Completely by chance I found out about a method that I thought was pretty cool, called Array.ConvertAll. Imagine you want to transform a string containing a space separated list of integers into an actual list of integers. You would use Array.ConvertAll(line.Split(' '),int.Parse). That is it. I liked the simplicity of it and also the fact that it works out of the box without having to import any namespace. The same thing can be achieved with LINQ thus: line.Split(' ').Select(int.Parse).ToArray(), but you need to import the System.Linq namespace.
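Both one-liners side by side, wrapped in a small class so they can be tried out. The class and method names are hypothetical; the two expressions are the ones quoted above.

```csharp
using System;
using System.Linq;

public static class ConvertAllDemo
{
    // Array.ConvertAll needs nothing beyond the System namespace.
    public static int[] ParseWithConvertAll(string line)
    {
        return Array.ConvertAll(line.Split(' '), int.Parse);
    }

    // The LINQ equivalent needs System.Linq for Select and ToArray.
    public static int[] ParseWithLinq(string line)
    {
        return line.Split(' ').Select(int.Parse).ToArray();
    }
}
```

Both versions assume single spaces between the numbers; a double space produces an empty entry and int.Parse throws a FormatException on it.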

Unfortunately, the same day I found out about this method, which I had never used before even though it has been there since .NET 2.0, I noticed that it doesn't exist in .NET Core. Like a butterfly, it only lived one day in my development repertoire.

Tuesday, August 02, 2016

Learning ASP.Net MVC - Part 4 - Entity Framework Fundamentals

Learning ASP.Net MVC series:
  1. Setup
  2. MVC Concepts
  3. Authentication
  4. Entity Framework Fundamentals

The previous version of Entity Framework was 6 and the current one is Entity Framework Core 1.0, although for a few years it was known as Entity Framework 7. It might seem that they changed the naming to be consistent with .NET Core, but according to them they did it to avoid confusion. The new version sprouted from the idea of "EF everywhere", just like .NET Core is ".NET everywhere", and is a rewrite - a port, as they chose to call it - with some extra features but also lacking some of the functionality EF6 had, or better said has, since they continue to support it for .NET proper. In this post I will examine the history and some of the basic concepts related to working with Entity Framework as opposed to a more direct approach (like opening a System.Data.SqlClient.SqlConnection and issuing SqlCommands).

Entity Framework history


Entity Framework started as an ORM, a class of software that abstracts database access. The term itself is either a bit obsolete, with the advent of databases that call themselves non relational, or rebelliously exact, recognizing that anything that can be called a database needs to also encode relationships between data. But that's another topic altogether. When Entity Framework was designed it was all about abstracting SQL into an object oriented framework. How would that work? You would define entities, objects that inherited from an EntityObject base class, and decorate their properties with attributes defining some restrictions that databases have, but objects don't, like the size of a field. You also had some default methods that could be overridden in order to control very specific custom requirements. In the background, objects would be mapped to tables, their simple properties to columns and their more complex properties to other tables that had a foreign key relationship with the table mapped to the owner object.

There were some issues with this system that quickly became apparent. With the data layer separation idea going strong, it was really cumbersome and ugly to move around objects that inherited from an entire hierarchy of Entity Framework classes and held state in ways that were almost opaque to the user. Users demanded the use of POCOs, a way to separate the functionality of EF from the data objects that were used through all the tiers of the application. At the time the solution was mostly to use simple objects within your application and then translate them to data access objects which were entities.

Microsoft also recognized this and in further iterations of EF, they went full POCO. But this enabled them to also move from one way of thinking to another. At the beginning the focus was on the database. You had your database structure and your data access layer and you wanted to add EF to your project, meaning you needed to map existing tables to C# objects. But now, you could go the other way around. You started with an application using plain objects and then just slapped EF on and asked it to create and maintain the database. The first way of thinking was coined "database first" and the other "code first".

In seven iterations of the framework, things have been changed and updated quite a lot. You can imagine that successfully adapting to legacy database structures while seamlessly abstracting changes to that structure and completely managing the mapping of objects to the database was no easy task. There were ups and downs, but Microsoft stuck to their guns and now they are making the strong argument that all your data manipulation should be done via EF. That's bold and it would be really stupid if Entity Framework weren't a good product they have full confidence in. Which they do. They moved from a framework that was embedded in .NET, to one that was partially embedded with some extra code shipped separately and then, with EF6, they went full open source. EF Core is also open source and .NET Core is free of EF specific classes.

Also, EF Core is more friendly towards non relational databases, so you either consider ORM an all encompassing term or EF is no longer just an ORM :)

In order to end this chapter, we also need to discuss alternatives.

Ironically, both the ancestor and the main competitor of Entity Framework was LINQ to SQL. If you don't know what LINQ is, you should take the time to look it up, since it has been an integral part of .NET since version 3.5. In LINQ to SQL you would manually map objects to tables, then use the mapping in your code. The management of the database and of the mapping was all up to you. When EF came along, it was like an improvement over this idea, with the major advantage (or flaw, depending on your political stance) that it handled schema mapping and management for you, as much as possible.

Another approach that was, and still is, widely used is separating data access based on intent, not on structure. Basically, if you had the need to add/get the names of people from your People table, you would have another project that had some object hierarchy that in the end had methods for AddPeople and GetPeople. If you didn't need to delete or update people, you didn't have the API for it. Since the intent was clear, so was the structure and the access to the database, all encapsulated - manually - into this data access layer project. If you wanted to get people by name, for example, you had to add that functionality and code all the intermediary access. This had the advantage (or flaw) that you had someone who was good with databases (and a bit with code) handling the maintenance of the data access layer, basically a database admin with some code writing permissions. Some people love the control over the entire process, while others hate that they need to understand the underlying database in order to access data.

From my perspective, it seems as if there is an argument between people who want more control over what is going on and people who want more ease of development. The rest is more of an architectural discussion, which is irrelevant as far as EF is concerned. However, it seems to me that the Entity Framework team has worked hard to please both sides of that argument, going for simplicity, but allowing very fine control down the line. It also means that this blog post cannot possibly cover everything about Entity Framework.

Getting started with Entity Framework


So, how do things look in EF Core 1.0? Things are still split down the middle in "code first" and "database first", but code first is the recommended way for starting new projects. Database first is something that must be supported in perpetuity just in case you want to migrate to EF from an existing database.

Database first


Imagine you have tables in an SQL server database. You want to switch to EF so you need to somehow map the existing data to entities. There is a tutorial for that: ASP.NET Core Application to Existing Database (Database First), so I will just quickly go over the essentials.

First thing is to use NuGet to install EF in your project:
Install-Package Microsoft.EntityFrameworkCore.SqlServer
and then add
"Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
to the project.json tools section. For the Database First approach we also need other stuff like:
Install-Package Microsoft.EntityFrameworkCore.Tools -Pre
Install-Package Microsoft.EntityFrameworkCore.SqlServer.Design
Final touch, running
Scaffold-DbContext "<Sql connection string>" Microsoft.EntityFrameworkCore.SqlServer -OutputDir Models

At this time alarm bells are sounding already. Wait! I only gave it my database connection string, how can it automagically turn this into C# code and work?

If we look at the code to create the sample database in the tutorial above, there are two tables: Blog and Post and they are related via primary key/foreign key, as is recommended when designing an SQL database. Columns are clearly defined as NULL or NOT NULL and the size of text fields is conveniently Max.



The process created some interesting classes. Besides the properties that map to fields, the Blog class has a property of type ICollection<Post> which is instantiated with a HashSet<Post>. The real fun is the BloggingContext class, which inherits from DbContext and in the override for OnModelCreating configures the relationships in the database.
  • Enforcing the required status of the blog Url:
    modelBuilder.Entity<Blog>(entity =>
    {
      entity.Property(e => e.Url).IsRequired();
    });
  • Defining the one-to-many relationship between Blog and Post:
    modelBuilder.Entity<Post>(entity =>
    {
      entity.HasOne(d => d.Blog)
        .WithMany(p => p.Post)
        .HasForeignKey(d => d.BlogId);
    });
  • Having the root sets used to access entities:
    public virtual DbSet<Blog> Blog { get; set; }
    public virtual DbSet<Post> Post { get; set; }

First thing to surprise me, honestly, is that the data model classes are as bare as possible. I would have expected some attributes on the properties defining their state as required, for example. EF Core supports an annotation based system, but it also allows you to keep the classes free of data annotations and configure everything in the context instead. The collections are declared as interfaces and they are only instantiated with a concrete implementation in the constructor. An interesting choice for the collection type is HashSet. As opposed to a List it does not allow access via indexers, only enumerators. It is designed to optimize search: basically, finding an item in the HashSet does not depend on the size of the collection. Set operations like union and intersection can be used efficiently with HashSet, as well.

HashSet also does not allow duplicates and that may cause some sort of confusion. How does one define a duplicate? By default it uses the default equality comparer for the type, which relies on the Equals and GetHashCode methods. However, a HashSet can be instantiated with a custom IEqualityComparer implementation. Alternatively, the Equals and GetHashCode methods can be overridden in the entities themselves. People are divided over whether one should use such mechanisms to optimize Entity Framework functionality, but keep in mind that normally EF would only keep in memory stuff that it immediately needs. Such optimizations are more likely to cause maintainability problems than save processing time.
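To see what "duplicate" means in practice, here is a minimal sketch outside of EF. The Post class is a stripped down stand-in and PostIdComparer is a hypothetical comparer I made up for the example; it treats two posts as the same when their PostId matches.

```csharp
using System;
using System.Collections.Generic;

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
}

// Hypothetical comparer: two posts are "the same" when their PostId matches.
public class PostIdComparer : IEqualityComparer<Post>
{
    public bool Equals(Post x, Post y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.PostId == y.PostId;
    }

    public int GetHashCode(Post obj)
    {
        return obj == null ? 0 : obj.PostId.GetHashCode();
    }
}

public static class HashSetExample
{
    public static void Main()
    {
        var first = new Post { PostId = 1, Title = "Hello" };
        var copy = new Post { PostId = 1, Title = "Hello again" };

        // Default comparer: Post does not override Equals, so the two instances are different items.
        var byReference = new HashSet<Post>();
        byReference.Add(first);
        byReference.Add(copy);
        Console.WriteLine(byReference.Count); // 2

        // Custom comparer: the second post is rejected as a duplicate.
        var byId = new HashSet<Post>(new PostIdComparer());
        byId.Add(first);
        Console.WriteLine(byId.Add(copy)); // False
        Console.WriteLine(byId.Count);     // 1
    }
}
```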

Database first seems to me just a way to work with Entity Framework after using a migration tool. It sounds great, but there are probably a lot of small issues that one has to gain experience with when dealing with real life databases. I will blog about it if I get to doing something like this.

Code first


The code first tutorial goes in the other direction, obviously, but has some interesting differences that tell me that a better model of migrating even existing databases is to start code first, then find a way to migrate the data from the existing database to the new one. This has the advantage that it allows for refactoring the database as well as providing some sort of verification mechanism when comparing the old structure with the new one.

The setup is similar: use NuGet to install EF in your project:
Install-Package Microsoft.EntityFrameworkCore.SqlServer
then add
"Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
to the project.json tools section.

Then we create the models: a simple DbContext inheritance, containing DbSets of Blog and Post, and the data models themselves: Blog and Post. Here is the code:
public class BloggingContext : DbContext
{
    public BloggingContext(DbContextOptions<BloggingContext> options)
        : base(options)
    { }

    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }
}

public class Blog
{
    public int BlogId { get; set; }
    public string Url { get; set; }

    public List<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}

Surprisingly, the tutorial doesn't go into any other changes to this code. There are no HashSets, there are no restrictions over what is required or not and how the classes are related to each other. A video demo of this also shows the created database and it contains primary keys. A blog has a primary key on BlogId, for example. To me that suggests that convention over configuration is also used in the background. The SomethingId property of a class named Something will automatically be considered the primary key (also simply Id). Also, if you look in the code that EF is executing when creating the database (these are called migrations and are pretty cool, I'll discuss them later in the post) Blogs are connected to Posts via foreign keys, so this thing works wonders if you name your entities right. I also created a small console application to test this and it worked as advertised.
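To illustrate the naming convention (and only the convention, this is not how EF implements it), here is a small reflection based sketch. KeyConvention and FindKey are names I made up, the Post class here deliberately uses a plain Id property, and the case insensitive matching is my assumption.

```csharp
using System;
using System.Linq;
using System.Reflection;

public class Blog { public int BlogId { get; set; } public string Url { get; set; } }
public class Post { public int Id { get; set; } public string Title { get; set; } }
public class Note { public string Text { get; set; } }

public static class KeyConvention
{
    // Returns the name of the property that the "Id" / "<TypeName>Id" naming
    // convention would pick as the key, or null when nothing matches.
    public static string FindKey(Type entityType)
    {
        var candidates = new[] { "Id", entityType.Name + "Id" };
        var property = entityType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .FirstOrDefault(p => candidates.Contains(p.Name, StringComparer.OrdinalIgnoreCase));
        return property == null ? null : property.Name;
    }

    public static void Main()
    {
        Console.WriteLine(FindKey(typeof(Blog)));                      // BlogId
        Console.WriteLine(FindKey(typeof(Post)));                      // Id
        Console.WriteLine(FindKey(typeof(Note)) ?? "(no key found)"); // (no key found)
    }
}
```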

Obviously this will not work with every scenario and there will be attributes attached to models and novel ways of configuring mapping, but so far it seems pretty straightforward. If you want to go into the more detailed aspects of controlling your data model, try reading the documentation provided by Microsoft so far.

Entity Framework concepts


We could go right into the code fray, but I choose to write some more boring conceptual stuff first. Working with Entity Framework involves understanding concepts like persistence, caching, migrations, change saving, the underlying mechanisms that turn code into SQL, the Unit of Work and Repository patterns, etc. I'll try to be brief.

Context


As you have seen earlier, classes inheriting from DbContext are the root of all database access. I say classes, because more than one can be used. If you want to copy from one database to another you will need two contexts. The context defines a data model, differentiated from a database schema by being a purely programmatic concept. DbContext implements IDisposable, so for atomic operations it can be used just as one uses an open SQL connection. In fact, if you are tempted to reuse the same context, remember that its memory use increases with the quantity of data it accesses. It is recommended for performance reasons to dispose of a context immediately after finishing operations. Also, a DbContext class is not thread safe. It stands to reason to use a context for as short a period as possible, inside single threaded operations.

DbContext provides two hooks called OnConfiguring and OnModelCreating that users can override to configure the context and the model, respectively. Careful, though, one can configure the context to use a specific implementation of IModel as model, in which case OnModelCreating will not be called. The other most important functionality of DbContext is SaveChanges, which we will discuss later. Worth mentioning are Database and Model, properties that can be used to access database and model information and metadata. The rest are Add, Update, Remove, Attach, Find, etc. plus their async and range versions. For the first time (EF6 did not allow this) one can dynamically send an object to a function like Add, for example, and let EF determine where to add it. It doesn't sound very safe to use, but there were probably some scenarios where it was necessary.

DbSet


For each entity type to be accessed, the context should have a DbSet<TEntity> property for that type. DbSet allows for the manipulation of entities via methods like Add, Update, Remove, Find, Attach and is an IEnumerable and IQueryable of TEntity. However, in order to persist any change, SaveChanges needs to be called on the context class.

SaveChanges


The SaveChanges method is the most important functionality of the context class, which otherwise caches the accessed objects and their state, waiting either for this method to be called or for the context to be disposed. An important improvement in EF Core is that these changes are now sent to the database in batches of commands. Before, in EF6 and earlier, each change was sent separately so, for example, adding two entities to a set and saving changes would take two database trips. From EF Core onward, that would only take one trip, unless a different limit is specifically configured with MaxBatchSize(number). You can revert to the EF6 behavior using MaxBatchSize(1). This applies to SqlServer only so far.

This behavior is the reason why contexts need to be released as soon as their work is done. If you query all the items with a name starting with 'A', all of these items will be loaded in the context memory. If you then need to get the ones starting with 'B', the performance and memory will be affected if using the same context. It might be helpful, though, if then you need to query both items starting with 'A' and the ones starting with 'B'. It's your choice.

One particularity of working with Entity Framework is that in order to update or delete records, you first need to query them. Something like
context.Posts.RemoveRange(context.Posts.Where(p => p.Title.StartsWith("x")));
There is no .RemoveRange(predicate) because it would be impossible to resolve a query afterwards. Well, not impossible, only it would have to somehow remember the predicate, alter subsequent selects to somehow gather all information required and apply deletion on the client side. Too complicated. There is a way to access the database by writing SQL directly and again EF Core has some improvements for this, but raw SQL changes are opaque to an already existing context.

Unit of Work and Repository patterns


The Repository pattern is an example of what I was calling before an alternative to Entity Framework: a separation of data access from business logic that improves testability and keeps distinct responsibilities apart. That doesn't mean you can't do it with EF, but sometimes it feels pretty shallow and developers may be tempted to skip this extra encapsulation.

A typical example is getting a list of items with a filter, like blog posts starting with something. So you create a repository class to take over from the Posts DbSet and create a method like GetPostsStartingWith. A naive implementation returns a List of items, but this actually hinders EF in what it tries to do. Let's assume your business logic requires you to return the first ten posts starting with 'A'. The initial code would look like this:
var posts=context.Posts.Where(p=>p.Title.StartsWith("A")).Take(10).ToList();
In this case the SQL code sent to the database is something like SELECT TOP 10 * FROM Posts WHERE Title LIKE 'A%'. However, code that looks like this:
var repo=new PostsRepository();
var posts=repo.GetPostsStartingWith("A").Take(10).ToList();
will first pull all posts starting with "A" then retrieve the first 10. Ouch! The solution is to return IQueryable instead of IEnumerable or a List, but then things start to feel fishy. Aren't you just shallow proxying the DbSet?
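The difference can be imitated without a database. In the sketch below, a made up in-memory PostsRepository counts how many "rows" it reads: the List version runs the whole filter before Take(10) gets a say, while the IQueryable version lets the caller compose the query before anything is read. With real EF the composed query would be translated to SQL; here LINQ to Objects does the work, so take it as an analogy, not as what EF executes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post { public string Title { get; set; } }

// Hypothetical in-memory stand-in for a repository over context.Posts.
public class PostsRepository
{
    public int RowsRead { get; private set; }

    // Simulates reading rows one by one from a table of 1000 posts.
    private IEnumerable<Post> ReadRows()
    {
        for (var i = 0; i < 1000; i++)
        {
            RowsRead++;
            yield return new Post { Title = "A post " + i };
        }
    }

    // Naive version: the filter runs to completion before the caller can add Take(10).
    public List<Post> GetPostsStartingWithAsList(string prefix)
    {
        return ReadRows().Where(p => p.Title.StartsWith(prefix)).ToList();
    }

    // Composable version: nothing is read until the caller enumerates the final query.
    public IQueryable<Post> GetPostsStartingWith(string prefix)
    {
        return ReadRows().AsQueryable().Where(p => p.Title.StartsWith(prefix));
    }
}

public static class RepositoryExample
{
    public static void Main()
    {
        var eager = new PostsRepository();
        var firstTen = eager.GetPostsStartingWithAsList("A").Take(10).ToList();
        Console.WriteLine(eager.RowsRead); // 1000: every row was read, then 990 results were thrown away

        var lazy = new PostsRepository();
        var alsoTen = lazy.GetPostsStartingWith("A").Take(10).ToList();
        Console.WriteLine(lazy.RowsRead);  // far fewer: reading stops once ten matches are found
    }
}
```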

Unit of Work is some sort of encapsulation of similar activities using the same data, something akin to a transaction. Let's assume that we store the number of posts in the Counts table. So if we want to add a post we need to do the adding, then change the value of the count. The code might look like this:
var counts=new CountsRepository();
var blogs=new BlogRepository();
// Where takes a lambda predicate
var blog=blogs.Where(b=>b.Name=="Siderite's Blog").First();
blog.Posts.Add(post);
counts.IncrementPostCount(blog);
// two separate save operations
blog.Save();
counts.Save();
Now, since this selects a blog and changes posts then updates the counts, there is no reason to use different contexts for the operation. So one could create a Unit of Work class that would look a bit like a common repository for blogs and counts. Let's ignore the silly example as well as the fact that we are doing posts operations using the BlogRepository, which is something that we are kind of forced to do in this situation unless we start to deconstruct EF operations and recreate them in our code. There is a bigger elephant in the room: there already exists a class that encapsulates access to database, caches the items retrieved and creates one atomic operation for both changes. It's the context itself! If we instantiate the repositories with a constructor that accepts a context, then all one has to do to atomize the operations is to put the code inside a using block.
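Here is a minimal sketch of that idea with no EF involved. FakeContext is a made up stand-in for a DbContext that only queues changes and applies them together on SaveChanges; the repositories and their methods are equally hypothetical. The point is only the shape: both repositories receive the same context and the using block delimits the unit of work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DbContext: collects pending changes and applies them together.
public class FakeContext : IDisposable
{
    private readonly List<Action> pending = new List<Action>();
    public readonly List<string> Posts = new List<string>();
    public int PostCount;
    public bool Disposed { get; private set; }

    public void Track(Action change) { pending.Add(change); }

    // Applies every pending change in one go and returns how many were applied.
    public int SaveChanges()
    {
        var applied = pending.Count;
        foreach (var change in pending) change();
        pending.Clear();
        return applied;
    }

    // Anything not saved by now is discarded.
    public void Dispose() { pending.Clear(); Disposed = true; }
}

public class BlogRepository
{
    private readonly FakeContext context;
    public BlogRepository(FakeContext context) { this.context = context; }
    public void AddPost(string title) { context.Track(() => context.Posts.Add(title)); }
}

public class CountsRepository
{
    private readonly FakeContext context;
    public CountsRepository(FakeContext context) { this.context = context; }
    public void IncrementPostCount() { context.Track(() => context.PostCount++); }
}

public static class UnitOfWorkExample
{
    public static void Main()
    {
        using (var context = new FakeContext())
        {
            var blogs = new BlogRepository(context);
            var counts = new CountsRepository(context);
            blogs.AddPost("Hello");
            counts.IncrementPostCount();
            Console.WriteLine(context.SaveChanges());                          // 2
            Console.WriteLine(context.Posts.Count + " " + context.PostCount); // 1 1
        }
    }
}
```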

There are also controversies related to the use of these two patterns with EF. Rob Conery has a nice blog post suggesting Command/Query objects instead. His rationale is that if you have to pass a context object, as above, there is not much decoupling involved.

I lean towards the idea that you need a Data Access Layer encapsulation no matter what. I would put the using block in a method in a class rather than pass the context or not use a repository. Also, since we saw that entity type is not a good separation of "repositories" - I feel that I should name them differently in this situation - and the intent of the methods is already declared in their name (like GetPosts...) then these encapsulation classes should be separated by some other criteria, like ContentRepository and ForumRepository, for example.

Migrations


Migrations are cool! The idea is that when making changes to structure of the database one can extract those changes in a .cs file that can be added to the project and to source control. This is one of the clear advantages of using Entity Framework.

First of all, there are a zillion tutorials on how to enable migrations, most of them wrong. Let's list the possible ways you could go wrong:
  • Enable-Migrations is obsolete - older tutorials recommended using the Package Manager Console command Enable-Migrations. This is now obsolete and you should use Add-Migration <Name> instead.
  • Trying to install EntityFramework.Commands - due to namespace changes, the correct namespace would be Microsoft.EntityFrameworkCore.Commands anyway, which doesn't exist. EntityFramework.Commands is version 7, so it shouldn't be used in .NET Core. However, at one point or another, this worked if you added some imports and changed stuff around. I tried all that only to understand the sad truth: you should not install it at all!
  • Having a DbContext inheriting class that doesn't have a default constructor or is not configured for dependency injection - the migration tool looks for such classes then creates instances of them. Unless it knows how to create these instances, the Add-Migration will fail.

The correct way to enable migrations is... to install the packages from the Database First section! Yes, that is right, if you want migrations you need to install
Install-Package Microsoft.EntityFrameworkCore.Tools -Pre
Install-Package Microsoft.EntityFrameworkCore.SqlServer.Design
Only then may you open the Package Manager Console and run
Add-Migration FirstMigration
Note that I am discussing an SQL Server example. It is possible you will need other packages if using a different type of database.

The result is a folder called Migrations in which you will find two files: a snapshot and the migration itself. Here is an example of the snapshot:
[DbContext(typeof(BloggingContext))]
partial class BloggingContextModelSnapshot : ModelSnapshot
{
    protected override void BuildModel(ModelBuilder modelBuilder)
    {
        modelBuilder
        .HasAnnotation("ProductVersion", "1.0.0-rtm-21431")
        .HasAnnotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn);

        modelBuilder.Entity("EFCodeFirst.Blog", b =>
        {
            b.Property<int>("BlogId")
            .ValueGeneratedOnAdd();

            b.Property<string>("Url");

            b.HasKey("BlogId");

            b.ToTable("Blogs");
        });

        modelBuilder.Entity("EFCodeFirst.Post", b =>
        {
            b.Property<int>("PostId")
            .ValueGeneratedOnAdd();

            b.Property<int>("BlogId");

            b.Property<string>("Content");

            b.Property<string>("Title");

            b.HasKey("PostId");

            b.HasIndex("BlogId");

            b.ToTable("Posts");
        });

        modelBuilder.Entity("EFCodeFirst.Post", b =>
        {
            b.HasOne("EFCodeFirst.Blog", "Blog")
            .WithMany("Posts")
            .HasForeignKey("BlogId")
            .OnDelete(DeleteBehavior.Cascade);
        });
    }
}

And here is the migration itself:
public partial class First : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.CreateTable(
            name: "Blogs",
            columns: table => new
            {
                BlogId = table.Column<int>(nullable: false)
                    .Annotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn),
                Url = table.Column<string>(nullable: true)
            },
            constraints: table =>
            {
                table.PrimaryKey("PK_Blogs", x => x.BlogId);
            });

        migrationBuilder.CreateTable(
            name: "Posts",
            columns: table => new
            {
                PostId = table.Column<int>(nullable: false)
                    .Annotation("SqlServer:ValueGenerationStrategy", SqlServerValueGenerationStrategy.IdentityColumn),
                BlogId = table.Column<int>(nullable: false),
                Content = table.Column<string>(nullable: true),
                Title = table.Column<string>(nullable: true)
            },
            constraints: table =>
            {
                table.PrimaryKey("PK_Posts", x => x.PostId);
                table.ForeignKey(
                    name: "FK_Posts_Blogs_BlogId",
                    column: x => x.BlogId,
                    principalTable: "Blogs",
                    principalColumn: "BlogId",
                    onDelete: ReferentialAction.Cascade);
            });

        migrationBuilder.CreateIndex(
            name: "IX_Posts_BlogId",
            table: "Posts",
            column: "BlogId");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropTable(
            name: "Posts");

        migrationBuilder.DropTable(
            name: "Blogs");
    }
}

Note that this is not something that copies the changes in data, only the ones in the database schema.

Conclusions


Yes, no code in this post. I wanted to explore Entity Framework in my project, but had I continued like that, the post would have become too long. As you have seen, there are advantages and disadvantages in using Entity Framework, but at this point I find it more valuable to use it and meet any problems I find head on. Besides, the specifications of my project don't call for complex database operations so the data access mechanism is quite irrelevant.

Stay tuned for the next post in which we actually use EF in ContentAggregator!

The Untold History of the United States, by Oliver Stone and Peter Kuznick

About 25 years ago I got the Compton's Multimedia Encyclopedia CD-ROM as a gift from my father. Back then I had no Internet so I delved into what now seems impossibly boring, looking up facts, weird pictures, reading about this and that.

At one time I remember I found a timeline based feature that showed on a scrolling bar the main events of history. I am not much into history, I can tell you that, but for some reason I became fascinated with how events in American history in particular were lining up. So I extracted only those and, at the end, I presented my findings to my grandmother: America was an expanding empire, conquering, bullying, destabilizing, buying territory. I was really adamant that I had stumbled onto something, since the United States were supposed to be moral and good. Funny how a childhood of watching contraband US movies can make you believe that. My grandmother was not impressed and I, with the typical attention span of a child, abandoned any historical projects in the future.

Fast forward to now, when, looking for Oliver Stone to see what movies he has done lately, I stumble upon a TV Series documentary called The Untold History of the United States. You can find it in video format, but also as a companion book or audio book. While listening to the audio book I realized that Stone was talking about my childhood discovery, also disillusioned after a youth of believing the American propaganda, then going through the Vietnam war and realizing that history doesn't tell the same story as what is being circulated in classes and media now.

However, this is no childish project. The book takes us through US history, skirting the good stuff and focusing on the bad. Yet it is not done in malice, as far as I could see, but in the spirit that this part of history is "untold", hidden from the average eye, and has to be revealed to all. Stone is a bit extremist in his views, but this is not a conspiracy theory book. It is filled with historical facts, arranged in order, backed by quotes from the people of the era. Most of all, it doesn't provide answers, but rather questions that the reader is invited to answer himself. Critics call it biased, but Stone himself admits that it is so by intent. Other materials and tons of propaganda - the history of which is also presented in the book - more than cover the positive aspect of things. This is supposed to be a balancing force in a story that is almost always told from only one side.

The introductory chapter alone was terrifying, not only because of the forgotten atrocities committed by the US in the name of the almighty dollar and God, but also because of the similarities with the present. Almost exactly a century after the American occupation of the Philippines, we find the same situation in the Middle-East. Romanians happy with the US military base at Deveselu should perhaps check what happened to other countries that welcomed US bases on their territory. People swallowing immigration horror stories by the ton should perhaps find out more about a little film called Birth of a Nation, revolutionary in its technical creation and controversial - now - for telling the story of the heroic Ku-Klux-Klan riding to save white folk - especially poor defenseless women - from the savage negroes.

By no means am I calling this a true, complete, objective history, but the facts that it describes are chilling in their evil banality and unfortunately all true. The thesis of the film is that America is losing its republican founding fathers roots by behaving like an empire, good and moral only in tightly controlled and highly financed media and school curricula. It's hard not to see the similarities between US history a century ago and today, including the presidential candidates and their speeches. The only thing that has changed is the complete military and economic supremacy of the United States and the switch from territorial colonialism to economic colonialism. I am not usually interested in history, but this is a book worth reading.

I leave you with Oliver Stone's own introduction to the series:

Monday, August 01, 2016

Infrastructure (Accessor) pattern - an interesting design choice for the .NET Core team

While researching the new .NET Core features and functionalities I've stumbled upon this pattern for hiding functionality, but also making it accessible when needed.

There was a long history of Microsoft writing the code as closed as possible: classes and interfaces are internal protected and sealed and all that jazz. If you have ever tried to copy paste Microsoft .NET source code into your project, in order to modify it to your needs, you know what I mean. More times than not I gave up because of the immense chain of dependencies that had to be all copy pasted in order for a small piece of code to work.

Well, .NET Core is now open source and there is a strong current of moving away from such practices. One pattern that drew my attention is the IInfrastructure<T> interface and pattern used in EntityFramework. Basically, instead of exposing rarely used members directly, you hide them within a generic interface that can be retrieved at will.

Yes, it is possible to do the same thing with an explicitly implemented interface, but this is more of a two step way of doing it (and also of uncluttering class signatures). The concrete example is DbContext, which explicitly implements IInfrastructure<IServiceProvider>. IServiceProvider has a GetService<T> method that returns specific implementations of interfaces or base classes. Then, with a nice extension method called GetInfrastructure<T>, one can get the service provider. For example, one can retrieve the relational type mapper from a context using:
var serviceProvider=context.GetInfrastructure();
var mapper=serviceProvider.GetService<IRelationalTypeMapper>();

I find it interesting as a general pattern, allowing one to expose innumerable interface signatures without inheriting from them all. A class can enable any number of mechanisms for discovery and execution simply by implementing its own service provider. Moreover, if there is some sort of general Service Locator pattern in place, classes can locally override that mechanism while leaving the rest in place. Clearly there is potential for abuse, but I also see it as a way to clearly represent and separate concerns.
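For the curious, here is a minimal sketch of the pattern applied to a class of my own. Everything in it is hypothetical and only mimics the shape of the EF types: the IInfrastructure<T> interface is redeclared locally, and Context, SimpleServiceProvider, ITypeMapper and SqlTypeMapper are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Local re-creation of the accessor interface; it only mimics the shape of the EF one.
public interface IInfrastructure<T> { T Instance { get; } }

// Tiny hand-rolled service provider, enough for the example.
public class SimpleServiceProvider : IServiceProvider
{
    private readonly Dictionary<Type, object> services = new Dictionary<Type, object>();

    public void Register<T>(T service) { services[typeof(T)] = service; }

    public object GetService(Type serviceType)
    {
        object service;
        return services.TryGetValue(serviceType, out service) ? service : null;
    }
}

public interface ITypeMapper { string Map(Type type); }

public class SqlTypeMapper : ITypeMapper
{
    public string Map(Type type) { return type == typeof(int) ? "int" : "nvarchar(max)"; }
}

public class Context : IInfrastructure<IServiceProvider>
{
    private readonly SimpleServiceProvider provider = new SimpleServiceProvider();

    public Context() { provider.Register<ITypeMapper>(new SqlTypeMapper()); }

    // Explicit implementation: Instance does not show up on the Context class itself.
    IServiceProvider IInfrastructure<IServiceProvider>.Instance { get { return provider; } }
}

public static class InfrastructureExtensions
{
    // Step one: get to the hidden member.
    public static T GetInfrastructure<T>(this IInfrastructure<T> accessor) { return accessor.Instance; }

    // Step two: ask the provider for a specific service.
    public static T GetService<T>(this IServiceProvider provider) { return (T)provider.GetService(typeof(T)); }
}

public static class InfrastructureExample
{
    public static void Main()
    {
        var context = new Context();
        var serviceProvider = context.GetInfrastructure();
        var mapper = serviceProvider.GetService<ITypeMapper>();
        Console.WriteLine(mapper.Map(typeof(int))); // int
    }
}
```

The Context class exposes nothing about type mappers in its own signature, yet anyone who knows to ask can get one in two steps.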