Monday, February 03, 2014

On game Artificial Intelligences and specifically the MinMax algorithm

MinMax or Minimax, as some like to call it, is the basis of most Artificial Intelligence built for games like chess. Its basis is extremely easy to understand: a rational player will try to take the best option available to them, so whatever is good for me the adversary will take as the most likely outcome and he will find the best solution against that outcome. I, following the same pattern, will also look for his best counter move and plan against it. Therefore the thinking for a game of chess, let's say, is that I will take all possible moves, find the one that leaves me with the best position (evaluated by a function from the board position), then look for the similar best play for the adversary. I continue this way until I get to the end of the game or am out of computing resources.

Now, that sounds logical and it's crazy easy to implement. The problem is that for all but the most childish of plays, the tree of all possible moves increases exponentially. And chess isn't even one of the worst games to do that. Imagine Tic-Tac-Toe, a game played on a 3x3 board between two players. You have a total of 9 possible moves to choose from as the first player, then 8, then 7, etc. The entire game tree has a total of 9! possible moves, or 362880. But generalize the game to a board of 10x10 and a winning rule of 5 in a line and you get 100! moves, which is less than 1E+158, that is 10 followed by 158 zeros.

That's why the so called pruning was created, the most common of all being Alpha-Beta, which tries to abort the processing of leaves that seem to reach a worse situation than their parent node. Of course, all of this is the general gist. You might want to take into account a number N best moves from the opponent, as well as try a more lenient pruning algorithm (after all, sacrificing a piece brings you to a worse position than when you started, but it might win the game). All of this increases, not decreases the number of possible moves.

And now comes my thought on this whole thing: how can I make a computer play like a human when the core edict of the algorithm is that all participating players are rational? Humans are rarely so. Mathematically I could take N, the number of best moves I would consider for my opponent, to be the total number of moves my opponent could make, but it would increase the exponential base of the tree of moves. Basically it would make the algorithm think of stupid things all the time.

The pruning algorithm seems to be the most important part of the equation. Indeed, I could consider the move choice algorithm to be completely random and as long as I have a perfect pruning algorithm it will remove all the stupid choices from me and let me with the smart ones. A quote comes to mind: "you reach perfection not when you have nothing else to add, but when there is nothing left to remove". It's appropriate for this situation.

Now, before attacking an algorithm that has survived for so long in the AI industry (and making my own awesome one that will defeat all chess engines in the world - of course, that's realistic) I have to consider the alternative algorithm: the lowly human. How does a human player think in a game of chess? First he surveys the board for any easy wins. That means a broad one or two levels analysis based on a simple board evaluation function. Immediately we get something from this: there might be multiple evaluation functions, we don't need just one. The simple one is for looking for greedy wins, like "He moved his queen where I can capture it, yay!".

The same outcome for situations like this would be achieved by a MinMax algorithm, so we ignore this situation. It gets more interesting from now, though. We look for the moves of the most active pieces. I know that this is the rookie system, but I am a rookie, I will make my computer algorithm be as stupid as I am, if I am to play it, so shut up! The rookie will always try to move his queen to attack something. It's the most powerful piece and it should get the most results for the least effort. We left Greed behind, remember? We are now doing Sloth. Still, with a good pruning algorithm we eliminate stupid Queen moves from the beginning, so considering the Queen first, then Rooks, then Bishops, then Knights, etc. is not a bad idea. The order of the pieces can be changed based on personal preferences as well as well established chess rules, like Knights being better that Bishops in closed games and so on.

This is a small optimization, one that probably most game engines have. And we haven't even touched pruning; boy, this is going to be a long article! Now, what does the human do? He does the depth first tree searches. Well, he doesn't think of them like that, he thinks of them as narrative, but it's basically a depth first search. This is the casual "What if...?" type of play. You move the Queen, let's say, bringing it right in the enemy territory. You don't capture anything important, but to bring a strong piece this uncomfortably near to the enemy king is scary. You don't play for game points, but for emotion points, for special effects, for kicks! You don't abandon the narrative, the linear evolution of your attack, until you find that it bears no fruit. It's the equivalent of the hero running toward the enemy firing his pistol. If the enemy is dumb enough to not take cover, aim carefully and shoot a burst from their SMGs, you might get away with it and it would be glorious. If not, you die idiotically.

It is important to note that in the "Hollywood" chess thinking you are prone to assume that the enemy will make mistakes in order to facilitate your brilliant plan. The evaluation goes as follows: "I will try something that looks cool if the chances for a horrible and immediate loss are small". When some hurdle foils your heroic plan, you make subplans that would, as well as you hope, distract the adversary from your actual target. This, as far as I know, is a typical human reasoning type and I doubt many (if any) computer game engines have it. In computer terms, one would have to define a completely new game, a smaller one, and direct an AI designed specifically for it to tell you if it would work or not. Given the massively parallel architecture of the human brain, it is not hard to understand why we do something like this. But we can do the same with a computer, mind you. I am thinking of something like a customized MinMax algorithm working on few levels, one or two, as the human would. That would result in a choice of N possible moves to make. Then construct a narrative for each, a depth search that just tries to get as much as possible from each move without considering many of the implications. Then assign a risk to each level of this story. If the level exceeds a threshold, use the small range MinMax at those points and try to see if you can minimize the risk or if at that point the risk makes your narrative unlikely.

Let's recap the human thinking algorithm so far:
  1. Try to greedily take what the opponent has stupidly made available
  2. Try to lazily use the strongest piece to get the most result with the least effort
  3. Try to pridefully find the most showy move, the one that would make the best drinking story afterwards
  4. Try to delegate the solving of individual problems in your heroic narrative to a different routine

Wow! Doesn't it seem that the seven deadly sins are built-in features, rather than bugs? How come we enjoy playing with opponents that pretty much go through each of them in order to win more than we do with a rational emotionless algorithm that only does what is right?

Again, something relevant transpires: we take quite a long time imagining the best moves we can make, but we think less of the opponent's replies. In computer terms we would prune a lot more the enemy possible moves than we would our own. In most rookie cases, one gets absorbed by their own attack and ignores moves that could counterattack. It's not intuitive to think that while you are punching somebody, they would choose to punch back rather than avoid the pain. In chess it's a little bit easier and more effective, since you can abandon a piece in order to achieve an overall gain in the game, but it can and it is done in physical combat as well.

Okay, we now have two alternatives. One is the logical one: take into account all the rules chess masters have taught us, shortcuts for achieving a better position on the board; choose moves based on those principles and then gauge the likely response from the opponent. Repeat. This is exactly like a MinMax algorithm! So we won't do that. The hell with it! If I can't enjoy the game, neither will my enemy!!

Human solution: don't do anything. Think of what your opponent would do, if you wouldn't move anything and foil their immediate plan. This way of thinking would be counterintuitive for a computer algorithm. Functioning on the basis of specific game rules, a computer would never be inclined to think "what would the enemy do if I didn't move anything, which is ILLEGAL in chess?". That makes us superior, obviously ;-)

Slowly, but surely, a third component of the algorithm becomes apparent: the move order choice. Let's imagine a naive MinMax implementation. In order to assess every possible move, it would have to enumerate them. If the list of moves is always the same in a certain board position, the game will always proceed the same way. The solution is to take the list of possible moves, but in a random order. In the case of the "human algorithm" the ordering becomes more complex (favouring powerful piece moves, for example). One could even consider the ordering mechanism responsible for choosing whether to do a careful breadth search for each level or a depth first one.

Here is a suggestion for an algorithm, one that takes into account the story of the game and less the objective gain or position strength:
  1. For each of your power pieces - anything but the king and pawns - compute mobility, or the possibility to move and attack. Favour the stronger pieces first.
  2. For each power piece with low mobility consider pawn moves that would maximize that mobility.
  3. For each power piece with high mobility consider the moves that would increase the chance of attack or that would attack directly
  4. For each strong move, consider the obstacles - enemy pieces, own pieces, possible enemy countermeasures
  5. Make the move that enables the considered power move or that foils the enemy attempts of reply

The advantage of this approach is that it only takes into account the enemy when he can do something to stop you, the pawns only when they can enable your devious plan and focuses on ventures that yield the best attack for your heroes. For any obstruction, you delegate the resolution of the problem to a different routine. This makes the algorithm parallelizable as well as modular - something we devs love because we can test the individual parts separately.

This algorithm would still use a board estimation function, but being more focused on heroic attacks, it would prefer interesting move orders to static positions as well as the "fun factor", something that is essential to a human-like algorithm. If the end result of the attack is a check-mate, then it doesn't really matter what position estimate you get when you did half the moves. All one has to wonder is if the attack is going to be successful or not and if one can do something to improve the chances of success. And indeed this is one of the most difficult aspects for a chess playing human: to switch from a failing plan to a successful plan when it is not yet clear is the first plan is failing. We invest energy and thought into an idea and we want it to work. A lot of the chess playing strategy of human rookies relies on prayer, after all. A computer would just assess the situation anew at every move, even if it has a strategy cached somewhere. If the situation demands it, a new strategy will be created and the last one abandoned. It's like killing your child and making another!

But, you will say, all you did so far was to describe an inferior algorithm that can be approximated by MinMax with only custom choices for the pruning and move order functions! You are missing the point. What I am describing is not supposed to beat Grand Masters, but to play a fun game with you, the casual player. More than that, my point is that for different desired results, different algorithms must be employed. This would be akin to creating a different AI for each level of a chess game.

Then there is the issue of the generalized TicTacToe or other games, such as Arimaa, created specially to make it difficult for computer algorithms to play, where MinMax fails completely. To make a comparison to real life, it's like you would consider the career steps you would take in life based on all possible jobs available, imagining what would it be to be employed there, what the difficulties might be, finding solutions to those problems, repeating the procedure. You will get to the conclusion that it is a good idea to become a computer scientist after thoroughly examining and partially understanding what it would be like to be a garbage man, a quantum scientist, a politician and a gigolo, as well as all the jobs in between. Of course, that is not as far fetched as you think, since in order to be a success in software development you must be at least a politician and a garbage man, perhaps even a gigolo. Lucky for our profession, quantum computers are in the works, too.

The same incongruency can be found when thinking of other games humans enjoy, like races. The desired result can only be achieved at the end of the race, when you actually get somewhere. In order to get to that specific point in space, you could consider the individual value of each direction change, or even of each step. However humans do it differently, they specify waypoints that must be achieved in order to get to the finish and then focus on getting from waypoint to waypoint, rather than rethinking the entire course. In computer terms this is a divide-and-conquer strategem, where one tries to solve a problem that has known start and end points by introducing a middle point and then solving the problem from the start to the middle. BTW, this also solves Zeno's paradox: "Why does the arrow reach its target if, at any point in its course, it has at least half the distance left to fly?" and the answer is "Because of the exit condition that prevents a stack overflow". Try to sell that one in a philosophy class, heh heh.

So why aren't chess AIs based on human thinking processes? Why don't they implement a divide and conquer solution for a game that always starts with a specific board position and ends in capturing a specific piece? Why do chess engines lower their "level" by sometimes randomly choosing a completely losing path instead of something that is plausible to choose, even if completely wrong objectively? How can MinMax be the best general algorithm for game AIs, when some of them have a branching factor that makes the use of the algorithm almost useless?

I obviously don't have the answers to these questions, but I may have an opportunity to explore them. Hopefully I will be less lazy than I usually am and invent something completely unscientific, but totally fun! Wish me luck!