top of page

The Iterative Prisoner’s Dilemma


One of the more popular Game Theory ideas is the Prisoner’s Dilemma. Essentially this simple game has two players representing criminals who are separated and offered a choice. Either they can co-operate with their criminal partner by telling the police nothing or they can defect by turning the other criminal in. If they both co-operate the police can only get them on a minor charge and they both get one year in prison, if they both defect they each get ten years, but if one co-operates and the other defects, the defector gets off scot free while the co-operator gets 40 years. The result of this is that it’s always better to defect, because there are two possibilities, either your partner is co-operating in which case by defecting you go from getting one year to getting nothing, or your partner is defecting in which case you go from getting 40 years to getting ten.


What’s weird is that this is pretty obvious and everyone realises it, so everyone will defect, meaning that in a smart population everyone will be getting ten years, because that’s the best tactic. This unhappy situation is known as Nash’s Equilibrium. Meanwhile in a theoretical population of loyal dummies everyone is getting one year, despite their being totally wrong in their choices, the loyal stupid fools and their short sentences and general freedom. The fact that this shows what is clearly the worst tactic to be clearly the best tactic is therefore a little odd, and has led to an extension of the Prisoner’s Dilemma which seems to have been felt to clear up that weirdness, but has never sat entirely right with me, namely the Iterative Prisoner’s Dilemma.


The Iterative Prisoner’s Dilemma is, quite simply, that players engage in repeated rounds of the Prisoner’s Dilemma rather than just a single round. The idea being that if you can predict what your opponent is going to do over multiple rounds you’d do better and that process of prediction could be interesting. This was made physical with an iterative prisoner’s dilemma tournament for computer programs to test out the theory, what is considered interesting is that the winning program in that experiment was the shortest. It was a program called Tit for Tat and it always co-operated in round 1, then copied what the opponent did in all the other rounds. In fact, all the winning programs contained three qualities, they were never the first to defect, once defected on they tried to retaliate and they went back to co-operation whenever possible.


This has been interpreted as a program that is ‘nice’ since it never defects first, that ‘stands up for itself’ because it always seeks retaliation and is ‘forgiving’ since it always returns to a state of co-operation. As such the conclusion often made is that it’s a way that we can see how positive qualities can develop in a mutually beneficial manner during repeated social interactions, nice, decent co-operative people will inevitably rise to the top of any given society as we all know and witness in our daily lives. Right?


It won’t shock you to find that this idea has never sat entirely well with me. As a designer I think that’s because it seems to ignore the process of meta-game development that occurs in tournament scenes and it does that with a fairly hand wavy bit of a statement. It is generally said that an Iterative Prisoner’s Dilemma tournament has to have rounds of random lengths, which is entirely true. The reason for this is that Tit for Tat is totally vulnerable to the Sucker Puncher, which is to say, if the rounds are ten iterations long, the Sucker Puncher plays as Tit for Tat and then always defects in round ten. The Sucker Puncher would therefore clearly win, as we know, in the meta-game of tournaments a dominant strategy will quickly have a strategy built to defeat it, and there is only one strategy that can beat the Sucker Puncher, the Sucker Puncher who defects in rounds nine and ten, and quickly we end up with a situation where all programs defect all the time, we return to Nash’s Equilibrium.


So, the rounds have to be of random length. The question then of the tournament game designer becomes, do you mean entirely random? If the intention is that some rounds will take one iteration while others take hundreds, then the winner of the tournament will be largely random and determined by the number of iterations played, with the winner almost certainly being the program with the fewest total iterations. In fact, if the rounds are truly random then in a large enough sample pool somewhere you will have simply re-created the original single iteration dilemma. If the years taken are cumulative then clearly the lowest number of iterations will win. If the years taken are averaged then programs with low rounds will still win because their averages will tend towards their first-round scores, which are somewhere between zero and ten for defectors, between one and 40 for co-operators. Assuming a random spread of defectors and co-operators the winner using averaged scores should be a defector playing very few rounds, again.


Of course, stating that the length of rounds should be random is something that can only be made as a theoretical threat, even with computer programs its clear that if you want a result the length of rounds cannot be truly random. If a program becomes a Learning Sucker Puncher, for example. The Learning Sucker Puncher always defects on iteration ten during the first round and in later rounds it defects on the latest iteration that it has so far experienced a round going to. Over a sufficient number of rounds it will eventually learn the actual upper limit of the number of iterations that your tournament allows, and it will always beat a Tit for Tat program. Any Sucker Punch program will become dominant, until a Sucker Punch Plus One program is introduced, and all programs will tend back to the Nash Equilibrium of total defection.


The Iterative Prisoner’s Dilemma talks about modelling how intelligent selfish behaviour can result in beneficial social activity; I don’t believe that is what it does. To me it proves that all attempts to allow widespread selfish behaviour ultimately result in behaviours that are destructive to people and society as a whole. “Smart” self-interested behaviour results in long term self-destruction, something we can see modelling to much of modern society. Only through a wider understanding of our position in the overall system that we exist in can we actually active a stable state that is to our greatest benefit due to true co-operation. The only possible result of true competition is the Nash Equilibrium of overall loss, the only way to possibly win is to engage in actual co-operation and cease to dream of the victory that could be gained by stepping on the corpses of your foes, there is no greater fool in any successful system.


Is there an area of game theory you find particularly interesting? Or one where the results have never quite sat right for you? How do you feel about the Prisoner’s Dilemma or its alternate versions such as Chicken or Hawks and Doves?

Comments


bottom of page