Adoption of simultaneous different strategies against different opponents enhances cooperation

The emergence of cooperation has been widely studied in the context of game theory on structured populations. Usually the individuals adopt one strategy against all their neighbors. The structure can provide reproductive success for the cooperative strategy, at least for low values of the defection tendency. Other mechanisms, such as punishment, can also be responsible for the emergence of cooperation. But what happens if the players simultaneously adopt different strategies against each of their opponents, not just a single one? Here we study this question in the prisoner's dilemma on a square lattice and on a ring. We show that if an update rule is defined in which the players replace the strategy that furnishes the smallest payoff, a punishment mechanism against defectors appears that imposes no cost on the punishers; cooperation dominates and, even if the tendency to defect is huge, cooperation still survives.

The cooperation problem is usually mapped onto a game theory framework in which the individuals are players. Usually a player can choose one of two strategies, cooperation (C) or defection (D), and uses this strategy during a game round. A strategy update can be defined so that, for the next round, the players can change their strategy taking into account the payoff earned during the round. If the players are placed on the vertices of a network, they are constrained to play against their nearest neighbors. Note that, once a strategy is chosen, the same strategy must be used against all of the opponents during the same round. Recently, social diversity has been introduced [17] in the context of cooperation in public goods games. The promotion of cooperation was analyzed by setting each individual to play different games in each neighborhood. This raises the question of what could happen if each player could choose different strategies against different opponents and adopt them during the same round. Let us state this point clearly. Suppose N individuals are placed on the vertices of a network. Each player can choose a different strategy against each neighbor and earns a different payoff from each interaction. If the network is a square lattice, player x can choose, for example, the (C,C,C,C) strategy set, meaning that player x chooses the C strategy against all of his opponents.

In this letter, we study the emergence of cooperation when the players adopt different strategies against different opponents. We use the prisoner's dilemma, with the population structured on a square lattice and on a ring, as the scenario for the cooperation problem. The players use different strategies against different neighbors and use the imitation rule with synchronous updating [9]. In the prisoner's dilemma [18], both players receive 1 upon mutual cooperation and ε upon mutual defection; the defector receives b if the other cooperates, and the cooperator receives 0 if the other defects. The tendency to defect is given by b (b > 1), and ε is taken to be small (ε ≪ 1). When player x interacts with y, we assume that x has information on i) y's cumulative payoff and ii) the strategy that y is using against him. In the imitation rule, players imitate the successful strategies in their neighborhoods. Each player interacts with its nearest neighbors, plays one round of the game with each opponent, and earns a cumulative payoff. Then the player, let us call it x, randomly chooses a neighbor, y, and compares the cumulative payoffs P_x and P_y. If P_x ≥ P_y, player x stays with its own strategy, but if P_x < P_y, player x adopts the opponent's strategy with probability (P_y − P_x)/4b [9]. So if player x decides to copy the strategy that y is using against him, he must also decide against which of his neighbors he will replace his strategy with the new one. We have defined three different replacement rules that give rise to three different update rules. Let us call them models A, B, and C. We focus on model A and use the others to highlight its features.
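As an illustration of the payoff scheme and the imitation probability described above, the following is a minimal sketch, not the authors' code. The function names are hypothetical, and `eps` stands for the small payoff received upon mutual defection.

```python
def pair_payoff(own, other, b, eps):
    """Payoff to a player using `own` ('C' or 'D') against `other`."""
    table = {
        ("C", "C"): 1.0,   # mutual cooperation
        ("D", "D"): eps,   # mutual defection (small payoff, assumed name eps)
        ("D", "C"): b,     # exploiting a cooperator
        ("C", "D"): 0.0,   # being exploited
    }
    return table[(own, other)]


def cumulative_payoff(own_strategies, opp_strategies, b, eps):
    """Sum of the payoffs over all confrontations of one player.

    own_strategies[i] is what the player uses against neighbor i,
    opp_strategies[i] is what neighbor i uses against the player.
    """
    return sum(pair_payoff(s, t, b, eps)
               for s, t in zip(own_strategies, opp_strategies))


def adoption_probability(p_x, p_y, b):
    """Probability that x copies y: zero unless y earned strictly more."""
    if p_x >= p_y:
        return 0.0
    return (p_y - p_x) / (4.0 * b)
```

On the square lattice the largest possible payoff difference is 4b (four exploited neighbors against zero payoff), so the value returned by `adoption_probability` never exceeds 1.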
In model A, player x chooses for the replacement the interaction that gives the smallest contribution to his cumulative payoff. The possible confrontations are (D,C), (C,C), (D,D), and (C,D), where the first entry is x's strategy and the second is the opponent's strategy. These contribute b, 1, ε, and 0 to the cumulative payoff, respectively, where ε is the payoff for mutual defection. If player x is being exploited, he can neutralize the exploiter by changing (C,D) to (D,D) while keeping his other cooperative interactions. Note that if ε = 0, (D,D) and (C,D) contribute the same amount, so the player randomly chooses one of them. In the usual game, where each player adopts the same strategy against all of his opponents, imitation also decreases the exploiter's payoff, but it cannot be regarded as an act of punishment because the imitator is actually trying to improve his own benefit, probably by exploiting other cooperators. In model A, when (C,D) is changed to (D,D), only the exploiter is punished; the other cooperative interactions are unchanged. So model A incorporates both imitation and punishment into the replacement rule without costly mechanisms. Note that we are not using punishment as another strategy, as in altruistic punishment [16]. There are only two basic strategies: cooperation and defection.
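The model A replacement step can be sketched as follows. This is an illustrative reading of the rule, not the authors' implementation. The function names are hypothetical, `eps` denotes the mutual-defection payoff, and ties are broken uniformly at random as the text describes for the case of equal contributions.

```python
import random


def worst_confrontations(own, opp, b, eps):
    """Indices of the confrontations with the smallest payoff contribution."""
    contrib = {("D", "C"): b, ("C", "C"): 1.0, ("D", "D"): eps, ("C", "D"): 0.0}
    values = [contrib[(s, t)] for s, t in zip(own, opp)]
    lowest = min(values)
    return [i for i, v in enumerate(values) if v == lowest]


def model_a_replace(own, opp, copied, b, eps, rng=random):
    """Return x's strategy list with the copied strategy written into the
    worst confrontation (chosen at random among ties)."""
    i = rng.choice(worst_confrontations(own, opp, b, eps))
    updated = list(own)
    updated[i] = copied
    return updated
```

For example, a player cooperating with four neighbors, one of whom defects against him, has a single worst confrontation, the (C,D) one, so copying D turns only that confrontation into (D,D).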
In the second replacement rule, model B, player x changes exactly the strategy he uses against the player he chose to imitate. This model is trivial but interesting: once two players reach mutual cooperation they keep it forever, and once an exploiter is punished there is no chance of further exploitation, so the pair stays in mutual defection forever. In the third replacement rule, model C, the tendency to punish exploiters is weakened because player x randomly chooses one of his confrontations and changes the strategy used in that confrontation.
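For comparison with model A, the choice of which confrontation to overwrite in models B and C can be sketched as below. The function names and the representation of the neighborhood as a list of neighbor labels are assumptions made for illustration only.

```python
import random


def model_b_index(neighbors, imitated):
    """Model B: overwrite the strategy used against the imitated neighbor."""
    return neighbors.index(imitated)


def model_c_index(neighbors, rng=random):
    """Model C: overwrite the strategy used in a randomly chosen confrontation."""
    return rng.randrange(len(neighbors))
```

The three models therefore differ only in this index: the imitated neighbor (B), a uniformly random neighbor (C), or the neighbor giving the worst contribution (A).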
We ran the simulations on the square lattice with linear size L = 100 and checked that the same results hold for L = 200. First, we measured the mean fraction of cooperation (f_c) in the stationary state for different values of the tendency to defect b. The number of cooperations (n_c) is the number of C strategies used in all confrontations, so 0 ≤ n_c ≤ 4N and f_c = n_c/(4N). The random initial configuration has 50% cooperation. For each initial condition, we discarded the transient time needed to reach the stationary state and then took a time average over 1000 MCS (Monte Carlo steps; one step corresponds to a round in which every player plays a game with all of their neighbors). A second average is taken over 100 different initial conditions. We used 1 ≤ b ≤ 5 and ε = 0.01. Figure 1 shows the results for the three models and for the usual game. Note that for model C and the usual game, defection is the dominating strategy when b > b_c. Note also that models A and B keep cooperation alive for every b. Some residual defection can persist when cooperation dominates, and some residual cooperation when defection dominates. This is because the update rule prevents a change if the payoffs are equal, which allows some stable residual configurations to exist. For model A, the fraction of cooperation for b = 2 is around f_c = 0.9998. For model C, the residual cooperation is around f_c = 0.005 in the region b > b_c.
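The measured quantity f_c = n_c/(4N) can be computed as in the short sketch below. The data layout (one list of per-neighbor strategies for each player) and the function name are assumptions for illustration.

```python
def cooperation_fraction(strategies):
    """Fraction of C strategies over all confrontation slots.

    `strategies` holds one list per player with the strategy used against
    each neighbor; on the square lattice every list has four entries, so
    the denominator equals 4N.
    """
    n_c = sum(s == "C" for player in strategies for s in player)
    n_slots = sum(len(player) for player in strategies)
    return n_c / n_slots
```

The same function applies to the ring, where each player has two entries and the denominator becomes 2N.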
Let us now analyze model A in more detail. In the usual game, extensive analysis has been done, and the emergence of cooperators is attributed in part to the negative effect on the cumulative payoff when an exploited individual copies a D strategy used by a successful exploiter, decreasing the exploiter's payoff [6,19]. This also happens in model A, as can be seen in fig. 2, which depicts the typical behavior for short times. Note that when the number of (D,D) confrontations is at a maximum, the fraction of cooperation is at a minimum and starts to increase. Another reason for the maintenance of cooperation in the usual game is the formation of cooperation clusters. This is again present in model A. Suppose we have a configuration as shown in fig. 3, where all the players adopt defection in all confrontations except for a cluster of cooperation. One can see that for b > 1, the region inside the dashed square is stable in the sense that it will never be invaded by defection. Suppose some player on the interface chooses to change his (C,D) confrontation. Since this confrontation furnishes the worst contribution, it is the only confrontation that can be changed to (D,D). After that, (D,D) is also the worst confrontation, and the D strategy cannot be copied into the interior of the square. If b ≥ 3, the cooperation in such a configuration cannot spread, because exploitation gives each of the players immediately outside a total payoff of at least b ≥ 3, while the players immediately inside have a payoff of only 3. But if 1 < b < 3, the cooperation cluster can expand. Moreover, a similar analysis applied to configurations like the one depicted in fig. 4 shows that there is always a stable region of cooperation, irrespective of the boundary and the width of the region of cooperation.
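The payoff comparison behind the threshold at b = 3 can be written out as a short worked example. It rests on assumptions about the geometry that the text does not spell out: the outside player exploits exactly one cooperator and has three (D,D) confrontations, and the inside player sits on a flat edge of the cluster with three (C,C) confrontations and one (C,D). The symbol \varepsilon denotes the mutual-defection payoff.

```latex
\begin{align}
P_{\text{out}} &= b + 3\varepsilon, \\
P_{\text{in}}  &= 3\cdot 1 + 0 = 3, \\
W_{\text{out}\to\text{in}} &= \frac{P_{\text{in}} - P_{\text{out}}}{4b}
  = \frac{3 - b - 3\varepsilon}{4b}
  \qquad \text{if } b + 3\varepsilon < 3 .
\end{align}
```

Here W is the probability that the outside player copies C once he has selected the inside player for comparison. For b = 2 and \varepsilon = 0.01 this gives (3 - 2.03)/8 \approx 0.121, while for b + 3\varepsilon \geq 3 the outside player never imitates and the cluster cannot grow through this channel.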
For simplicity, let us set ε = 0 in the cluster analysis and evaluate numerically the probability that a square cluster of linear size S takes over a larger square region of twice the original linear size. For 4 ≤ S ≤ 10, the cluster expands with probability one. So the two mechanisms, formation of cooperation clusters and the negative effect on the exploiter of being imitated, are both present. But the great difference of model A is the possibility of neutralizing a defector while keeping the existing cooperation. The importance of this fact is better illustrated in a more general case. Suppose a big exploiter is introduced that interacts with all individuals of a population of cooperators. After a few rounds, some players adopt the D strategy against the big exploiter and still keep cooperating with the other players. The exploitation will be almost completely neutralized. No matter how great the exploitation is, model A always prevents the invasion of defectors. If b > 3, the strategies on the boundary of the stable region fluctuate. Using a rate equation approach, we obtain an analytical expression, eq. (1), for the fraction of cooperation ρ inside a square of size S. Figure 5 shows that the simulation and the analytical expression for S = 6 agree very well.
Let us analyze the other models. Model B is trivial because, once a player has updated one of his confrontations, that confrontation remains the same forever. Since the random initial condition has, on average, 50% cooperation, there is initially, on average, 25% of each of the four possible confrontations. So there is a lower bound for the fraction of cooperation, namely 25%. Since all confrontations are initially equally probable, every player that has more than one (D,C) confrontation will have a larger cumulative payoff. The players that are exploited by these exploiters will replace the (C,D) confrontation by a (D,D) one.
All these (C,D) confrontations will change to (D,D) ones as long as these unbalanced exploitations are not neutralized. So the cooperation that remains will be only that initially present in (C,C) confrontations; that is, the fraction will be around 25%, as shown in fig. 1. Note that after the exploitations are neutralized, they cannot be converted to mutual cooperation. This is in contrast with model A, where a (D,D) confrontation can be changed to a (C,C) one when both players x and y, in the same round, imitate other players that are cooperating with them.
Model C does not give cooperators the same capability of survival, as can be seen in fig. 1. The mechanism of defection neutralization is weakened because, if there is a (C,D) confrontation, the D strategy can be copied into any confrontation, and the exploitation may survive. Let us analyze again the cluster shown in fig. 3. This cluster is no longer stable. If b < 2, the cluster will expand, but there is no guarantee of stability. Suppose some player is reached by the expanding cooperation cluster and changes all of his strategies to C except one, which remains D. Suppose also that this player is surrounded by cooperation. The cluster can go on expanding, but now this D strategy can start a defection invasion from inside the cluster because it gives that player a larger payoff. This does not happen in model A. The random replacement does not guarantee that an exploitation is neutralized while cooperation with the other cooperating neighbors is kept.
Further simulations show that if b ≫ 1 the fraction of cooperation at the steady state in model A tends to the same value as in model B. This is expected because, if b is huge, every (C,D) confrontation gives the exploiter a huge advantage, and every (C,D) confrontation will change to a (D,D) one.
It must be mentioned that the steady-state fraction of cooperation depends on the initial configuration. The parameter we used to vary the initial conditions is the probability p of having a cooperation. For model A, cooperation survival depends on cluster formation, so if p is small there is no cluster of cooperation at the beginning; but if p > 0.4 and 1 < b < 3, the stationary value does not depend on the initial condition. When b > 3, cooperation clusters no longer spread, and the stationary values strongly depend on the initial configuration of cooperation reinforcements. After discarding finite-size effects, model C does not depend on initial conditions.
Let us now consider another interesting point: the robustness of model A. So far we have assumed that the players correctly identify their worst confrontation. But misjudgment is a relevant factor in real behavior. Instead of always choosing the worst confrontation, the players can now misjudge and choose a confrontation that does not give the worst contribution to the payoff. For this we introduce a misjudgment probability p_m. Let us state this point clearly. The individuals can update their strategies in every round. For each individual update we introduce the possibility of a random replacement: with probability 1 − p_m the replacement follows the original model A, and with probability p_m it follows model C. For p_m = 0 and p_m = 1, model A and model C are recovered, respectively. We say that model A is robust if, as p_m varies, the fraction of cooperation for p_m ≠ 0 remains close to the fraction for p_m = 0. We say that model A is more robust for one b value than for another if the decrease of the cooperation fraction is smaller for the first b value. The robustness analysis shows that there are three typical behaviors that depend on the defector tendency.
This defines three b regions: i) 1 ≤ b < b_c, ii) b_c ≤ b < 3, and iii) b > 3. In the first region both models A and C exhibit cooperation. In the second and third regions, only model A exhibits cooperation. Note that the fraction of cooperation in the third region is lower than that in the second. [...] on the cooperation fraction. This means that if N = 10^4, even in the presence of 1000 misjudgments in every round, nothing happens to the cooperation. For b = 2, the fraction of cooperation in model A does not change for p_m values up to p_m = 0.001. For b = 4, the fraction of cooperation is reduced even for p_m = 0.0001. Note that for b > b_c in model C the inherent randomness of the replacement reduces the cooperation drastically. This b-dependency is also present when misjudgment introduces randomness into the replacements of model A. So the results support the conclusion that, for low defector tendency, model A is robust, although for large values of b the randomness plays a more prominent role in reducing the cooperation fraction.
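The misjudgment rule used in the robustness analysis mixes the two replacement rules, and can be sketched as follows. This is an illustrative reading rather than the authors' code; the function name and the representation of a player's confrontations as a list of payoff contributions are assumptions.

```python
import random


def choose_slot_with_misjudgment(contributions, p_m, rng=random):
    """Pick the confrontation to overwrite.

    With probability p_m the player misjudges and picks a slot uniformly
    at random (model C); otherwise he picks a slot with the smallest
    payoff contribution, breaking ties at random (model A).
    """
    if rng.random() < p_m:
        return rng.randrange(len(contributions))
    lowest = min(contributions)
    worst = [i for i, v in enumerate(contributions) if v == lowest]
    return rng.choice(worst)
```

Since `rng.random()` returns values in [0, 1), setting `p_m = 0` always takes the model A branch and `p_m = 1` always takes the model C branch, matching the two limits stated in the text.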
In order to have a more complete picture of the models with different strategies against different opponents, let us study the one-dimensional lattice with periodic boundary conditions (ring). The ring has N = 10^4 nodes and each node is linked only to its two nearest neighbors. We also use ε = 0.01. The fraction of cooperation is shown in fig. 7. Note that for the usual game there is no cooperation, and model A is again able to sustain some cooperation. Note that model C is also able to keep some cooperation alive even for huge defector tendency, which does not happen on the square lattice. The influence of connectivity on the cooperation fraction was already noted for the regular network with the usual game [9]. The robustness analysis shows the same b-dependency, although the ring is less robust than the square lattice. Figure 8 shows the robustness simulation results for the ring.
In summary, we have introduced the adoption of multiple strategies by the same player, which gives the players the possibility of cooperating with some opponents and defecting against others. If a player wants to punish an exploiter but keep his existing cooperation, he can use simultaneous multiple strategies with the "replacing the worst confrontation" rule that model A provides. Model A incorporates a true punishment, because the players gain no advantage by imitating the defector strategy, but just decrease the payoff of the would-be exploiter. Model A also allows mutual cooperation to be re-established, because interactions are not frozen. The usual game on the square lattice, with the same synchronous update rule, yields a smaller fraction of cooperation for low b values and has a transition at a low b value at which the cooperators die out. We have shown that model A provides a mechanism in which cooperation dominates if 1 < b < 3. Moreover, even for huge defection tendency (b > 3), model A ensures the survival of cooperation.
For the ring topology, model A is also capable of sustaining cooperation. Finally, we showed that model A is robust against misjudgments, at least for low defector tendency, on both lattices. We stress that this simple rule can easily be applied in other game contexts, which opens the possibility of further research.
* * *
The authors thank CNPq and FAPEMIG, Brazilian agencies, and the referees for useful suggestions.