Public Goods with Punishment and Payment for Relative Rank

This study contributes to the understanding of the interaction between punishment, cooperation, and aggregate welfare by examining the impact of payoffs based on relative performance. In many non-laboratory settings, people obtain important rewards based on relative performance. Despite the central role played by relative performance in human society, the experimental cooperation literature primarily focuses on settings without any explicit payoff for doing well relative to other people. In this study, monetary rewards for relative performance dramatically increase the level of punishment. The costs of punishment are sufficiently high that aggregate payoffs are dramatically lower with punishment than without punishment. Thus, punishment in this setting is not altruistic because the group is harmed by punishment.

Under some laboratory conditions, with some subject populations, the ability to punish increases the level of cooperation [17,18]. Some scholars label punishment as 'altruistic' when three conditions are met simultaneously: i) punishing is costly to the punisher, ii) the group outcome is improved because of the ability to punish, and iii) individual punishers cannot benefit from their decision to punish. "Thus, the act of punishment, although costly for the punisher, provides a benefit to other members of the population by inducing potential non-cooperators to increase their investments. For this reason, the act of punishment is an altruistic act." [17].
The proponents of 'altruistic punishment' argue that human nature has been importantly shaped by group selection [17,19,20,21]. The claim that humans are group-selected altruists sacrificing themselves for the good of others, if true, would radically alter economics, as well as the social and natural sciences more broadly. The robustness of the altruistic nature of punishment is the subject of several strands of research. First, in some populations, 'anti-social' punishment is directed toward high cooperators [22,23]. The populations where punishment is directed toward low cooperators may be atypical [24]. Second, 'counter punishment' studies allow multiple rounds of punishment that can be used to counter punishments received, or to punish others who fail to punish [25][26][27]. In some cases, counter punishment makes punishment inefficient. These results argue that the addition of punishment does not uniformly lead to better group outcomes. Third, the power of punishment technology impacts aggregate outcomes. If punishment power is too weak, cooperation may degrade [18]; if punishment power is too strong, the high cost of punishment leads to lower group outcomes [28].
How robust is the 'altruistic' aspect of punishment? Rank-based payoffs provide a promising area to further examine this important question. Outside the laboratory, many or even most situations include important rewards based on relative performance. Success, both evolutionarily and socially, is a game where top rewards are awarded to those who do well relative to others. Despite the ubiquity of rank-based rewards, there is essentially no experimental research on cooperation in such environments. One of the clearest examples of payment for relative performance is the employer-mandated forced evaluation system. Such systems were made famous by their use at General Electric under Jack Welch [29]. In forced or "stack" ranking systems, employees are ranked from best to worst based on relative performance. It is estimated that over 20% of US business organizations utilize some version of forced distribution [30,31], with mixed opinions regarding the impact on organization outcomes [29,32,33]. This area of rank or tournament-based payoffs has a long history [34], and remains a current area of research [35,36].
Beyond forced rankings in organizations, relative performance payoffs are ubiquitous. Relative rank plays a central role in grading at most levels of education. Because access to graduate schools and careers is based, in significant part, on grades, relative performance affects almost every single person, and has broad societal implications. Sporting accomplishments are also based on relative performance. At the team level, this year's champion is the team that did relatively better than its opponents. Within teams, there is competition for jobs and playing time. Consider the (in)famous baseball player Wally Pipp, who played an important role for the NY Yankees in the pennant years of 1921 and 1922, and on the championship 1923 team. On June 2, 1925, Lou Gehrig started instead of Pipp, who reportedly complained of a headache. Gehrig played in every game for the next 14 years while Pipp was traded. Wally Pipp was a talented baseball player who led the league in home runs for two years; he was traded from the Yankees because he was slightly less skilled than Lou Gehrig.
Relative performance payoffs pervade life. Political candidates win elections by obtaining more votes than others; Bill Clinton became president in 1992 with 43% of the votes. Generals are selected from a larger group of colonels, popes from archbishops, managers from employees, etc. Relative performance payoffs are ubiquitous in life, but they are absent in most of the experimental studies of cooperation. This omission seems particularly important given the prominent role within the cooperation literature of punishment technology that inflicts more damage on the punished than the punisher. For example, a common punishment technology reduces the punished subject's payoffs by 3 units for every 1 unit cost to the punisher (3:1 hereafter). Such asymmetric punishment technology leads to the ability to alter relative rank. An altruistic player might utilize powerful punishment on a counterpart in order to induce cooperative behavior. However, a spiteful or self-interested player might also use powerful punishment to move up in relative rank.
Because relative performance is centrally important to many aspects of life, this experiment investigates its impact on behavior in a repeated public goods game with punishment. The subjects earn payoffs from a repeated public goods game with punishment, and, in addition, the subjects get bonuses or penalties based on their relative performance in the public goods game. This study uses high-powered punishment where the cost to the punished is fifty times as much as the cost to the punisher (50:1 hereafter). In some important non-laboratory settings, including some modern foragers, people have the ability to impose very costly punishments at relatively low cost to the punisher [28]. High-powered punishment technology more easily allows a punisher to move up in relative rank than less powerful punishment technology. Thus, the results found in this experiment may not generalize to situations with less powerful punishment technology. The discussion section provides additional comments on the potential fragility of the results presented here.
In summary, this paper analyzes cooperation and punishment in a repeated public goods game with relative rank payoffs. The questions that are important to the field of cooperation are:
Q1: What is the cooperation level with relative performance payoffs?
Q2: What is the punishment level with relative performance payoffs?
Q3: Is efficiency increased by punishment with relative performance payoffs?

Methods
A total of 96 undergraduate students from Chapman University voluntarily participated in the experiment. Four experimental sessions, each with 24 subjects, took place. Each of the 24 subjects in a session played two 6-period public goods games, each with a punishment stage. Each subject played one 'traditional' public goods game with punishment, and one public goods game with punishment with the addition of relative performance payoffs. All public goods games in this experiment used high-powered punishment (50:1), where the cost to the punished is fifty times the cost to the punisher [28]. Subjects each received $7 in advance for participation, and were paid according to behavior in the game. All game behavior was conducted in Experimental Currency Units (ECUs). At the end of the experiment, ECUs were converted to cash at a rate of 100 ECUs to $1. Subjects were informed of the ECU/$ exchange rate before the experiment began. Subjects were paid in cash and privately at the end of the session.
The subjects could not lose money in either of the public goods games. With 50:1 punishment technology, subjects could receive sufficiently large sanctions to exhaust their endowments. In such cases, the game was stopped for all subjects. Thus, a subject earned the $7 show-up fee, plus the pay earned within the public goods game without relative performance payoffs, plus the pay earned within the public goods game with relative performance payoffs (never less than $0 for either of the two public goods games). The rank payoffs were designed to pay high-ranking subjects a bonus, and impose a penalty on low-ranking subjects. The total, and average, of the rank-based extra payments was zero. The specific rank payoffs were determined as follows. At the end of the 6-period repeated public goods game with punishment and relative rank, the 24 subjects in the session were ranked from 1st to 24th. 1st place was given a bonus of 550 ECUs. 2nd place earned 500 ECUs. Payoffs decreased by 50 ECUs per place, except that rank 12 and rank 13 each earned 0 ECUs.
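The rank schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the experiment's software: the function name `rank_payoff` is hypothetical, and the penalties for ranks 14 through 24 are inferred from the statements that payoffs fall by 50 ECUs per place and that the payments sum to zero. Table 1 remains the authoritative source.

```python
def rank_payoff(rank, n_subjects=24, step=50):
    """Bonus (positive) or penalty (negative) in ECUs for a final rank.

    Assumption: the schedule is symmetric around the two middle ranks,
    which both earn 0, so that the payments sum to zero.
    """
    if not 1 <= rank <= n_subjects:
        raise ValueError("rank must be between 1 and n_subjects")
    half = n_subjects // 2
    if rank <= half:
        # Ranks 1..12: 550, 500, ..., 50, 0
        return step * (half - rank)
    # Ranks 13..24: 0, -50, ..., -550 (inferred from the zero-sum property)
    return -step * (rank - half - 1)


# Full schedule for one session of 24 subjects
schedule = {rank: rank_payoff(rank) for rank in range(1, 25)}
```

Under these assumptions, moving up a single place is worth 50 ECUs everywhere except between ranks 12 and 13.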
The complete relative performance payoff instructions are contained in the appendix. The verbatim wording and a truncated version of the payoff table are: "After this experiment is over, you will be ranked from 1st to 24th place. There are financial rewards and penalties based on where you rank. Specifically, the table below explains how your earnings from the experiment will be adjusted based on your relative rank" (Table 1). Subjects in half the sessions played the public goods game with relative performance payoffs first, while the other half of the subjects played the public goods game with relative performance payoffs second. In each period of a public goods game, the 24 subjects were allocated to six groups of four subjects. Groupings were structured in the manner of the 'perfect stranger' treatment of Fehr and Gächter [37]. The allocation of subjects to the groups ensured that, within a given treatment, no subject ever met another individual more than once. In each round, subjects were identified with a transient identifier to ensure no reputations could be formed. At the end of each period, subjects were informed about their own decisions, the decisions of the other group members, and their payoff in the current period.
Subjects were given written instructions that explained the structure of the game, the composition of groups in each period, and the inability to form reputations because of anonymity. After the instructions, and before the experiment, subjects were given a test of knowledge on several hypothetical examples. In order to participate, each subject was required to get all the payoff examples correct. All experimental decisions were made on a computer screen using z-Tree software. The subjects sat in four rows with six individuals per row. All decisions were made via computers, and each subject had his or her own computer. Three-sided opaque screens separated each computer and subject. Subjects were instructed not to look at anyone else's screen and not to speak to each other. In each round, each player was given 20 ECUs to allocate between a public and a private account. ECUs in the private account remained with the player, while those allocated to the group account were multiplied by 1.6 and divided equally among the 4 subjects in a group.
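The contribution stage just described implies a simple payoff rule, sketched here in Python for illustration. The function name `contribution_stage_payoffs` is hypothetical; the parameters (20 ECU per round, a multiplier of 1.6, groups of 4) come from the text.

```python
def contribution_stage_payoffs(contributions, endowment=20, multiplier=1.6):
    """Per-player payoff from one round, before any punishment.

    Each player keeps whatever was not contributed and receives an equal
    share of the multiplied group account.
    """
    for c in contributions:
        if not 0 <= c <= endowment:
            raise ValueError("contribution must be between 0 and the endowment")
    share = multiplier * sum(contributions) / len(contributions)
    return [endowment - c + share for c in contributions]


full_cooperation = contribution_stage_payoffs([20, 20, 20, 20])  # 32 each
no_cooperation = contribution_stage_payoffs([0, 0, 0, 0])        # 20 each
one_free_rider = contribution_stage_payoffs([0, 20, 20, 20])     # 44, 24, 24, 24
```

Each ECU contributed returns only 0.4 ECU to the contributor, so contributing nothing maximizes individual earnings in this stage, while full contribution raises every group member's payoff from 20 to 32 ECU.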
The punishment phase came after each round of the public goods game. Group members were identified by a transient number and their contribution to the public good. Each player could allocate up to 10 units of punishment to each of the 3 other group members. A unit of punishment cost the punisher 1 ECU and reduced the punished player's payoff by 50 ECU. The 50:1 punishment technology allows for large negative payouts. The maximum punishment a player can receive in a round is 30 units (10 from each of the 3 other subjects in the group), which would result in a loss of 1,500 ECU. Subjects with negative balances would no longer suffer the costs of punishment, and might behave differently. Accordingly, two adjustments were made. First, endowments were made large relative to the public goods game payoffs; endowments in three sessions were 2000 ECU and in one session the endowment was 1500 ECU. Second, the game was stopped after a round if any player's cumulative balance turned negative.
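The punishment stage and the stopping rule can be sketched as follows. This is an illustrative reading of the description above, not the z-Tree program used in the sessions; the names `punishment_stage` and `game_should_stop` and the matrix layout `assigned[i][j]` (units player i assigns to player j) are assumptions.

```python
def punishment_stage(payoffs, assigned, cost=1, impact=50, cap=10):
    """Apply 50:1 punishment to one group's contribution-stage payoffs.

    assigned[i][j] is the number of punishment units player i assigns
    to player j (hypothetical layout); the diagonal is ignored.
    """
    n = len(payoffs)
    result = list(payoffs)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            units = assigned[i][j]
            if not 0 <= units <= cap:
                raise ValueError("each target may receive 0 to 10 units per punisher")
            result[i] -= cost * units    # punisher pays 1 ECU per unit
            result[j] -= impact * units  # target loses 50 ECU per unit
    return result


def game_should_stop(balances):
    """Stopping rule: end the game once any cumulative balance is negative."""
    return any(balance < 0 for balance in balances)


# Worst case for one player: the three other group members each assign 10 units.
stage = [32, 32, 32, 32]
assigned = [
    [0, 0, 0, 0],
    [10, 0, 0, 0],
    [10, 0, 0, 0],
    [10, 0, 0, 0],
]
after = punishment_stage(stage, assigned)  # [-1468, 22, 22, 22]
```

In this worst case the punished player loses 1,500 ECU in a single round, which a 1500 or 2000 ECU endowment can absorb once but not repeatedly.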
Subjects were recruited by email via Chapman University's Economic Science Institute (ESI) email list. Subjects were restricted to those who had not participated in previous ESI public goods experiments. Each session had equal numbers of men and women, for a total of 48 women and 48 men. All sessions were conducted in May 2012, beginning at the same time (4pm) on a Tuesday, Wednesday, or Thursday. No deception was used in the experiment.

Results
A summary of the results is as follows. Contributions to the public good are close to 100%. Punishment rates are substantial. Because punishment rates are substantial, and because the experiment uses high-powered 50:1 punishment technology, total payoffs are negative. The punishment was sufficiently severe that subjects exhausted their endowments in fewer than six rounds. Punishment decreases aggregate payoffs; group payoffs would have been much higher in the same setting without punishment.

All experiments ended in under 6 rounds because of punishment costs
Any game was stopped after a round when one or more subjects had earned a negative total, that is, when within-game losses exceeded the endowment. A negative total means that total costs from punishment exceeded the combination of the endowment plus the earnings from the public goods stage. Each subject started with an endowment that was large relative to the payoff from the public goods portion. However, because of the high-powered punishment technology, and its use by the subjects, all four sessions ended before the maximum of 6 rounds; two games ended after 2 rounds, one game ended after 4 rounds, and one game ended after 5 rounds.

Contributions are high
Contributions to the public good are high (Figure 1 & Table 2). The average amount contributed is 17.38/20 in round 1 and rises to 19.11/20 in round 4 before declining to 17.75/20 in round 5.

Punishment is substantial
Punishment is substantial. Figure 2 & Table 3 contain average punishment received (in units, before being multiplied by 50). Each player can receive a maximum of 30 units of punishment (10 units from each of the 3 other subjects in the group). The average punishment per player in a round ranges from 4.53 in round 1 to 6.46 in round 6 (Figure 2 & Table 3). Punishment in the public goods game with relative performance payoffs is higher than punishment in the same setting without relative performance payoffs. The game without relative performance payoffs was previously published [28], and is repeated here. Figure 3 & Table 4 show punishment levels for the two treatments by round. Punishment is approximately tripled in the presence of relative performance-based payoffs. The difference in punishment levels is statistically significant (e.g., p<0.01 for round 1 punishment) (Figure 3 & Table 4).

Costs of Punishment overwhelm benefits of cooperation
Earnings are lower in this game with punishment than they could be in the same public goods game without punishment. In a public goods game with these parameters without punishment, the lowest payoff occurs when nothing is contributed to the public good. In this worst case, each player earns the endowment of 20 ECUs in each round. In this public goods game with punishment and relative rank payoffs, the average earnings are negative 1,051.3 ECU. Contribution rates to the public good are very high, and this increases earnings, but with high-powered punishment, the cost of punishment (51 ECU per unit of punishment) overwhelms the gains from contribution (Figure 4 & Table 5).
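A back-of-the-envelope calculation, using only parameters and averages reported in this paper, illustrates the scale of the imbalance. It is a rough sketch rather than a reproduction of the analysis: it assumes that every unit of punishment assigned is received by someone in the session, so the average number of units assigned per player equals the average received.

```python
ENDOWMENT_PER_ROUND = 20   # ECU each player allocates per round
MULTIPLIER = 1.6           # applied to the group account
COST_TO_PUNISHER = 1       # ECU per unit of punishment assigned
COST_TO_PUNISHED = 50      # ECU per unit of punishment received

# Largest possible per-player gain from cooperation in one round:
# full contribution (32 ECU) versus zero contribution (20 ECU).
max_gain_per_round = MULTIPLIER * ENDOWMENT_PER_ROUND - ENDOWMENT_PER_ROUND

# Total ECU destroyed per unit of punishment (punisher plus punished).
loss_per_unit = COST_TO_PUNISHER + COST_TO_PUNISHED

# Reported round 1 average of 4.53 units received per player.
avg_units_round1 = 4.53
avg_loss_round1 = avg_units_round1 * loss_per_unit

# Average units per player at which punishment cancels the maximum gain.
break_even_units = max_gain_per_round / loss_per_unit

print(f"max gain per round:  {max_gain_per_round:.2f} ECU")
print(f"avg punishment loss: {avg_loss_round1:.2f} ECU")
print(f"break-even units:    {break_even_units:.3f}")
```

On these numbers, the round 1 average loss from punishment is roughly 231 ECU per player against a maximum cooperative gain of 12 ECU, and about a quarter of a unit of punishment per player is enough to erase that gain.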

Discussion
This study was motivated by the observation that relative performance payoffs are pervasive outside the laboratory, but relatively rare in experimental studies of cooperation. This omission is particularly important because experimental studies commonly use asymmetric (e.g., 3:1) punishment technology that allows punishers to move up in relative rank. What happens to punishment, cooperation, and group outcomes in the presence of relative performance payoffs? In this study, the ability to punish, in the presence of rank-based payoffs, has an extremely negative impact on aggregate outcomes. While cooperation is high, punishment is sufficiently high as to overwhelm the cooperative benefits. The outcome is a world filled with costly punishment and low payoffs. Punishment that destroys aggregate welfare is not altruistic. The study adds to the literature that argues against a simple notion that adding punishment will improve cooperative outcomes. The results support the idea that there is a complex relationship between the exact environment and the impact of punishment on cooperation and payoffs at the group and individual level.
Interestingly, this more nuanced view is contained in very early research on punishment and cooperation: "Generalization of the current results to real public goods problems involving large groups requires careful assessment of the possible factors which distinguish large groups from small groups" [18]. How robust are the results in this study? The subjects use high-powered 50:1 punishment, and the subjects are specifically rewarded for relative performance. This leads to obvious questions about the robustness of the scorched-earth outcome in the experiment. This issue leads to suggestions for several subsequent studies discussed below. Before getting to those studies, however, it is also reasonable to question the robustness of the original altruistic punishment paper [17]. That study had the following design features: anonymity, one-shot interactions between any pair of subjects, 3:1 punishment technology, a fixed and known 6-period horizon, and the ability to punish without fear of retaliation. Many or most real-world situations involve repeated interaction between people who know each other, with the ability to offer variable rewards and punishments, and an ability to respond to any action. All laboratory environments are artificial.
This study suggests that in settings with relative performance payoffs, some people might punish to advance their individual goals. Such punishment may reduce outcomes for the group. To the extent that the existence of relative performance payoffs more closely approximates non-laboratory conditions, several other experiments would be informative. First, this study uses high-powered punishment where the impact on the punished is 50 times the cost to the punisher. Most of the experimental literature uses lower-powered punishment technology. It would be useful to do a comparative statics experiment with relative performance payoffs similar to prior work [27].
Second, the subjects in this study are US college students. These subjects may not be representative of people more generally [24], and therefore it would be useful to run the experiment with different subject pools. Given the use of high-powered punishment technology, it would be particularly interesting to run this experiment with other subjects drawn from populations that exhibit anti-social punishment [22,23]. Third, 'counter punishment' studies allow multiple rounds of punishment [25][26][27]. These investigations could be repeated with relative performance-based payoffs. Fourth, this study uses a 'perfect stranger' treatment based on the six-period design of Fehr and Gächter [37]. Some papers argue that long-term cooperation, with more than six periods, is more representative of non-laboratory conditions [37,38]. This line of work suggests experiments using relative performance payoffs with more periods. It should be noted, however, that all the sessions of this experiment ended before the sixth period because some subjects ran out of money. Some modifications to the punishment technology and/or the endowment would be required to examine behavior with more periods [39,40].

Conclusion
In this experiment with relative performance-based payoffs, punishment is not altruistic.