The dialogue in the comments to my post on red and yellow cards in rugby is the motivation for this rather geeky post on the application of dynamic programming in sport.
Dynamic programming is a wonderful mathematical tool, used a lot in economics, finance, operations research, and many other fields. In essence, it deals with situations where the value of an action can come partly from a direct payoff to that action and partly from putting oneself in a better position to gain a payoff in the future. In DP analysis, the expected value of being in a particular position (called a "state") is equal to the expected immediate payoff from taking the optimal action in that state plus the expected value of being in the new state that results from the action. The value of any state is then a function of the values of other states, which can be solved for either by working backwards in time from a well-defined end, or by solving a set of simultaneous equations.
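For the geekiest readers, the recursion just described can be written in one line. The notation is only illustrative and assumed for this sketch: s is the current state, a an action available in that state, r(s, a) the immediate payoff, s' the state you land in, and V the value of a state.

```latex
V(s) = \max_{a} \Big\{ \mathbb{E}\big[\, r(s,a) \,\big] + \mathbb{E}\big[\, V(s') \mid s, a \,\big] \Big\}
```

In a game with a final whistle, V is pinned down at the end and the equation is solved backwards from there; otherwise it is one equation per state, solved simultaneously.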
DP analysis is particularly well suited to the analysis of sport, which contains many examples where the objective on any particular play is not to gain a payoff (i.e. score points) but to put oneself in a better position to score points later on.
Among many examples, DP analysis has been used to analyse whether one should kick away possession on 4th down in American football; to model the behaviour of tennis players in choosing the optimal amount of aggressiveness on first serves compared to second serves; and by my doctoral student, Scott Brooker, to model the trade-off between fast scoring and wicket preservation in ODI cricket. The analysis of the first of these examples was famously used by football coach and economics graduate, Bill Belichick, helping him to win three Super Bowls with the New England Patriots. (HT: Mankiw.)
The principles of dynamic programming are naturally understood intuitively by sportsmen, albeit with consistent sub-optimality in their application in some contexts. They are implicit in the old rugby adage never to pass to a player in a worse position than oneself. And I believe that the famous (or notorious) long-ball game that Graham Taylor brought to the Watford soccer team in the 1970s was motivated by the observation that almost all goals in English club games were scored within a very small number of passes of the team gaining possession. In DP terms, this translates as follows: if you play a short-passing game, the value of having possession at your end of the field is less than the value of the opposition having possession at its end, so you might as well just kick for territory, not possession.
So what is the relevance of dynamic programming to my suggested rugby rule change?
I suggested a rule change that would increase the value of having possession inside the opposition 22, since the payoff to taking a shot at goal from a penalty would increase. Commenters then argued that the rule change would have no effect on infringing further out from the goal line, since there is no feasible option to take a shot at goal from there. This is not correct, however. Let's say you are awarded a penalty on the halfway line, and don't have a kicker you would trust to kick for goal from there. The best option, with and without the rule change, would be to kick for touch, from which there is a roughly 90% chance you will regain possession at the line-out. But that doesn't mean that a penalty has the same benefit as before. Let's say a kick for touch can advance your team 25 metres, so a penalty at halfway would earn you almost certain possession around the opposition 22. If the cost to the defending team of infringing inside its own 22 is raised, then the value of holding possession there is increased, and hence so is the value of a penalty on halfway. Similarly, the value of a penalty on your own 22 is increased (albeit with diminishing returns), as it enables a team to get almost-certain possession at the halfway line, etc.
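To make the chain of reasoning concrete, here is a rough sketch of the backward induction in Python. Only the 90% line-out figure comes from the argument above. Everything else is an assumption made purely for illustration: the 0.3 chance of winning a penalty while holding possession at halfway, the simplification that possession there is worth nothing otherwise, and the made-up values (2.0 before the rule change, 2.5 after) for holding possession at the opposition 22.

```python
# From the post: roughly 90% chance of regaining possession at the line-out.
LINEOUT_RETAIN = 0.9
# Hypothetical: chance that possession at halfway ends in a penalty to you.
WIN_PENALTY = 0.3


def penalty_values(possession_value_opp_22):
    """Work backwards from the value of possession at the opposition 22."""
    # A penalty at halfway is kicked to touch near the opposition 22.
    penalty_halfway = LINEOUT_RETAIN * possession_value_opp_22
    # Simplifying assumption: possession at halfway only pays off via a penalty.
    possession_halfway = WIN_PENALTY * penalty_halfway
    # A penalty on your own 22 is kicked to touch near halfway.
    penalty_own_22 = LINEOUT_RETAIN * possession_halfway
    return {
        "penalty at halfway": penalty_halfway,
        "penalty at own 22": penalty_own_22,
    }


# Illustrative values of possession at the opposition 22, before and after
# the rule change. Both numbers are invented for the example.
before = penalty_values(2.0)
after = penalty_values(2.5)

for spot in before:
    print(spot, "gains", round(after[spot] - before[spot], 4))
```

Under these assumptions both penalties become more valuable after the rule change, and the gain is smaller the further back you start, which is the diminishing-returns point.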
Maybe they should be teaching dynamic programming at our sports academies?