Saturday, 26 July 2014

How unfair is the Super 15 schedule?

In advance of the semi-final between the Crusaders and the Sharks this evening, it is timely to look at the fairness of the Super 15 schedule. The Crusaders are playing at home, a massive advantage that they earned by virtue of finishing one point ahead of the Sharks in the regular season. But was that a fair reflection of the two teams? 

The Super 15 rugby competition is a bit unusual in its imbalance. There are five teams from each of three countries. Each team plays the other four teams in its own country twice, home and away; it plays four of the five teams from each of the other two countries once, two games at home and two away; and it doesn’t play the remaining two teams at all. This creates three ways in which a schedule may favour some teams over others. First, teams from stronger countries have to play more games against each other, making it harder for the best teams from those countries to finish ahead of the best teams from weaker countries. Second, a team is favoured if the two teams it doesn’t have to play are relatively strong. And third, the best teams gain from playing the stronger teams from other countries at home, where home-field advantage counts, while playing away against weaker teams who can be expected to lose in any location. 
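To fix the arithmetic of that format, here is a purely illustrative count of what the paragraph above describes (nothing here beyond simple tallying):

# Counting the schedule structure described above (illustrative only).
intra_country  = 4 * 2        # home and away against the other 4 teams in your country
cross_country  = 2 * 4        # 4 of the 5 teams from each of the other two countries, once each
unplayed       = 2 * 1        # one team skipped in each of the other two countries

games_per_team = intra_country + cross_country   # 16 regular-season games
total_games    = 15 * games_per_team // 2        # 120 games in the regular season
balanced_games = 14 * 2                          # 28 if every team played every other home and away

print(games_per_team, total_games, unplayed, balanced_games)  # 16 120 2 28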

Mark Reason recently argued in an article in the Dominion Post that these factors favoured the Crusaders (who finished the regular season in second place overall) in this year’s competition and penalised the Hurricanes (who finished seventh and out of the playoffs). His logic seemed impeccable to me; certainly the Crusaders appeared to benefit from the luck of the draw this year, relative to recent years when they had to play the best South African teams in South Africa.

I am currently doing some research on constructing rankings for international cricket, and thought it would be fun to use the same method to infer how teams would have finished in the Super 15 had they had a balanced schedule. Kirdan Lees has beaten me to it, in a welcome new blog: Sport Loves Data. Kirdan has reevaluated the ranking of the 15 teams, taking into account the imbalance in the schedule, and has posted his results here. Given that Kirdan’s method is very different from mine, I decided to see how the two methods would compare. The table below gives the actual points table, and my revised points table adjusted for schedule unfairness. (A TL;DR explanation of my method is at the bottom of this post.)

Team            Actual   Predicted
Waratahs            58        58.5
Crusaders           51        50.1
Sharks              50        52.4
Brumbies            45        45.0
Chiefs              44        42.6
Highlanders         42        37.7
Hurricanes          41        40.7
Western Force       40        40.2
Bulls               38        38.3
Blues               37        38.1
Stormers            32        33.4
Lions               31        33.8
Reds                28        24.5
Cheetahs            24        23.9
Rebels              21        23.1

Kirdan's method gives rankings rather than points, so the following table shows just the implied finishing positions: 

Team            Actual   Predicted   Kirdan
Waratahs             1           1        1
Crusaders            2           3        2
Sharks               3           2        4
Brumbies             4           4        3
Chiefs               5           5        6
Highlanders          6          10        7
Hurricanes           7           6        5
Western Force        8           7      11=
Bulls                9           8        9
Blues               10           9       10
Stormers            11          12        8
Lions               12          11      11=
Reds                13          13       14
Cheetahs            14          14       13
Rebels              15          15       15

The interesting thing is that my intuition (and Mark Reason’s) about how much the Crusaders were favoured this year turns out to have been overblown, although my method does rank the Crusaders just behind the Sharks rather than slightly ahead. And yes, the Hurricanes would have qualified for the playoffs as one of the top six teams under either my or Kirdan's rankings, but on my method that is not because they were pushed up much; rather, the big mover was the Highlanders, who appear to have been hugely favoured by the schedule this year. 


Postscript: Kirdan has another post looking at home-field advantage in the Super 15. My probit regression method would require a lot more data to analyse team-specific home-field advantage, but in a model that assumes the advantage is constant across teams, the result is that home field matters so much in the Super 15 that, in a match between two teams of equal ability, the one playing at home has a 75% chance of winning. It is no surprise that the Super rugby competition has almost always been won by the team that finished first in the regular season, which is therefore not only likely the strongest team but also earns home-field advantage throughout the playoffs. 
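To put that 75% figure on the probit scale: under a simplified binary-probit reading (a back-of-the-envelope simplification, not the full ordered-probit setup described below), the home team beats an equally able opponent with probability Φ(α), where α is the constant home-advantage term, so 75% corresponds to α of roughly 0.67:

# Back-of-the-envelope check of the 75% figure, assuming a simple binary
# probit P(home win) = Phi(alpha) for two equally able teams.
from scipy.stats import norm

alpha = norm.ppf(0.75)            # implied home-advantage constant, about 0.674
print(round(alpha, 3))            # 0.674
print(round(norm.cdf(alpha), 2))  # 0.75 -- recovers the stated win probability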

TL;DR Explanation of Method: 
  • There are two separate LHS variables, each estimated by an ordered probit regression: table points scored by the home team, and table points scored by the away team. Each can take the values 0, 1, 2, 3, 4, or 5. 
  • My database only included the scores, not the bonus points earned. The table points earned by each team for winning, tying, or losing by 7 points or fewer can be inferred from the scores, but not the bonus point for scoring 4 or more tries. I proxied the try bonus by assigning a bonus point whenever a team scored 30 points or more. The method then proceeds as follows (a rough code sketch is given after the list): 
  1. Generate a dummy variable for each of the 15 teams that equals 1 if that team was the home team in a given match, -1 if it was the away team, and 0 otherwise.
  2. Run two ordered probits, one for the points scored by the home team and one for the points scored by the away team, in each case on the 15 dummies (one dropped) and a constant. 
  3. Predict the probability of scoring 0, 1, 2, 3, 4, or 5 points for each of the 210 possible matchups (each team playing every other team, home or away), and find the expected points for each.
  4. Sum these to get each team's total points in a balanced competition where every team plays every other team twice, home and away.
  5. Finally, normalise these by a linear transformation to give them the same mean and standard deviation as the actual Super 15 points table.
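For anyone who wants to see the mechanics, here is a rough sketch of steps 1-5 in Python. It is illustrative only, not my actual code: it uses statsmodels' OrderedModel for the ordered probits (which folds the constant into its estimated cut points rather than taking a separate intercept column), and the data layout, column names, and helper functions are all assumed.

# Illustrative sketch of steps 1-5. Assumes a DataFrame `matches` with
# columns 'home', 'away', 'home_pts' and 'away_pts', where the *_pts
# columns hold the table points (0-5) inferred from the scores as above.
import itertools
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel


def team_dummies(matches, teams):
    """Step 1: +1 for the home team, -1 for the away team, 0 otherwise.
    The last team in `teams` is dropped as the reference category."""
    X = pd.DataFrame(0.0, index=matches.index, columns=teams[:-1])
    for team in teams[:-1]:
        X.loc[matches['home'] == team, team] = 1.0
        X.loc[matches['away'] == team, team] = -1.0
    return X


def fit_ordered_probit(points, X):
    """Step 2: ordered probit of table points on the team dummies.
    OrderedModel estimates cut points instead of a separate constant."""
    categories = sorted(points.unique())
    endog = points.astype(pd.CategoricalDtype(categories, ordered=True))
    fit = OrderedModel(endog, X, distr='probit').fit(method='bfgs', disp=False)
    return fit, np.asarray(categories, dtype=float)


def balanced_table(matches, teams, actual_points):
    """Steps 3-5: expected points over a full home-and-away round robin,
    rescaled to the mean and s.d. of the actual points table
    (`actual_points` is a Series of real table points indexed by team)."""
    X = team_dummies(matches, teams)
    home_fit, home_vals = fit_ordered_probit(matches['home_pts'], X)
    away_fit, away_vals = fit_ordered_probit(matches['away_pts'], X)

    totals = {team: 0.0 for team in teams}
    # 210 ordered matchups: every team hosts every other team once.
    for home, away in itertools.permutations(teams, 2):
        x = pd.DataFrame(0.0, index=[0], columns=teams[:-1])
        if home in x.columns:
            x.loc[0, home] = 1.0
        if away in x.columns:
            x.loc[0, away] = -1.0
        # predict() returns the probability of each points category; the
        # dot product with the category values gives the expected points.
        totals[home] += np.asarray(home_fit.predict(x))[0] @ home_vals
        totals[away] += np.asarray(away_fit.predict(x))[0] @ away_vals

    # Step 5: linear transformation to match the actual table's mean and s.d.
    predicted = pd.Series(totals)
    z = (predicted - predicted.mean()) / predicted.std()
    return z * actual_points.std() + actual_points.mean()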