How ski jumping gets Olympic judging right (and figure skating gets it wrong)
By Eric Zitzewitz
Feb. 12, 2014 at 11:27 p.m. GMT+9
This is a guest post by Dartmouth economist Eric Zitzewitz. When not writing about such topics as benchmark indices and equity compensation, he is the author of two papers on bias in figure skating judging.
Some amount of bias is probably inevitable in judged sports. The question is, can a sport do anything to control it? As an otherwise serious academic economist, I got interested in Olympic judging because I was interested in how groups of people make decisions. A group of judges picking a gold medalist is at least a little analogous to a committee deciding whom to hire. Individual members each have their own opinions about the decision, and these opinions reflect both valuable information and the members’ biases. The question is how to combine these opinions, and how to create incentives for members to stay as unbiased as possible.
Both ski jumping and figure skating have nationalistic judging biases, where judges give higher scores to athletes from their countries. But the sports take very different approaches to dealing with this. Ski jumping has its international federation select the judges for competitions like the Olympics, and I find that they select the least biased judges. Figure skating lets its national federations select the judges, and my research showed that they select the most biased judges.
This creates different incentives for judges. Ski jumping judges display less nationalism in lower-level competitions — it appears they keep their nationalism under wraps in less important contests to avoid missing their chance at judging the Olympics. Figure skating judges are actually more biased in the lesser contests; they may actually be more biased than they would like to be due to pressure from their federations.
Despite being nationalistically biased, ski jumping judges appear to care about fairness. Having a compatriot judge is actually not an advantage in ski jumping. I find that while you get a higher score from your compatriot, the other four judges lower your score ever so slightly, leaving you no better (or worse) off.
The reverse happens in figure skating. Not only does a figure skater with a compatriot judge get a higher score from that judge, but they also get higher scores on average from the other judges, too (compared with events when they are not represented on the panel). This is evidence of vote trading, of the kind that occurred at the Olympics in 1998, 2002, and (allegedly) is occurring in 2014. Most of the benefit of having a compatriot judge actually comes through the vote trading. Skaters even benefit from having compatriot judges on the panels of other events, which is consistent with the fact that the vote trading we know about is often across events.
The dysfunctionality of the sport is also revealed by how it reacted to the 2002 judging scandal. The International Skating Union made a couple of sensible reforms, such as increasing the size of the judging panel (at least temporarily) and making the scoring system more objective (although some think they went too far). But most of its response consisted of hiding the evidence of bias. The ISU stopped revealing which judge gave which score, making it much harder for competitors and fans to see whether the judging was fair. The ISU even went back and altered online score sheets from earlier competitions, obfuscating which judge gave which score and even which country each judge represented. It also began randomly dropping scores from three out of 12 judges. As any statistician can tell you, an average of nine out of 12 scores is essentially the average of the 12 scores plus a random number. When Yale statistics professor Jay Emerson noticed that in one case this randomness had altered who won a medal, it appears that the ISU responded by scrambling the order in which scores were reported on score sheets. The only plausible purpose of this change was to make it harder to identify cases where randomness had affected results. All this suggests that the focus has been on hiding problems rather than fixing them.
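The point about dropping three of 12 scores at random can be checked with a quick simulation. The sketch below is only an illustration: the panel marks are made up, and it assumes the three dropped judges are chosen uniformly at random, which is a simplification of whatever procedure the ISU actually used.

```python
import random
import statistics

def random_nine_mean(scores, rng):
    """Average of 9 marks drawn at random (without replacement) from a 12-judge panel."""
    return statistics.mean(rng.sample(scores, 9))

# Hypothetical marks from a 12-judge panel (illustrative only, not real data).
panel = [5.2, 5.4, 5.5, 5.3, 5.6, 5.1, 5.4, 5.7, 5.3, 5.5, 5.2, 5.6]
full_mean = statistics.mean(panel)

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [random_nine_mean(panel, rng) for _ in range(10_000)]

# The draws centre on the 12-judge mean but scatter around it:
# the random drop adds noise without adding information.
print(f"12-judge mean:      {full_mean:.3f}")
print(f"mean of the draws:  {statistics.mean(draws):.3f}")
print(f"std dev of draws:   {statistics.pstdev(draws):.3f}")
print(f"range of the draws: {min(draws):.3f} to {max(draws):.3f}")
```

When two skaters are separated by less than that scatter, which nine judges happen to be counted can decide the placing.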
While the scrubbing of past score sheets and randomization are very hard to defend, the ISU argues that judge anonymity should actually help prevent vote trading deals by making it easier for judges to secretly renege on them. The argument is basically the same as the one for the secret ballot in elections or voice voting in legislatures, where the idea is to protect voters or lawmakers from external influences. Anonymity, though, also protects judges from monitoring by outsiders. I find that judging biases got about 20 percent larger after anonymity was introduced, suggesting that in this case, transparency is the better approach.
Of all these results, I am most intrigued by the contrast between the ski jumping judges undoing each other’s biases and the figure skating judges reinforcing them. When we make decisions in a group at work, we often encounter individuals with strong biases — say to hire a particular type of job candidate. When we do, we have a choice. We can act like a ski jumping judge, and resist the bias, in an effort to keep things fair. Or we can act like a figure skating judge and say “hiring this guy really seems important to Joe, I wonder what he’ll give me in return if I go along.” We have probably all seen examples of both in our lives.
What determines whether we get “ski jumping” or “figure skating” behavior out of our colleagues? Incentives are part of the story, and an organizational culture helps create those incentives. Whether a skilled vote trader is viewed as a savvy operator, or as someone who undermines the organization’s mission, is partly a question of values. But culture is also a product of the incentives people face. Whether the “skilled operators” move up in the world affects the extent to which that skill is respected.
As an economist, I know how to fix figure skating judges’ incentives. Fixing culture is harder, but my hope would be that if you can get the incentives right, the culture will eventually follow.
How Figure Skating Judges May Have Shaped The Olympic Podium
An analysis by BuzzFeed News shows that judges displayed a strong preference for skaters from the same country at the 2018 Winter Olympics, and that it may have made a difference in a gold medal finish.
BuzzFeed News Reporter
Two weeks of figure skating at the 2018 Winter Olympics provided fans with tight spins, close competitions, and judges continuing to show a preference for skaters from their own country.
Before the Olympics, a BuzzFeed News investigation found that judges consistently boost the scores of skaters from their home country. A new analysis shows that in this year’s Olympic Games, possible instances of national preference may have affected the final standings in at least two cases.
Now that the figure skating competition has concluded, we’ve crunched the numbers for all 250 performances in South Korea and found that home-country preference of Olympic judges slightly outpaced our earlier assessment. When we examined 17 high-level competitions between October 2016 and December 2017, there was a bump of 3.4 points. In these games, that figure climbed to 3.9.
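One plausible way to compute a home-country "bump" like the figures above is sketched here. It assumes the bump for a performance is the home judge's total minus the average of the other judges' totals, and that the headline number is the mean of those differences. The function name, judge IDs, and scores are hypothetical, and the published methodology may differ in its details.

```python
from statistics import mean

def home_bump(scores, judge_country, skater_country):
    """Home judge's total minus the mean of the other judges' totals.

    scores: dict of judge id -> total score for one performance.
    judge_country: dict of judge id -> country code.
    Returns None when the panel has no home judge (or only home judges).
    """
    home = [s for j, s in scores.items() if judge_country[j] == skater_country]
    others = [s for j, s in scores.items() if judge_country[j] != skater_country]
    if not home or not others:
        return None
    return mean(home) - mean(others)

# Hypothetical four-judge panel and three performances (not real data).
judge_country = {"J1": "FRA", "J2": "USA", "J3": "JPN", "J4": "CAN"}
performances = [
    ("CAN", {"J1": 80.0, "J2": 78.0, "J3": 79.0, "J4": 83.0}),
    ("FRA", {"J1": 84.0, "J2": 81.0, "J3": 82.0, "J4": 80.0}),
    ("ITA", {"J1": 70.0, "J2": 71.0, "J3": 69.0, "J4": 70.0}),  # no home judge
]

bumps = [home_bump(s, judge_country, c) for c, s in performances]
bumps = [b for b in bumps if b is not None]  # keep only home-judge cases
average_bump = mean(bumps)
print(bumps, average_bump)
```

In this toy data the two home-judge cases give bumps of 4.0 and 3.0, for an average of 3.5; the Italian skater has no compatriot on the panel and is left out.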
The International Skating Union, which monitors the sport, did not respond to calls or emails for comment. In a previous statement, it told BuzzFeed News that “the Olympic Winter Games PyeongChang 2018 is the culmination of the Olympic cycle and years of dedication for the athletes, therefore it is the ISU’s priority to guarantee a fair result during this important event.”
Data alone can’t explain why these patterns emerged. Higher home-country scores do not in and of themselves show a judge is purposely trying to raise a competitor’s standing. Judges might not even be aware that their scoring shows a consistent pattern, and their judging could reflect a preference for a regional style of skating or simply their patriotism.
Still, this pattern may be consequential enough to affect who takes home a gold medal. In the ice dancing competition, home-country preference appears to have boosted the winners — Canada’s Tessa Virtue and Scott Moir — past their rivals, France’s Gabriella Papadakis and Guillaume Cizeron.
The two ice dancing pairs were among the favorites heading into the Olympics — and the Canadians bested the French by just 0.79 points in the final standings.
But the Canadians may also have had an advantage: the judging panel.
Leanna Caron, the president of Skate Canada, was selected to judge both programs in the ice dance competition. (Nine judges are randomly selected from a pool of 13 before each segment of the competition.) Both times she gave Virtue and Moir a score that was higher than the average of the other judges on the panel.
And both times, she scored the French team lower than any other judge.
In an email, a spokesperson for Skate Canada declined to comment on the scores of its judges, but told BuzzFeed News that “all of the Canadian judges submitted to the ISU for the PyeongChang 2018 Olympic Winter Games meet the eligibility criteria, and were approved as judges by the ISU, according to its rules and regulations.”
Christine Hurth, France’s ice dance judge at the Olympics, was only selected to score the short dance, which made up about 40% of each team’s total. Hurth gave the Canadian team the lowest score of any judge for the short dance. She scored the French about average.
The French skating federation did not respond to phone calls and emails requesting comment.
After the competition, BuzzFeed News calculated what would have happened if the Canadian and French judges’ scores were all replaced by those of an “average” judge. In that scenario, Papadakis and Cizeron would have won the gold medal by 0.39 points. (You can read more about our analysis here.)
“We aren’t involved in the picking of the judges, we’re not concerned with what country sits on the panel,” Moir told a news conference, according to Reuters.
“At Skate Canada, we have a history of very professional judging that’s very fair, and we’re proud of that. I feel that, as Canadians, when you win in [the] Olympics, it’s when you deserve it, and we feel like these Olympics medals, that we deserve [them].”
BuzzFeed News also calculated what would have happened if the scores of the French and Canadian judges were simply removed. In that scenario, there would have been only seven judges on the short dance panel and only eight judges on the free dance panel. In this analysis, too, Papadakis and Cizeron would have passed Virtue and Moir for the gold medal, in this case by 0.46 points.
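The two counterfactuals give different margins (0.39 and 0.46 points) because, once the highest and lowest marks are discarded, replacing a judge and removing a judge are not the same operation. The sketch below shows the difference on a single element. It assumes an "average" judge means the mean of the other judges' marks, which is one reading of the description rather than the confirmed method, and the marks are invented.

```python
def trimmed_mean(marks):
    """Mean after discarding one highest and one lowest mark."""
    if len(marks) < 3:
        raise ValueError("need at least three marks")
    kept = sorted(marks)[1:-1]
    return sum(kept) / len(kept)

def replace_with_average(marks, idx):
    """Swap the mark at idx for the mean of the remaining marks."""
    rest = marks[:idx] + marks[idx + 1:]
    return marks[:idx] + [sum(rest) / len(rest)] + marks[idx + 1:]

def drop_judge(marks, idx):
    """Remove the mark at idx, shrinking the panel by one."""
    return marks[:idx] + marks[idx + 1:]

# Hypothetical marks for one element from a nine-judge panel;
# index 8 is the home-country judge (illustrative only).
marks = [2, 2, 1, 2, 3, 2, 1, 2, 3]
HOME = 8

as_scored = trimmed_mean(marks)
replaced = trimmed_mean(replace_with_average(marks, HOME))
removed = trimmed_mean(drop_judge(marks, HOME))
print(as_scored, replaced, removed)
```

Both counterfactuals lower this element's score, but by slightly different amounts, since the panel that remains after trimming is not the same in each case.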
China’s Jin Boyang was considered another medal hopeful entering the men’s competition — and one of the Chinese judges displayed a remarkably high home-country preference as well.
The judge, Weiguang Chen, scored her countryman Jin a total of 10.7 points above the average during the short program. During the free skate, Chen graded Jin a whopping 25.0 points above the average — the biggest boost by any figure skating judge during the entire Olympics.
Figure skating's scoring system reduces the effect of extreme judgments by throwing out the highest and lowest score for each part of a performance. But very high scores can still influence the outcome by preventing the next-highest score for each part from being discarded.
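A small worked example makes the mechanism concrete. The marks below are hypothetical, and the function is a bare-bones version of "drop the highest and lowest, average the rest" rather than the full ISU calculation.

```python
def trimmed_mean(marks):
    """Mean after discarding one highest and one lowest mark."""
    kept = sorted(marks)[1:-1]
    return sum(kept) / len(kept)

# Eight hypothetical marks from the rest of the panel (not real data).
others = [8.0, 8.25, 8.25, 8.5, 8.5, 8.5, 8.75, 9.0]

in_line = trimmed_mean(others + [8.5])    # ninth judge agrees with the panel
generous = trimmed_mean(others + [9.25])  # ninth judge is a little high
extreme = trimmed_mean(others + [10.0])   # ninth judge is very high

print(f"ninth judge at 8.5:  {in_line:.3f}")
print(f"ninth judge at 9.25: {generous:.3f}")
print(f"ninth judge at 10.0: {extreme:.3f}")
```

With the ninth mark at 8.5, the 9.0 is the top score and gets thrown out. With the ninth mark at 9.25 or 10.0, the new mark is thrown out instead and the 9.0 is counted, so the result rises by the same amount in both cases. Trimming caps how much one judge can move the score, but it does not reduce the effect to zero.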
The Chinese skating federation and Chen did not respond to phone calls or emails requesting comment.
Jin ultimately finished fourth, in front of American Nathan Chen by just 0.42 points. The American had his own home-country judge on the panel. Even so, if the scores for both the Chinese and American home-country judges were replaced with those of an “average” one, the American skater would have finished fourth — ahead of Jin.
The ISU is responsible for monitoring judges. But our previous investigation showed that its own system for catching outliers very rarely flags their scores. The system would have flagged both times Chen judged Jin during the men’s competition.
And finally, Lorrie Parker, the American judge, stood out by giving US skaters three of the biggest home-country boosts. She scored Americans Adam Rippon and Nathan Chen 11.6 and 11.5 points above the average, respectively, during the free skate of the men’s competition. Her grade of Rippon during the men’s free skate in the team event was 10.9 points above the average.
The US Figure Skating Association and Parker did not respond to emails or phone calls for comment.
To make scoring less subjective, the ISU put more emphasis on the technical elements of a program. Formerly, judges awarded competitors just two scores on a scale of 0 to 6.0; now, judges evaluate each jump, spin, or step sequence individually on a scale from –3 to +3, which is then adjusted for difficulty. They also grade five different artistic components of the program. A skater’s final score is a complex calculation based on the scale of difficulty of the program and an average of the judging panel’s marks. The highest and lowest scores for each aspect are tossed out to lessen the influence of outliers.
Since the 2002 scandal and the introduction of the new judging system, the sport has been largely cleaned up, said Charles Cyr, the ISU’s sports director for figure skating. “It’s a new breed of judges. Let me tell you they’re not wilted lilies,” he said. “The old guard of judges from 15 or 20 years ago have retired and gone,” he added, taking with them “the old adage of ‘I have to do what my country says.’”
But with all the fixes that the new scoring system promised, there also came less transparency. In the new scoring system, the ISU made the judges’ individual scores anonymous, on the theory that secrecy would shield them from pressure from their home country federations.
“Anonymous judging was the worst mistake ever made by the ISU,” said Sonia Bianchetti, an ISU hall of famer who used to serve as a judge and technical committee chair. The change made it almost impossible to see if judges were making errors or regularly scoring their own skaters higher, she said. It was a stark change from the 1970s, when she led the successful effort to suspend all Soviet judges for an entire season because they had demonstrated repeated national bias.
When the Russian skater Adelina Sotnikova beat out Korea’s Yuna Kim for the women’s gold medal at the Sochi Olympics in 2014, one of the Russian judges sparked an uproar when she was seen hugging Sotnikova backstage after the win. The Korean skating federation accused the Russian federation of an ethics violation for appointing that judge, who was married to the former head of the Russian skating federation. The ISU dismissed the complaint.
Two years later, at the 2016 ISU Congress in Dubrovnik, Croatia, skating federations voted almost unanimously to end anonymous judging. “I heard judge after judge going up to the microphone and say, ‘I want to be accountable for my marks,’” said John Coughlin, a 2012 US national champion who currently serves on one of the ISU’s technical committees. “The good judges have nothing to hide.”
The ISU has its own algorithm for catching scoring errors or potential bias — but some insiders say it is not effective.
“The corridor is so big that it’s almost impossible to get flagged,” said a former ISU technical committee member.
“The judges really know how to do it, how to play within that band,” said another former ISU technical committee member.
“It’s quite a huge margin that the ISU allows,” said a former high-level member of a national federation.
To see if these claims were valid, BuzzFeed News recreated the ISU system that highlights scores far above or below the average — or outside the corridor, in skating parlance. Public ISU documents describe the system in detail, and BuzzFeed News translated that description into a computer algorithm. We consulted with Prins, the Dutch judge, and academic literature to verify our interpretation. We also shared a draft of our methodology with the ISU, which declined to confirm or correct it. We found that the ISU’s system would have flagged barely 1% of all scores for technical elements and an even smaller fraction of scores for artistic components.
Unlike our analysis, the ISU’s system only looks at one performance at a time, flagging any scores that fall far enough from the average. So if, during a particular competition, one judge gave a skater from her home country a much higher score than any of the other judges on the panel, the algorithm would detect that anomaly, and members of the technical committee would follow up to determine if the outlying score was the result of error or bias. But the algorithm does not track patterns across time. So it wouldn’t flag a judge who consistently gave skaters from her home country a less noticeable boost across many different performances or even many years.
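The gap between a per-performance check and a check across time can be illustrated in a few lines. This is a sketch, not the ISU's algorithm: the corridor width, function names, and marks are all hypothetical, and the real system applies its own thresholds to elements and components separately.

```python
from statistics import mean

CORRIDOR = 1.5  # hypothetical width; the ISU's actual thresholds are not given here

def deviation(marks, idx):
    """One judge's mark minus the panel average for a single performance."""
    return marks[idx] - mean(marks)

def flag_single(marks, idx, corridor=CORRIDOR):
    """Per-performance check: flag only if this one mark leaves the corridor."""
    return abs(deviation(marks, idx)) > corridor

def mean_deviation(history):
    """Cross-performance check: average deviation over (marks, idx) records."""
    return mean(deviation(m, i) for m, i in history)

# A hypothetical judge (index 4) who marks compatriots one point above
# everyone else, in each of 20 performances.
history = [([7.0, 7.0, 7.0, 7.0, 8.0], 4) for _ in range(20)]

flags = [flag_single(m, i) for m, i in history]
print("flagged in any single performance:", any(flags))
print("average deviation over 20 performances:", round(mean_deviation(history), 2))

# One blatant outlier, by contrast, is caught immediately.
print("blatant outlier flagged:", flag_single([7.0, 7.0, 7.0, 7.0, 10.0], 4))
```

Under these assumptions the steady one-point boost never trips the corridor, yet averaged over 20 performances it shows up as a consistent 0.8-point deviation. That kind of pattern is visible only when scores are tracked across performances.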
A review of the ISU’s online disciplinary records shows that there have been no major punitive actions over the last decade explicitly related to judges scoring their home countries’ skaters higher. However, the ISU does not release information when a judge receives more minor sanctions — such as a “letter of criticism” or an “assessment.”
Cyr defended the ISU’s decision to keep these evaluations private, comparing them to a company’s internal disciplinary records of its employees, but other current officials said greater transparency would improve the sport. “If the ISU did make these people public, they’d get better,” said one former high-level ISU official.
After the 2014–15 season, the ISU stopped publishing even the total number of assessments it hands down.
Within the ISU, responsibility for evaluating judges falls mainly to two technical committees, each a panel of three international judges, an athlete, a coach, and a chairperson. These committees wield considerable power: Not only can they recommend sanctioning a judge, but they can also set new rules in the sport and determine the list of qualified judges for each season.
At every international competition, referees help oversee judges. Referees score each skating performance independently from the judges, and they can recommend that a technical committee give additional scrutiny to a judge.
High-level ISU events, such as the World Championships or the Olympics, include an extra layer of oversight for judges. A pair of observers, known as an Officials Assessment Commission, reviews any outlier scores flagged by the ISU’s algorithm. The technical committee can then recommend a letter of criticism or assessment to the judge. That recommendation must be approved by the sports director. Repeated assessments can lead to a suspension or a demotion.
Some officials said that the system for overseeing judges had enough checks and balances to protect the integrity of competitions. “The judges know that even though the event is done, their marks are going to be scrutinized,” the ISU’s Cyr said.
But other officials disputed that claim. In interviews, officials said they couldn’t recall a judge whom the ISU had booted out for good. “They’re not policing it,” said a former skating federation official. “It’s all a bit farcical.”
Our analysis shows that just one judge can influence the final results. One example: At the men's competition at the Progressive Skate America in October 2016, Russian judge Maira Abasova scored her compatriot Sergei Voronov higher than any judge except for one. Abasova’s score helped boost Voronov into fourth place overall, just 0.20 points ahead of Boyang Jin, a Chinese skater, in the final standings. It’s impossible to know why Abasova gave the scores she did, but replacing her marks with the average of the other judges would have dropped Voronov into fifth — behind Jin.
Abasova could not be reached, but BuzzFeed News sent a letter with detailed questions to her through the Figure Skating Federation of Russia. A federation representative declined to comment or make the judge available for an interview but said, “Abasova is aware of the letter and has no comment.”
A former high-level ISU official said that it was common to push one’s own skaters, explaining, “If you don’t do it, if you don’t join in the game, then you get left behind.”
John Templon is a data reporter for BuzzFeed News and is based in New York. His secure PGP fingerprint is 2FF6 89D6 9606 812D 5663 C7CE 2DFF BE75 55E5 DF99