2020-08-06

■

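Three very interesting analysis articles on judging, published on Sportlandia, the blog of Martina Frammartino, a writer who runs a bookshop, and introduced in translation on Nymphea's blog "惑星ハニューにようこそ" (Welcome to Planet Hanyu). Please read them at the linked pages.

　　Original: Sportlandia　https://sportlandiamartina.wordpress.com/

Drawing on a large amount of data, they sharply analyze national bias and the over- and under-scoring of particular skaters.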

"Lorrie Parker"

http://pianetahanyu.altervista.org/sportlandia%e3%82%88%e3%82%8a%e3%80%8c%e3%83%ad%e3%83%bc%e3%83%aa%e3%83%bc%e3%83%bb%e3%83%91%e3%83%bc%e3%82%ab%e3%83%bc%e3%80%8d/

"Choreographic Sequence"

http://pianetahanyu.altervista.org/sportlandia%e3%82%88%e3%82%8a%e3%80%8c%e3%82%b3%e3%83%ac%e3%82%aa%e3%82%b7%e3%83%bc%e3%82%af%e3%82%a8%e3%83%b3%e3%82%b9%e3%80%8d/

"Sasha Martinez"

http://pianetahanyu.altervista.org/sportlandia%e3%82%88%e3%82%8a%e3%80%8c%e3%82%b5%e3%83%bc%e3%82%b7%e3%83%a3%e3%83%bb%e3%83%9e%e3%83%ab%e3%83%86%e3%82%a3%e3%83%8d%e3%82%b9%e3%80%8d/

2020-08-06

2019 Torino GPF: Yuzuru Hanyu vs Nathan Chen, from "惑星ハニューにようこそ"

As reference material for examining the Torino GPF, from 惑星ハニューにようこそ:

http://pianetahanyu.altervista.org/%E3%83%9D%E3%83%83%E3%83%89%E3%82%AD%E3%83%A3%E3%82%B9%E3%83%88kiss%EF%BC%86cry-ii%E7%AC%AC7%E5%9B%9E%E3%80%8C%E3%83%88%E3%83%AA%E3%83%8Egpf%EF%BD%9E%E7%BE%BD%E7%94%9F%E7%B5%90%E5%BC%A6vs%E3%83%8D/

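Podcast Kiss & Cry II, episode 7: "Torino GPF: Yuzuru Hanyu vs Nathan Chen"

December 23, 2019

From episode 7 (the longest version, 154 minutes).
Of the Torino GPF discussion, roughly 50% is about the men, 30% about the women, and 20% about pairs, ice dance and juniors.
For now I am translating the part on senior men's singles, but the discussion is sidetracked at times by off-the-mark questions from viewers, so I excerpt and summarize only the important parts.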

Participants
Francesco Paone (host) (F)
Massimiliano Ambesi (journalist, winter sports analyst) (M)
Angelo Dolfini (former national champion, international technical specialist) (A)

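F: Scoring came up when we talked about the juniors, and for the last three or four days the debate over the scoring of the men's singles event has been heating up.
Nathan Chen winning is fair enough, but many people are saying the scores were impossible.
What are your views on this?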

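M: As a premise, I have to say that Nathan Chen's win was fair and that there was no room for argument about it.
Nathan Chen delivered two programs without major mistakes and Hanyu made mistakes, so it is only natural that Chen won.

The fact is that the outcome was decided the moment Hanyu made a mistake on the combination in the short.
From that moment Hanyu was chasing from far behind, and his strong will to attempt an almost impossible comeback set all kinds of thoughts running through his head and forced him to take on a program close to his limits.
Chen's free skate left a positive rather than a negative impression.

The problem is that when you analyze each element carefully, the GOE he was given does not reflect the criteria the rules lay down for +5, +4 and +3.

I think this judging system is not working the way it was originally designed.
But I do not think that is because the judges are incompetent or acting in bad faith. If I did, I would be better off at home on a Wednesday night looking after my family and my dog than here talking about figure skating.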

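I study and analyze 10,000 programs a year.
Parents of skaters aged 10 to 18, from Italy and other countries, send me videos of programs and ask for my opinion, so I am used to analyzing, studying and evaluating programs.

Yet, objectively, it is impossible to score the GOE of 12 elements in real time, within a limited time, taking every positive and negative criterion into account while also evaluating the program components.
Scoring properly with this system is beyond human capability.

A judge may think, "That jump was really beautiful, so I'll give it +5."
But that is not what the rules require.
Judges also have to score the components in a very short time.
And if a jump they first marked positively is called under-rotated or given an edge error in the technical panel's review, they then have to apply the deductions the rules prescribe for each case.

In other words, it is impossible to apply this judging system correctly in the limited time available during a competition.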
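Of course, that does not mean I am saying Nathan Chen's win was undeserved.
He deserved to win.
However, the marks for several of Nathan Chen's jumps were wrong.
I would like the Italian judge to explain to me why Nathan Chen's triple Axel was given +5.
And, for example, any judge who gave Nathan Chen's quad Salchow the same GOE as Yuzuru Hanyu's quad Salchow should go back and watch the two jumps again.

Hanyu simplified the transitions in his free a little, but even so the difficulty of his jump entries is clearly greater than Chen's.

I am not saying the judges are incompetent.
But evaluating with all the GOE criteria in mind in such a limited time is impossible.
So the judges hand out +5, +4 and +3 more or less by feel, and that is not scoring by the rules.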

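A judge who gave Hanyu's triple Axel in the short +4 is saying that, of the six positive criteria, the jump failed to meet two of the three (4, 5, 6) needed for +5.

So what exactly did it fail to meet?

It did not match the music?
You must be joking.
Hanyu is almost obsessive about hitting the music, to the point that he feels uncomfortable jumping if it is off the beat.
So, first, the jump matched the music.

It was not a creative, unexpected, difficult entry?
If the entry to Hanyu's triple Axel in the short is not creative, we should pack up the studio and leave.

The air position was not good?
The axis may have been slightly tilted once or so, but not enough to cost anything in GOE.

So a judge who gave +4 must be able to explain which of those three criteria was missing.

I do not think the judges decided that one of these criteria was missing.
I think that, in that moment, it simply did not occur to them that they had to score with these criteria in mind.
We have seen many cases like this this season.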

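A: That is certainly true.
A judge might give +4 rather than +5 because the axis was slightly tilted.
In fact, though, there are six positive criteria, and a jump is still +5 even if it misses one of them (air position).
But it is very hard to recall all this while scoring in a short time.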

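M: I am not a judge, but I know from experience how difficult it really is.
So measures have to be taken, such as simplifying the rules a little.
This judging system no longer works, and the GOE given to skaters does not reflect the rules.

Nathan Chen skated an extraordinary performance in athletic terms.
But landing every jump does not mean every element should get GOE +5.

His free skate was nothing but positive.
However, when the quality of the elements a skater performs is not properly evaluated, it becomes hard to look at a competition with a fair eye.

Hanyu lost for a different reason.
As I keep saying, everything started with the combination that was missing from the short program.

But a detailed look at the short program shows that Hanyu did not receive the full +5 for the flawless, perfect jumps he did execute, the quad Salchow and the triple Axel.
A judge who gave +3 or +4 must be able to explain what those jumps lacked.

This tendency shows up not only in Yuzuru Hanyu's marks but also in those of Nathan Chen and many other skaters.

What I am saying is that it is impossible to apply this judging system properly during a competition in the way it was originally intended.
It is impossible to evaluate 12 elements accurately against the criteria in the rules in a short time.
The days of +3/-3 were still better, because evaluating was simpler.
Even then, you could sense that marks were being given along the lines of +3 for very good, +2 for good, +1 for so-so and 0 for doubtful.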

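A: But with 7 grades there was less differentiation than with 11.
This judging system is very cleverly constructed, but I agree with you that it is too complex to apply in the time the judges are given, and objectively difficult.
And as I see it there is one more factor.

Judges with very long careers, who have been judging since the old 6.0 era, may still think in relative terms, first comparing skaters with one another and deciding the ranking.
The current judging system is absolute: elements shown on the ice have to be evaluated against set criteria, regardless of rivals.
But judges who still have the old 6.0 mindset may tend to score relatively.

For example, compared with Hanyu's Axel, which turned into a single, Nathan Chen's triple Axel was cleanly landed, so they give it +5 or +4. I understand the feeling, but on calm analysis it was not done from a difficult entry and did not meet the requirements the rules set for +4 or +5.

However, analyzing in such detail in a short time is difficult, and Nathan Chen's performance at the Final really was overwhelming. A few jumps had no flow on the landing, but mistakes were minimal and the program was technically very difficult, so the judges may have scored high across the board. Yet when you calm down afterwards and check this program alone against the criteria in the rules, you find that the marks for several elements were not appropriate.

Still, evaluating each element properly on the spot is difficult.
That is exactly why we like to analyze and discuss individual elements like this.

As we see it, Hanyu showed extraordinary elements, and the elements he brought home at the Final left a positive impression.
The quad Lutz above all.
The quad loop was wonderful too.

GOE matters a great deal, and a difficult entry is one of the six positive GOE criteria.
But a difficult entry makes the jump harder, with the risk that the other criteria are no longer met. A difficult entry must also be reflected in the Transitions mark in PCS.

On the strategic side, though, a careful look at the GOE criteria under the new rules also suggests this.
Under the current rules a jump done at high speed, with distance and height and a good air position, can get +4. Match it to the music as well and it may get +5.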

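M: Analyzing the men's singles event is very difficult.
Why does Hanyu devise a program with these features?
His program is an extraordinary one, of outrageous difficulty.
When a skater does not get the GOE or PCS he hopes for, what he considers, if it is technically possible, is raising the base value.

Analyzing Hanyu and Nathan Chen at the Final shows that, had both skated clean, Hanyu's program had the higher base value.
In total GOE assuming full marks, Chen is higher by a fraction of a point.

The base value of his spin is 0.20 higher than Hanyu's (3.2 for Chen's change-foot camel spin against 3 for Hanyu's flying sit spin), so he gains 0.10 in GOE.
The quad flip has a base value 0.50 higher than the quad loop, so he gains 0.25 there.

With these taken into account, Hanyu's program with full GOE comes out 0.80 points ahead of Chen's program with full GOE.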
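The GOE gaps quoted above (0.10 for the spin, 0.25 for the jump) follow if the maximum GOE is taken as half of an element's base value. The sketch below only illustrates that arithmetic. The 50% share is an assumption inferred from the figures in the discussion, and the names `MAX_GOE_SHARE` and `max_goe` are made up for this example.

```python
# Assumption: a GOE of +5 is worth 50% of the element's base value,
# which is what the figures quoted in the discussion imply.
MAX_GOE_SHARE = 0.5


def max_goe(base_value: float) -> float:
    """Largest GOE bonus available for an element with this base value."""
    return round(base_value * MAX_GOE_SHARE, 2)


# Spin base values quoted in the discussion: 3.2 (Chen) and 3.0 (Hanyu).
spin_gap = round(max_goe(3.2) - max_goe(3.0), 2)

# For the quad flip and quad loop only the 0.50 base-value gap is quoted.
jump_gap = max_goe(0.50)

print(spin_gap, jump_gap)
```

Working from the gap rather than from full base values keeps the example limited to numbers that actually appear in the transcript.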

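Hanyu had a large deficit to make up, which is why he took on a program like this.
His success rate on quads at this event was astronomically high, in competition and in practice.
He missed the 4T-3T, the one he got wrong in competition, in his run-through, but he repeated it straight away and landed it.
He did only a few quad Lutzes, but nearly all of them were successful.
In fact, on the day of the free he landed a perfect quad Lutz in the six-minute warm-up (one of the most beautiful quad Lutzes ever), and in the program itself he landed one at almost the same level.

In other words, he showed that he can raise the bar when it counts.
This time the athletic side fell a little short.
Two and a half minutes into the program he was visibly tired.
And to fit in a fifth quad he had to cut more transitions than usual and add crossovers and two-foot skating.

But his ultimate goal is to complete a free with five quads of four types, so he may have been practicing a similar program in Toronto.
That is why I think landing five quads in competition was a major gain.

Of course, from here he has to polish the quality and train so that every planned element goes in during competition.
Right now, beating a skater as consistent as Nathan Chen takes "perfection".
That is a fact.

But my point is that if both skate clean, it is not possible for Nathan Chen to finish ahead of Yuzuru Hanyu.
If he did, it would mean the rules are not being followed.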

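F: But Massimiliano, viewers are sending in the same question one after another right now.
They all appreciate your analysis, but they worry that the recent heavily inflated GOE is starting to decide results.
And many of them point out the following.
OK, we understand that the judging system is too complex to work and that this produces scoring errors.
But why do the errors always run in one direction?
That is, why are there skaters who are always misjudged in their favor and skaters who are always misjudged against them?
Don't you think there is some other reason behind this, beyond what you two have discussed so far?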

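M: Everyone, what can I say?
I have been pointing this out since the Grand Prix series began.
Not the senior series, the Junior Grand Prix.

We saw a skater named Alysa Liu being given sky-high GOE that bore no relation to the actual quality of the elements she showed.
Objectively, I had never seen anything like it.

She is a skater with the same nationality and the same roots as him.
Everyone knows where the 2022 Olympics will be held.
But I do not want to talk politics.
I am not interested in politics, and Nathan Chen is not stealing anything.
I do not want to say that Nathan Chen is stealing anything.

But if you analyze what Nathan Chen and Yuzuru Hanyu do between elements, you see that the two are completely different.
To win, Yuzuru Hanyu skates a kind of figure skating that carries too much risk and drains too much energy.

Some may say Nathan Chen has more stamina and so is physically superior, but Nathan Chen's program consumes only half the energy of Yuzuru Hanyu's.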

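A: In the free, Hanyu too has had to cut the transitions a little and move toward more quads.
But as for the short, you know what I think.
If both skate clean, it is unthinkable that Nathan Chen would finish ahead of Yuzuru Hanyu.
Nathan Chen reaching 110 points this time did give me pause, though.
Nathan does have the higher base value.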

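M: Hanyu's quad Salchow has a base value of 9.7, while Chen's quad Lutz has 11.50.
The advantage lies not only in the base value but in the GOE as well.
The maximum GOE for Chen's quad Lutz is 5.75, while the maximum Yuzuru Hanyu can earn on the quad Salchow is 4.85.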
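As a worked illustration of these figures, and assuming the maximum GOE equals half the base value (which is what 5.75 and 4.85 imply), the best possible value of each jump is:

```latex
\[
\text{4Lz: } 11.50 + 0.5 \times 11.50 = 11.50 + 5.75 = 17.25
\qquad
\text{4S: } 9.70 + 0.5 \times 9.70 = 9.70 + 4.85 = 14.55
\]
\[
17.25 - 14.55 = 2.70
\]
```

On this reading the quad Lutz is worth up to 2.70 points more than the quad Salchow when both are performed perfectly, 1.80 from base value and 0.90 from GOE.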

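But if both skate a perfect short, Hanyu comes out ahead.
The reason is the triple Axel.
Hanyu's triple Axel is a +5.
There is no room for argument on that point.
And when it comes to program components, there is a huge gap between them.

I wanted to see a different competition.
This is how I wanted it to unfold.
Hanyu skates a clean short and takes the lead.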

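A: Exactly.

M: Given how generously Nathan Chen's short program was marked this time, the gap might have been only about a point, but Hanyu would have been in the lead either way.
Then Hanyu skates his classic free program perfectly, earns close to full marks in GOE and PCS, and wins.
That was the Final I dreamed of.
It would have been a cosmic Final in every sense.

But Hanyu made a mistake in the short.
Mistakes are part of competition, so that cannot be helped.
And the competition ended the way it did.

To sum up the result, though:
The world-record 334 points Chen scored was, in my view, more than what was shown on the ice deserved.
Various points have to be taken off that total.
Either way, the one who won the competition was Nathan Chen.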

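<On the coach being absent for the short>

M: At the Grand Prix Final only one coach may accompany each skater.
No other Cricket Club skater had qualified for this Final, so only one coach could go to Torino.
The Japan Skating Federation named Ghislain Briand rather than Orser.
Briand is the person Yuzuru spends the most time with in Toronto, probably the person closest to him.
Orser has explained that they judged Briand, the jump coach, to be the better choice because Yuzuru was taking on a very difficult free program and might raise the content even further.

Had Brown qualified for the Final, both Orser and Briand could have come to Torino and things could have been managed more smoothly.
For a skater, a coach's presence is indispensable for logistics and other small details.
But a problem arose, Briand was held up in Frankfurt, and his arrival in Torino was delayed.

As a result, Hanyu had to face the short program and the official practice before it alone.
And there was a mistake in the short program.
The problem is that he was also alone at the official practice the day after the short (laughs).
For the last 15 minutes of that practice he started working on nothing but the quad Axel.
Some attempts were fully rotated, but he fell hard.
This was an abnormal consequence (of having no coach).
There was also a risk of injury.
Had Briand or Brian Orser been there, this would not have happened.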

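A: No, it would not (laughs).

M: In any case the strategy for the free program would not have changed.
He was more than 10 points behind, so he needed to play every card he had and take every risk.
But with a coach there, a difficult situation like this could have been managed better.

Competing at such an important event, in a country he does not know well, without a coach cannot have been easy.

A: We were all surprised to see him sitting alone in the kiss and cry.
Controlling your emotions in certain situations is hard enough as it is.
And we all know what Hanyu's temperament is like.
With a coach there, what we witnessed at the Palavela would not have happened (laughs).

It is a pity the competition went the way it did.
But you cannot turn back time.
Briand finally arrived and put things right.
No doubt they had a plan B in mind as well.
Still, at an event like this a skater needs support to manage pressure, practice and energy.
This time it was not there.
It really was a shame, but it cannot be helped.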

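M: Some people are worried, but I think Hanyu is taking it bravely.
You can tell from seeing him play around with the other cast members and joke with the Russian skaters and coaches during exhibition practice.
That is Yuzuru Hanyu.

But pay attention.
A different game starts now.
It is a game Angelo will surely dislike, but for this new game Hanyu will probably change the content of his short as well.
If his quad Lutz becomes consistent, the situation turns around.
If the quad loop becomes a jump he lands easily, as he did so many times in practice, it turns around again.

In other words, we cannot imagine what is going on in his head.
These two straight defeats to Nathan Chen must be painful for him.
He will do everything to get his revenge at the World Championships in Canada, his second home.
In this season's Grand Prix series he finally broke his Canada jinx.
So let us look forward to seeing how things change.
And let us watch whether the free program layout devised at the Final becomes stable, that is, whether he can skate the second half at high quality too.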

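On program layout, I think Nathan Chen's is a stroke of genius.
Nathan Chen is a point-scoring machine.

True, Hanyu and Nathan Chen both do five quads, but Nathan Chen packs the higher-scoring jumps into the first half.
Look at the base value of the first three jumping passes in his free and at the points he actually took home for them.
He performs his high-value jumping passes at the very start, when he is fresh in body and mind.
High-difficulty jumps drain mental strength as well as stamina, but by doing them early he lands all of these very difficult jumps.
In fact, Nathan Chen earns more than 50 points from his first three jumping passes.
That is why I think Nathan's layout is so clever.

What about Yuzuru?
He pays closer attention to base value than anyone.
Yuzuru opens with the quad loop and packs his combinations into the second half.

4T/eu/3F three minutes into the program.
Nathan Chen does the same combination, but 45 seconds in.
Then the unprecedented jump sequence 3A/3A, an element of frightening difficulty.
He could do four triple Axels in a row, but doing them at the end of a program is another matter.

Then 4T/3T.
It is an element he sometimes misses.
Ever since his junior days he has occasionally made mistakes on combinations with a triple toe loop as the second jump.
Of course, it is a jump he can do with extraordinary quality, but two minutes into the program it is hard.
In fact, in Torino it became a 4T-2T.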

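So comparing the two programs shows that Nathan Chen's is laid out to earn points more easily.

Hanyu's third jump is a triple Lutz, and his high-difficulty combinations are concentrated in the final two minutes.
Nathan Chen's jumps are spread evenly across the program, so he can rest between them.
Of course, because Yuzuru takes more risk, he naturally comes out ahead of Chen if everything is done perfectly.

Some may ask why Nathan Chen does not do two triple Axels.
The answer is simple: two triple Axels do not pay.
With two triple Axels he could no longer do two triple toe loops.
Nathan Chen does a 4F/3T combination in the first half and two quad toe loops.

For the final jumping pass he has two options.
3A/2T:
a very risky option for him, since the Axel is not his strong suit.
Base value 9.30 plus the 10% second-half bonus.

3Lz/3T, on the other hand, earns 5.9 + 4.2 plus the 10% second-half bonus, more than 11 points.
And the triple Lutz is an easy jump for Nathan Chen.
That is why I say Nathan Chen's program is a stroke of genius in terms of layout.
In terms of difficulty against points, it earns points very efficiently.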
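The comparison of the two options for the last jumping pass can be checked with a few lines of arithmetic. This is a minimal sketch using only the base values and the 10% second-half bonus quoted above. The helper name `second_half_value` is hypothetical, and GOE is left out.

```python
# 10% bonus on base value for jumping passes in the second half,
# as quoted in the discussion.
SECOND_HALF_BONUS = 1.10


def second_half_value(*base_values: float) -> float:
    """Base value of a jumping pass performed in the second half."""
    return round(sum(base_values) * SECOND_HALF_BONUS, 2)


option_3a_2t = second_half_value(9.30)       # combined base value as quoted
option_3lz_3t = second_half_value(5.9, 4.2)  # 3Lz + 3T as quoted

print(option_3a_2t, option_3lz_3t)
```

Before GOE, the 3Lz/3T option is worth slightly under one point more than 3A/2T, which matches the "more than 11 points" remark.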

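Coming back to program components, it is inconceivable that Hanyu and Nathan Chen should get the same PCS in the short program.
Especially on Skating Skills and Transitions, where there is almost no room for discretion, it is no contest.
Hanyu may have been at 80% of his best, but even so it is beyond argument that he is the stronger on these points.

On the other, more subjective components, some people may prefer Nathan Chen's performance.
Even so, I think Hanyu is ahead on those as well.
Of course, when there are major mistakes Chen may come out ahead on some components.

A: In the free Hanyu cut several transitions.
Nathan Chen's strength is that he can land the quad Lutz and quad flip at a high rate.
Hanyu, meanwhile, loads his programs with difficult transitions, but making a program too difficult raises the risk of missing jumps, so it is a double-edged sword.
This time he had no choice but to cut transitions to fit in five quads, and I cannot say whether that is the right strategy.
High-difficulty jumps and difficult transitions are high risk and high return, so finding the right balance is hard.

M: Hanyu did cut transitions, but his program is still richer than Nathan Chen's.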

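<On the quad Axel>

F: A last question from viewers.
What was your impression of Yuzuru's quad Axel?

A: Ha ha ha.

M: First of all, witnessing an attempt like that was astonishing.

A: I was shaken (laughs).

M: I never imagined I would get to see such a thing,
and it showed me just how difficult an element this jump is.

The jump I saw live, and watched again on video, went far beyond what I had imagined.
The impression I got was like the one I get from young girls practicing triple jumps that they can rotate but cannot yet land.

I think something is still missing in several respects before the jump is complete.
How much more time it will take to complete this jump, I do not know.

A: Seeing the quad Axel attempts shook me, but it was without doubt a spectacle.
A jump that goes on forever.
I had the impression of a jump that really never ends, one that stays in the air forever.
He takes off at astonishing speed.
It also struck me as a dangerous jump (laughs).
I think it is not that far from being complete.
I have never seen a quad Axel this close to completion.

M: But it is not a jump that will be complete today or tomorrow, is it?

A: No. Something is still missing.
But I think it is not that far from being complete.

M: Just as the girls attempting triple jumps will land them eventually.
But the problem is that this is a jump with a high risk of injury if the fall goes badly.
In fact, there was a scary fall in Torino.

For history's sake and for Hanyu's, I hope he lands this jump in competition one day, but what I want to see is a contest between the two in which he skates the layout he tried in Torino perfectly and beats Nathan Chen on quality. That would be enough.
The quad Axel can wait.

But for Hanyu everything is possible.
In a sense he is omnipotent (a god).
And if he does land the quad Axel... there will be tears.
No, outright sobbing.

A: Yes.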

2020-08-06

■

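Data analysis of questions about judging

This is an analysis article on figure skating scoring from the Chunichi Shimbun.

I have my doubts (????) about the opening claim that "the accuracy of the judges improved greatly", but

it is a serious analysis, and it covers things that rarely come to light, such as the ISU's evaluation committee,

so I am introducing it here for the record.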

https://www.chunichi.co.jp/ee/feature/figureskating2020/judging_the_judges.html

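Judging the judges 2020

The national bias that will not go away

The 2019/20 season, the second since the 11-grade marking scale was introduced in figure skating, is over. An analysis of scoring data from the major events shows that while the accuracy of the judges improved greatly, lenient marking of skaters from the judge's own country (national bias) still persists.

2020/4/15, Chunichi Shimbun digital editorial department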

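To check the ability of its judges, the evaluation committee of the International Skating Union (ISU) calculates, for the grade of execution (GOE) of individual elements such as jumps and spins and for the program component scores (PCS) that assess the whole performance, "how far a mark lies from the average of all judges who watched the same performance", and calls this value the deviation score. For example, if nine judges mark 1, 0, -1, 0, 0, 1, 0, -1, 0 (average 0), the deviation score of a judge who gave 1 is 1.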
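The deviation score described here can be sketched in a few lines. This is only an illustration of the definition in the paragraph above (mark minus panel average), using the nine marks from its example. The function name `deviation_scores` is made up for this sketch.

```python
from statistics import mean


def deviation_scores(panel_marks):
    """Each judge's mark minus the average of the whole panel."""
    panel_mean = mean(panel_marks)
    return [mark - panel_mean for mark in panel_marks]


# The nine marks from the example in the article.
panel = [1, 0, -1, 0, 0, 1, 0, -1, 0]
deviations = deviation_scores(panel)

print(mean(panel), deviations)
```

Because the panel average in this example is 0, each judge's deviation score equals the mark given.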

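Under the criteria for the 2019/20 season, a mark counts as a judging error if any one of the following holds: (1) a GOE deviation score of 2.5 or more, (2) an average GOE deviation score per skater above 1.5, (3) a PCS deviation score of 1.5 or more, (4) an average PCS deviation score per skater above 1.5. An error involving a skater from the judge's own country is, as a rule, counted twice. A judge becomes subject to a warning when errors exceed one per eight skaters marked.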
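The four thresholds and the warning rule can be written out as a small sketch. It follows the paragraph above, with two assumptions that the article does not state: that the thresholds apply to the magnitude of the deviation, and that double counting applies whenever the skater is from the judge's country. All function names are hypothetical.

```python
def is_error(kind: str, deviation: float, averaged: bool) -> bool:
    """kind is 'GOE' or 'PCS'; averaged=True means a per-skater average."""
    size = abs(deviation)  # assumption: the sign of the deviation is ignored
    if averaged:
        return size > 1.5                      # criteria (2) and (4)
    return size >= (2.5 if kind == "GOE" else 1.5)  # criteria (1) and (3)


def weighted_errors(flags) -> int:
    """flags: iterable of (is_error, is_home_skater) pairs."""
    return sum((2 if home else 1) for error, home in flags if error)


def warrants_warning(errors: int, skaters_marked: int) -> bool:
    """Warning when errors exceed one per eight skaters marked."""
    return errors > skaters_marked / 8


flags = [(True, True), (True, False), (False, True)]
total = weighted_errors(flags)
print(total, warrants_warning(total, 24), warrants_warning(total + 1, 24))
```

Note the mix of "or more" and "above" in the criteria: a single PCS deviation of exactly 1.5 counts, while a per-skater average of exactly 1.5 does not.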

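Accuracy improved with the 11-grade scale

In the 2018 rule revision the GOE scale moved from 7 grades, -3 to 3, to 11 grades, -5 to 5. Since full marks went from 3 to 5, one point is now worth 3/5 (= 0.6) of what it was. With the finer scale the distribution of deviation scores widened, but the effective spread of the marks (standard deviation) shrank from 0.53 in 17/18 to 0.44 (= 0.73 × 0.6) in 19/20 (see the graph below).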
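Written out, the rescaling in this paragraph is as follows. The symbols are introduced here only for illustration and do not come from the article:

```latex
\[
\sigma_{19/20}^{\text{(7-grade equivalent)}}
  = \sigma_{19/20}^{\text{(11-grade)}} \times \frac{3}{5}
  = 0.73 \times 0.6
  = 0.438 \approx 0.44
\]
\[
\frac{0.53 - 0.44}{0.53} \approx 0.17
\]
```

On this reading the spread of marks, expressed on the old 7-grade scale, fell by roughly 17% between 17/18 and 19/20.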

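The frequency of anomalous deviation scores that count as judging errors was also halved, from 0.77% to 0.32%.

Along with the finer GOE scale, the ISU clarified the criteria for positive and negative grades and worked on training its judges. Yet PCS, whose scale was not changed, shows no notable improvement. It seems fair to say that GOE accuracy improved markedly through the introduction of the 11-grade scale itself.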

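The bias that will not go away

Let us treat the difference between a judge's average deviation score for home skaters and for foreign skaters as national bias (favoring one's own skaters). Calculated for every judge in senior and junior men's and women's singles, national bias is small but almost uniformly skewed to the positive side.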
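The definition of national bias given here is a difference of two averages, which can be sketched as follows. The record layout, the function name `national_bias` and the sample values are all made up for illustration. They are not data from the article.

```python
from statistics import mean


def national_bias(records, judge_country):
    """Mean deviation for home skaters minus mean deviation for the rest.

    records: iterable of (skater_country, deviation_score) pairs
    for a single judge.
    """
    home = [d for country, d in records if country == judge_country]
    other = [d for country, d in records if country != judge_country]
    if not home or not other:
        raise ValueError("need at least one home and one foreign mark")
    return mean(home) - mean(other)


# Invented sample: a judge from country "AAA".
sample = [("AAA", 0.5), ("AAA", 0.25), ("BBB", -0.25), ("CCC", 0.25)]
bias = national_bias(sample, "AAA")
print(bias)
```

A positive value means the judge marked home skaters above the panel average more than foreign skaters, which is the direction the article reports for almost all judges.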

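National bias is unrelated to the accuracy of a judge's marks. For example, Ismail SURAYU DAKSHINI of Malaysia had the widest spread of deviation scores of any judge this season, yet shows almost no home favoritism. Conversely, Andrea SIMANCIKOVA of Slovakia, the most accurate judge, has an average level of national bias.

The largest national bias was recorded by Anna CHATZIATHANASSIOU of Greece, the only judge above 1.0. The smallest was that of Pia ALHONEN of Finland, who was very strict with home skaters. Both judged only a single junior event, so the spread in the skaters' ability may have played a part.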


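Under the 11-grade scale, national bias also has to be multiplied by 0.6 for comparison. Judging by the distribution across all judges, GOE bias can be said to be shrinking.

National bias in program component scores also remains skewed to the positive side.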


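National bias becomes even clearer when calculated by country.

The next figure shows bias between countries for the nine countries with the most skaters. The skater's country is on the horizontal axis and the judge's country on the vertical axis, in the same order, so the values on the diagonal are national bias. Over the last four years it was positive (reddish colors) for every country. The three European countries show blatant home favoritism in some years.

Bias also arises toward rival countries. For example, the United States and Russia, and Canada and France, consistently showed negative bias toward each other. No such tendency was seen among the three Asian countries.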

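Frequency of anomalous judging

What skaters and spectators in the arena perceive as unfair is a judge giving home skaters excessively high marks, with deviation scores of 1.5 or more, or giving foreign skaters low marks, with deviation scores of -1.5 or less.

The table below lists judges whose marking in 2018/19 and 19/20 was anomalous in terms of the p-value of a test of independence, one statistical measure. All of them gave lenient marks to home skaters at a high rate. Here too, however, most cases involve junior skaters, and the spread in the skaters' ability may have played a part. No judge gave excessively low marks to foreign skaters.

Excessively lenient toward the home country. No cases of excessive severity toward other countries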

Judges with many anomalous deviation scores
Season	Name	Country	Home dev+	Home dev-	Home elements	Other dev+	Other dev-	Other elements
18	Ms. Zsuzsanna VIKARNE-HOMOLYA	HUN	6	0	33	31	19	1113
18	Ms. Irina KOMARNICKA	LAT	3	0	16	8	25	569
18	Ms. Nadezhda PARETSKAIA	KAZ	5	0	71	27	32	1804
18	Mr. Samuel AUXIER	USA	6	0	89	9	34	709
18	Mr. Yury BALKOV	UKR	8	0	17	47	9	715
18	Ms. Anna KANTOR	ISR	4	0	34	3	18	375
18	Ms. Stanislava SMIDOVA	CZE	6	2	85	17	15	1070
18	Mr. Philippe MERIGUET	FRA	7	0	56	14	5	641
18	Ms. Halla Bjorg SIGURTHORSDOTTIR	ISL	3	0	7	6	25	201
18	Ms. Florence VUYLSTEKER	FRA	3	0	18	3	3	195
19	Ms. Zsuzsanna VIKARNE-HOMOLYA	HUN	4	0	34	12	15	996
19	Ms. Miriam PALANGE	ITA	2	0	34	3	19	757
19	Ms. Salome CHIGOGIDZE	GEO	6	0	71	7	39	1041
19	Mr. Roger GLENN	USA	6	4	73	7	24	580
19	Ms. Anna CHATZIATHANASSIOU	GRE	2	0	8	3	3	319
19	Ms. Jia YAO	CHN	2	0	16	1	13	197

For home and foreign skaters in turn, the figures are the number of deviation scores of +1.5 or more (dev+), the number of -1.5 or less (dev-), and the number of elements marked. Season 18 is 2018/19 and season 19 is 2019/20.
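The article does not say which test of independence it used. As one possible reading, the sketch below applies a one-sided Fisher exact test to the counts of high deviation scores (+1.5 or more) for home and foreign skaters, using only the standard library. The function name is hypothetical, the choice of test is an assumption, and the result is not claimed to match the article's own calculation.

```python
from math import comb


def fisher_one_sided(home_hits, home_total, other_hits, other_total):
    """P(at least home_hits high marks fall on home elements) when the
    high marks are spread at random over all elements (hypergeometric)."""
    hits = home_hits + other_hits
    total = home_total + other_total
    denom = comb(total, home_total)
    upper = min(home_total, hits)
    tail = sum(
        comb(hits, k) * comb(total - hits, home_total - k)
        for k in range(home_hits, upper + 1)
    )
    return tail / denom


# First row of the table (HUN, season 18):
# 6 high marks in 33 home elements, 31 in 1113 foreign elements.
p_hun = fisher_one_sided(6, 33, 31, 1113)
print(p_hun)
```

A small p-value here says only that so many high marks landing on home skaters would be unlikely by chance. As the article notes, differences in skaters' ability could produce the same pattern.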

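This season's focus: conflicts of interest

In this season's rule revision the ISU tightened the ethics rules for referees, who preside over the judging panel, and for technical officials (technical controllers and others), who determine the levels of elements.

At major ISU events, serving as referee or technical official was prohibited for anyone with a "special relationship" to a competing skater, such as a relative, godparent or friend; for the coach of a skater who is a rival of a competing skater; and for the coach of a skater taking part in major events. Officers of national skating federations were also barred from serving as referee or technical official. At senior events, appointments from countries expected to place high are to be avoided as far as possible.

At the Olympics the rules are stricter still: referees and technical officials from countries that placed in the top five at the preceding World Championships are excluded. The same applies to anyone who has served as referee or technical official at another country's national championships. Judges, too, are excluded if they have served as judge, referee or the like at another country's national championships, or if they are the sitting president of a national federation.

The levels called by the technical officials are the premise on which judges decide the GOE, so they greatly affect the result. But unlike the judges, nine of whom mark at the same time, it is hard to find bias in the data.

The table below summarizes, for the frequency of deduction calls such as jump under-rotations and edge calls, the relationship between the country of the technical official (technical controller) and the country of the skater, in search of any skew.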

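The 15 cases with the smallest p-values in the test of independence

Controllers and jumps with deduction calls
Season	Discipline	Country	Home deductions	Home elements	Other deductions	Other elements	p-value
2016/17	Singles	FRA	2	63	207	1534	0.047
2019/20	Singles	USA	6	73	139	752	0.080
2017/18	Singles	RUS	1	77	13	156	0.084
2018/19	Singles	KOR	5	19	16	186	0.091
2019/20	Pairs	ITA	4	36	9	266	0.116
2017/18	Singles	POL	0	18	110	577	0.129
2019/20	Singles	ISR	0	18	94	514	0.140
2016/17	Singles	USA	8	60	9	160	0.144
2018/19	Singles	HUN	0	18	173	1045	0.167
2018/19	Dance	USA	0	162	22	1353	0.203
2019/20	Singles	RUS	5	55	132	740	0.211
2019/20	Singles	HUN	0	18	153	1076	0.216
2017/18	Dance	RUS	2	23	4	201	0.260
2018/19	Singles	SWE	0	18	58	504	0.299
2017/18	Singles	SUI	0	19	53	537	0.340

In the original, light red marks cases where home skaters received fewer deduction calls and light blue marks cases where they received more.

Home skaters give the impression of receiving slightly fewer deductions, but considering that most of these countries produce top skaters, it seems fair to say there was almost no statistical skew.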
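For readers who want to explore the deduction counts themselves, here is a minimal sketch of a Pearson chi-square test of independence on a 2x2 table, using only the standard library. The choice of test is an assumption, since the article does not describe its method, so the p-values it produces need not agree with the published column. The function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(home_hits, home_total, other_hits, other_total):
    """Pearson chi-square statistic and p-value (1 degree of freedom,
    no continuity correction) for a 2x2 table of calls vs. no calls."""
    a, b = home_hits, home_total - home_hits
    c, d = other_hits, other_total - other_hits
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# First row of the table (2016/17, FRA): 2 of 63 home, 207 of 1534 other.
chi2_fra, p_fra = chi_square_2x2(2, 63, 207, 1534)
print(chi2_fra, p_fra)
```

With 15 country-season pairs picked as the smallest of many, a few small p-values are expected by chance alone, which is consistent with the article's cautious conclusion.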

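What is fair judging?

When two people share a cake, there is more than one fair way to do it. Using scales to cut it into equal weights looks scientific, but the decoration ends up in pieces. Having one person cut and the other choose first is a clever piece of wisdom, but it presupposes that the two are independent of each other. Finer scoring criteria correspond to the former, stricter ethics rules to the latter.

Rules get revised as reality pushes them around. When a rule multiplying second-half scores by 1.1 was introduced so that jumps, which tended to cluster in the first half, would be spread evenly over the program, skaters appeared who, contrary to the original intent, put all their jumps in the second half. Assigning judges by lot was thought to guarantee sufficient fairness, yet at the Olympics, of all places, the president of a leading country's federation was drawn as a judge. What values the sport prizes, and what fairness is, are questions that are always being asked.

Tension between athletes and rules exists in every sport, but in figure skating the judging system adds a further layer of complexity. Fine-tuning of the judging system will no doubt continue.

[About the data] The data used in the analysis were compiled from the scoring documents for men's and women's singles, pairs and ice dance at events organized by the International Skating Union over four years from 2016 (senior, junior and the Olympics included). In compiling deviation scores, cases where every judge gave the lowest mark, such as jump falls, were excluded. There are 720,634 GOE marks and 444,520 PCS marks.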

2020-04-22

ミーシンはルール変更後の審査で恣意性を警告

ジャッジへの疑問について

https://www.goldenskate.com/2018/06/mishin-warns-of-arbitrariness-in-judging-after-rule-changes/

〔自動翻訳〕原文は後ろにあります。

ミーシンはルール変更後の審査で恣意性を警告

公開日：2018年6月1日

ロシアのコーチであるアレクセイミーシンは一般的に改革に反対しておらず、改革に取り組んでいるISUのオフィスホルダーの努力を尊重していますが、慎重なアプローチを推奨しています。

ロシアの伝説的なコーチ、アレクセイミーシン氏は、次の国際スケートユニオンの会議で提案されたルールの変更が、裁判における恣意性につながり、スポーツの進歩を妨げる可能性があると警告しています。

「私の記憶では、コーチとしての時間に2つの大きな変化がありました。強制的な数字の廃止と新しい（ISU判定）システムの導入です」と、エフゲニープルシェンコを4つのオリンピックメダルに指導したミシン（77）は述べました。 2006年は個人の金、2014年はチームの金です。「これらはフィギュアスケートのスポーツ全体を変えました。ある意味でそれは良くなりましたが、ある意味では悪化しました。」

「「才能のある人と才能のない人の見方」が変わった」と彼は続けた。「誰が良いコーチで、悪いコーチが変わった。彼らは数字ではなくジャンプを判断しなければならなかったので、誰が良い裁判官と悪い裁判官であるかが変わりました。新しいシステムは、フィギュアスケートを改善するために作成されたのではなく、審査を容易にするために作成されました。」フィギュアスケートの生体力学に関する本を何冊か執筆したコーチは、スピンやフットワークなどの要素の創造性が失われたと感じています。

現在ミシンは、「革命的性格」の3番目の大きな変化が近づいていることを認識しており、特にマイナス5からプラス5の範囲の実行グレード（GOE）の導入については懐疑的です。

「これらの変更は、判断の客観性に大きな打撃を与えると思います」と彼は警告した。「プラス、3またはプラス4またはプラス5 –違いを生むことは不可能です。基準はなく、技術委員会のメンバーだけが紙にそれらを書き留めますが、基本的にジャンプはすべての人によって同じ方法で行われるため、それほど区別することはできません。」1994年のオリンピックチャンピオンのアレクセイウルマノフや2015年の世界チャンピオンのエリザヴェタツクタミシェバなど、他の多くのスケーターの称号やメダルも指導したミーシン氏は、次のように付け加えた。「友だちにプラス5、敵にプラス1またはマイナス1を与えるそして、このプラス5が決定的です。」

ミーシンにとって、GOEのマイナス3からプラス3の既存の範囲で十分です。現在、一部のトップアスリートは実際にはそれほど良くなかった要素でポジティブGOEを取得するため、常に適切な方法で使用されるとは限りません。「いくつかの女性は悪いダブルアクセルを持っています、それにもかかわらず彼らはプラス2を得ます」と彼は指摘しました。「したがって、このプラス5は大きな打撃となるでしょう。リーダーはプラス5を取得し、非リーダーはゼロまたはマイナス1を取得します。地獄への道は善意で舗装されています。」

1956年にスケートを始めたコーチは、ジャンプの価値の変化も批判し、ローテーションの4分の1のジャンプはローテートと見なされると批判しました。これまでは、4分の1以上のローテーションでした。ミシン氏は、カメラの角度やジャンプがどのように記録されたかによって違いが生じる可能性があると指摘しています。たとえば、実際には回転が不足しているジャンプは回転しているように見えたり、その逆の場合があります。

サンクトペテルブルクのレスガフト体育スポーツ大学の教授でもあるコーチは、ミスに対するペナルティがそれだけ大きい場合、スケーターは新たなジャンプを学ぶのをやめられると感じています。「競争でそれを行うにはモチベーションが必要です」と、数十年にわたってフィギュアスケートのテクニックを研究し、世界中で使用されているジャンプを教えるためのエクササイズと方法を開発してきたミーシンは言いました。「選手が競技でジャンプをするつもりがないことを知っているとき、彼はゆっくりそれを学びます。彼が競争でジャンプを実行することを知っているなら、彼はより早くそれを習得するでしょう。」

ミーシン氏は、トリプルルッツとクワッドルッツには大きな違いがあると指摘しています。「1,000人がトリプルルッツを行っていますが、4ルッツを行うのはわずか12人です」と彼は観察した。「それは100倍困難です。回転する必要がありますが、現在のシステムは回転ジャンプと回転不足ジャンプを完全に処理できます。」

彼はまた、各タイプの四重ジャンプはプログラム内で一度だけ実行でき、現在のように組み合わせて繰り返すことはできないという提案にも同意しません。

提案されたルール変更の一部は、主に、まだマスターしていないクワッドを追いかける男性への反応であり、「スプラッシュフェスト」につながる可能性があります。たとえば、3月の2018 ISUワールドフィギュアスケート選手権では、一部のスケーターがフリースケートで4〜5回落ちました。しかし、Mishinは、クワッドの数を制限したり、スケーターが挑戦しないようにしたりする必要はないと感じています。

「既存のルールはそれをうまく処理することができました」と彼は意見を述べた。「誰がWorldsの適切な場所にいなかったのですか？次のシーズン、これらの（間違い）は、私たちが進行を止めないので、再び起こることはありません。量は質に変わります。次のシーズンにはこの岩の落下はありません。これは人生の発展ですから。それは単なるフィギュアスケートだけではありません。すべてが良くなります。車、コンピューター、アスリートも良くなります。」

一言で言えば、Mishinは、スケーターがあまりにも多くのクワッドを試行するのを防ぐためにルールを変更する必要はないと感じています。彼はスケーターは進歩し、次のシーズンには彼らのジャンプとより一貫性があると信じています。ミスが減り、転倒が減ります。

有名なコーチは一般的に改革に反対するものではなく、改革に取り組んでいるISUのオフィスホルダーの努力を尊重しますが、慎重なアプローチを推奨します。「代数との調和を測定することは困難な作業です」とミシンは観察した。「判断の完璧さを追求するのは当然ですが、考え抜かれた非常に重要な段階的なアプローチが必要です。ISUは保守的であると批判されることがよくありますが、この保守主義には良い面があると思います。フィギュアスケートコミュニティは、アスリート、コーチ、裁判官、メディア、観客など、何千人もの人々で構成されています。このような重いボートは、激しく揺れ動くことはできません。水がボードを入れ替えることがあります。」

Mishin warns of arbitrariness in judging after rule changes

By Tatjana Flade
Photo © Tatjana Flade
Published: June 1, 2018

Russian coach Alexei Mishin is not against reforms in general and respects the efforts of the ISU office holders that are working on them, but he recommends a careful approach.

Legendary Russian coach Alexei Mishin warns that rule changes proposed at the upcoming International Skating Union’s congress could lead to arbitrariness in judging and hinder the progress of the sport.

“In my memory, there were two major changes in my time as a coach – the abolition of the compulsory figures and the introduction of the new (ISU judging) system,” said Mishin, 77, who coached Evgeni Plushenko to four Olympic medals including an individual gold in 2006 and a team gold in 2014. “These changed the whole sport of figure skating. In some way it became better, but in some ways, it became worse.”

“The ‘look at who is talented and who is not talented’ changed,” he continued. “Who is a good coach and a bad coach changed. Who is a good judge and a bad judge changed, because they had to judge jumps and not figures. The new system wasn’t created to make figure skating better, but to make judging easier.” The coach, who authored several books on the biomechanics of figure skating, feels that creativity in elements, like spins and footwork, was lost.

Now Mishin sees the third major change “of revolutionary character” coming up and is skeptical, notably about the introduction of a grade of execution (GOE) ranging from minus five to plus five.

“These changes deliver a heavy blow to the objectivity of judging, I think,” he warned. “Plus, three or plus four or plus five – it is not possible to make a difference. There are no criteria, only the members of the technical committee write them down on paper, but you cannot differentiate so much, because basically the jumps are done the same way by everyone.” Mishin, who also coached many other skaters to titles and medals including 1994 Olympic Champion Alexei Urmanov and 2015 World Champion Elizaveta Tuktamysheva, added: “Your friends give a plus five and your enemies give a plus one or a minus one. And this plus five is decisive.”

To Mishin, the existing range of minus three to plus three in the GOE is enough – and currently, not always used in the proper way as some top athletes get positive GOEs on elements that actually were not as good. “Some ladies have a bad double Axel, and they nevertheless get a plus two,” he pointed out. “Therefore, this plus five will be a heavy blow. The leaders will get a plus five and the non-leaders will get zero or minus one. The way to hell is paved with good intentions.”

The coach, who began skating in 1956, also criticized the change of value for jumps and that a jump that is one quarter short of rotation will be considered underrotated. Until now, it was more than a quarter of rotation. Mishin points out that the angle of the camera or how the jump was recorded can make a difference. For example, a jump that is in fact underrotated might seem rotated and vice versa.

The coach, who is also a professor at the Lesgaft University of Physiculture, Sports and Health in St. Petersburg, feels that skaters are discouraged to learn new jumps if the penalty for mistakes is that big. “You need the motivation to do it in competition,” said Mishin, who has been studying figure skating technique for decades and developed exercises and methods for teaching jumps that are used worldwide. “When the athlete knows that he is not going to do the jump in competition, he will learn it slowly. If he knows that he will execute the jump in competition, he’ll master it earlier.”

Mishin points out that there is a big difference between a triple Lutz and a quad Lutz. “A thousand people are doing a triple Lutz, but only a dozen do a quad Lutz,” he observed. “It is a hundred times harder. You have to rotate, but the current system is able to deal with rotated and underrotated jumps perfectly.”

He also disagrees with the proposal that each type of quadruple jump can be done only once in the program and not repeated in combination as it is now.

Some of the proposed rule changes were in large part, a reaction to the men going after quads they haven’t mastered yet and which can lead to “splatfests.” For example, at the 2018 ISU World Figure Skating Championships in March, some skaters fell four or five times in the free skate. However, Mishin feels that it is not necessary to limit the number of quads or discourage skaters from attempting them.

“The existing rules were able to deal with that very well,” he opined. “Who was not in the right place at Worlds? Next season these (mistakes) won’t happen again, because we won’t stop the progress. The quantity will turn into quality. There won’t be this rock fall in the next season. It would be better, because this is the development of life. That’s not just figure skating – everything gets better – the cars, the computers and the athletes get better, too.”

In a nutshell, Mishin feels there is no need to change the rules to prevent skaters from attempting too many quads. He believes that the skaters will make progress and next season they will be more consistent with their jumps. There will be less mistakes and less falls.

The renowned coach is not against reforms in general and respects the efforts of the ISU office holders that are working on them, but he recommends a careful approach. “Trying to measure harmony with algebra is a difficult task,” Mishin observed. “To pursue the perfection of judging is natural, but it requires a thought-out and very important step-by-step approach. I would like to point out one thought – the ISU is often criticized for being conservative, however, I think that this conservatism has a positive side. The figure skating community consists of thousands of people – the athletes as well as coaches, the judges, media, and the spectators. Such a heavy boat you cannot rock too hard – the water can swap over the boards.”

2020-04-07

オリンピックレベルのジャッジが見る採点の実際（2010.8月）

ジャッジへの疑問について採点への疑問

2010.8月投稿の記事

https://figureskate.wordpress.com/2010/03/08/patrick-ibens-interview/

【自動翻訳】原文は下にあります。

フィギュアスケートの世界におけるトニー・ウィーラーの考え

　　　　パトリック・イベンスインタビュー

ありがとうございました！！！

「裁判官の10％は完全に正直だと思います」
オリンピックレベルのフィギュアスケート裁判官、パトリックイベンスとのチャット

Tony Wheeler：こんにちは、パトリック、私の質問に答えてくれてありがとう。あなた自身のフィギュアスケートの背景と、スポーツに携わってどのくらい経ちますか？

Patrick Ibens：フィギュアスケートの私の40年目です。私は5歳のときにスケートを始めましたが、背中の怪我のために18歳までにスケートをしました。エリッククロール、トムデュジャーディン、カールデュジャーディン、ヘンドリックサッセン、ダニーディレン、パトリックヴァンリース、そして私自身、ベルギーには他に数人の男性スケーターしかいませんでした。エリッククロールはヨーロッパ人で24位のようなものを管理しました。これは、私たちの誰もが当時達成した最大の成功でした！

TW：あなたが長年にわたって判断してきた国際的なコンテストのいくつかは何ですか？

PI：ヨーロッパ、4大陸、世界選手権のほか、ジュニアグランプリやシニアグランプリのイベントもいくつか審査しました。また、2006年のトリノと2010年のバンクーバー冬季オリンピックの両方で男子イベントを審査しました。残念なことに、バンクーバーでのショートプログラムの審査は、私にのみ割り当てられました。[審査パネルは、コンテストの部分間で切り替えられます。ショートプログラムの9人の審査員のうち5人がランダムに選ばれ、ロングプログラムも審査されました。イベンスは選ばれませんでした。]

TW：4つの分野すべてを判断しますか？

PI：シングルとペアのみを判断します。私にはアイスダンスはありません。私は本当の裁判官です！

TW：それはどういう意味ですか？

PI：それはフィギュアスケートのジョークです。アイスダンスでは、誰もが判断する前にすべての結果を知っていますが、午前5時から夜遅くまで、すべての練習セッションに参加しています。

TW：では、アイスダンスの結果は誰が決めるのですか？

PI：前回の大会と出身国は大きな役割を果たしています。そして、時々、スケートの品質は重要だと思います。しかし、それは私の部門ではありません。

TW：フランスの裁判官が連盟からの圧力を受けて特定の方法で投票したことを認めた2002年のソルトレイクシティスキャンダル以降、ペアの競争が正直になったと思いますか。

PI：以前とまったく同じだと思う！どのようなシステムを使用していても、人々は常にごまかす方法を見つけます。そのため、バンクーバーでのメンズショートプログラムにこのような素晴らしいパネルを用意できてよかったです。論争はありません！

TW：ジャッジの何パーセントが完全に正直だと思いますか？

PI：正直なところ？私は10％と言います。

TW：本当に、そんなに低い？何故ですか？

PI：はい、しかしさまざまな理由で。裁判官は今でも匿名で得点しているにもかかわらず、まだ連盟を恐れており、自分のスケーター（全国バイアス）を自分のスケーターに近い順位で擁護し、お気に入りのスケーターを押し、裁判官は廊下の外にいるのを恐れて、スケーターをその国でレフリーとして行動するための招待状を受け取る国、または単に単に彼らが何をしているのかわからない！

TW：他の裁判官や連盟から個人的に特定の方法で判断したり、スケーターを上下に動かしたりしたことがありますか？

PI：一度だけですが、高レベルの彗星ではありませんでした。実際、それはある選手権がヨーロッパ人に送られるために勝つ必要がある全国選手権で起こりました。それでも私は屈しませんでした。その結果、私はそこで再び裁判官に招待されることはありませんでした！

TW：退任する裁判官として、自由に意見を述べることができますか？

PI：審査中は自由に話しました。イベントの発生中に判断している特定のイベントに関して何かを話すことはルールに反しますが、イベントレビューミーティングが完了すると、聴きたい人すべてにあなたが見たものすべてにコメントすることができます。私たちは自由な世界に住んでいます！

TW：あなたが判断してきた何年もの間、あなたの好きなスケーターは誰ですか？

PI：ああ、これは難しい！ミシェル・クワン、マイケル・ワイス、ジェフリー・バトル、アレクセイ・ヤグディン、ジェイミー・セール/デビッド・ペルティエ。

そして近年では、アリオナサフチェンコ/ロビンソルコウィ、パトリックチャン、フローレントアモディオ、デニステン、ハビエルフェルナンデス、ヤニックポンセロ、ジョニーロシェット。

TW：特定のスケーターを競争で審査している場合、それらを判断するのは難しいと思いますか、それとも彼らのスケートに対してもっと批判的ですか？

PI：正直なところ、私はそれらを判断するのに問題は一度もありませんでした。唯一の理由は、私が彼らを個人的に知っていて、彼らが私を知っていることです。彼らは彼らがどんな質問でも私に近づくことができて、私が彼らに正しい答えを与えて、私ができる限り彼らを助けることを知っています。しかし、私がスケートの裁判官になる瞬間、すべての個人的な関係は消え、彼らは私が判断しなければならないスケーターになり、彼らは彼らが値するマークを受け取ります。良いまたは悪い！

それがそれが機能する唯一の方法であり、彼らは裁判官と人としてあなたをさらに尊敬します。

TW：あなたが今までに審査委員会で行った中で最も優れたパフォーマンスは何ですか？

PI：1999スケートアメリカでのセールとペルティエによる「ラブストーリー」。審査中に涙を流した！

TW：では、審査自体についていくつか質問させてください。2003年の秋に選択された国際大会で始まった新しいシステムについて、裁判官はどのようなトレーニングを受けましたか？

PI：ISUセミナーを実施しましたが、最新のルールと古いルールの変更点、および審査中に何を探すべきかについて説明します。これには、明白で微妙なエラー、ダウングレード、移行、シングルスの統制ショートプログラムでのソロジャンプへのフットワークなど、多くのことが含まれます。また、ショートプログラムとロングプログラムの違い、およびコンポーネントを判断する際の注意事項についても説明します。ただし、規則の変更について読むのは、各国の連盟と裁判官自身の責任です。それらは国際スケート連盟のウェブサイトにコミュニケーションの形で公開されており、誰でも見ることができます。すべての競技会で、審査員は最初の会議を開き、一般的なルールについて検討します。コンテストのすべてのセグメントの1時間前

TW：5つのプログラムコンポーネントの独自の定義を、シングルススケートとペアスケートにまとめることができますか？実際の定義を暗記している場合は、自由に使用してください。ただし、ごまかすのではなく、自分の言葉で説明してください。[注：これらの回答はすぐに提供され、元の引用から編集されていません。Ibensに答えを考える時間を与えるよりも、すぐに定義を取得するほうが面白いと思いました。]

PI：スケートスキル：
1.ステップとターンのディープエッジで流れと楽な滑走
2.さまざまな速度と加速
3.多方向スケート

移行：
移行の動きがあり、さまざまな動きがあること。常に同じ動きではありません。良い例は、Stephane Lambielの長いプログラムです。彼はトランジションと同じ上半身の動きを常にしています。

パフォーマンス/実行：
1.スケーターは、「私は今、私はこれから…」の感覚をあなたに与えます。
2.パーソナリティ（5分後にスケーターのパフォーマンスを思い出せない場合…彼/彼女はパーソナリティを持っていません）。
3.投影
a）は、彼/彼女が裁判官のスタンド/観客席に飛び込んだような感覚を与えます。
b）彼/彼女と一緒に彼/彼女自身の小さな世界に連れて行きます。
4.各ムーブメントの品質。それぞれの動きは、動きを途中で切るのではなく、最後まで行う必要があります。

振付：
1. 振付が美しく、プログラム全体がうまくレイアウトされた素敵なプログラム。
2.音楽を上手に利用する。

解釈：
1.ほとんどの音符がスケーターによって使用されている場合。
2.音楽が上がると動きも上に上がり、音楽が下がると…動きは下向きになります。
3.スケーターがキャラクターになったとき。
4.スケーターがショープログラムではなく音楽を解釈していること。

TW：あなた自身の定義に基づいて、そしてあなたがメンズコンテストの審査パネルに参加していたので、5つのそれぞれで最も強いと思うのはどの男性ですか？

PI：スケートスキル：高橋、トランジションと振付：チャン、パフォーマンス：リサチェク、解釈：アボット。

TW：Lysacek vs. Plushenkoに関して、各スケーターの長所と短所を個人的にどのように見ていますか？

PI：プルシェンコは自分に非常に自信があり、自分がやっていることを本当に信じています。彼の弱点は、彼がまだ6.0システムから来ており、すべての要素を非常に重要であると計算しないという事実ですが、それが現在行われている方法です。

Lysacekはハードワーカーの戦闘機です。私が思ったときのことを覚えています…「ああ、神様、この少年は決してそれを作ることはありません。彼は本当にそれを持っていません！」したがって、彼の絶対的な最善を尽くすという彼の献身は、本当に彼に有利に働いた。彼の弱点は？考えさせて。多分彼はトリプルアクセル離陸で彼の浮気。ときどき彼の離陸時のスキッド（またはプレローテーション）は半回転以上回転して、いわばトリプルサルコウになります。しかし、それは私が見つけることができる唯一のマイナーな問題であり、常に発生するわけではありません。

TW：エフゲニー自身も含めて、4つのジャンプ（トーループ）をコンペティションの両方の部分で成功させることができたために、プルシェンコが適切に報われなかったという不満の声がたくさんあります。あなたは彼があなたの最後の返答で「全体像」を見ていないと感じていると述べたので、これについてどう思いますか、そしてこのシステムについてのあなたの一般的な考えは何ですか？

PI：彼は4回のつま先のループを試みて着地させることで報酬を得ましたが、その後、他のジャンプでの悪い着地のポイントも失いました。彼の意見では、4ジャンプだけが1位と2位の違いを生むはずだったとしたら、スウェーデンも不満を感じ始めたのは、スケーターのエイドリアンシュルタイスが夕方に最高の4重トーループを持っていて、トップ5にランクインしましょう！[シュルタイスはフリースケートで13位だった]

一方、ほとんどの言語では、フィギュアスケートに「ART」という単語が含まれています。これは、氷の周りをジャンプする以上のものがある必要があることを意味します。

カタリーナウィットがカルメンで勝利したときのことを覚えていますか？彼女はつま先ループとサルコウという2つの異なるトリプルしか表示しませんでしたが、他のジャンプはより困難でした。しかしカタリナは氷上のアートでした！

TW：6.0システムよりもこのシステムの方が好きですか？説明してください。

PI：WelI、ここには複雑な感情があります。良い面も悪い面もあるからです。

良い面のいくつかは、スケーターが最終的にフットワークをしなければならなかったことです。クリーンエッジなどに戻ります。また、前述したように、難しいジャンプだけでなく、すべての要素に対して報酬が与えられます。欠点は、すべての要素、特にスピンとステップが似ていることです。また、多くの創造性のために残された時間もあまりありません。

このシステムについて私が最も嫌うのは、「あまり良くない」審査員を救うために作られていることです。一方、意図された方法でマークしている本当に良い審査員（すべてのコンポーネントが個別に）が出てしまう可能性のリスクがあります。平均点の回廊の評価、およびいくつかの評価を受けるリスク。基本的に何も知らない裁判官は、すべての間違った点を与えるか、完全に推測することができ、それらの点は平均に落ちます！しかし、コンポーネント間のマージンを広くしたい人は、そうするために選ばれるかもしれません。たとえば、世界選手権で最初の3つのグループを採点するときは、5.50から7.00の間を与え、安全な廊下にいます。最後のグループが氷上に来たら、7.00から8.50の間で与えれば、あなたは再び安全です！

また、ある意味では、スポーツ自体からスポーツという言葉を取り除きました。ブライアンズの戦いを覚えていますか？さて、ボイタノは彼のプログラムの終わりに彼に金を着陸させた2番目のトリプルアクセルを追加しました。今日のスケーターが何か追加する場合、追加のポイントは得られません。完了できるジャンプとスピンの最大量があるため、値のない要素になります。

TW：エフゲニー・プルシェンコに戻る。プルシェンコが明らかに彼と仲間のライバル（ブライアン・ジュベール）はジャンプに集中しすぎていたため「トランジション」を持っていなかったとコメントしたとき、大きな論争が始まった。イベント後、この特定の記者会見のためにヨーロッパ人に参加しましたか？

PI：いいえ、ヨーロッパ人ではありませんでしたが、聞いたことがあります。

TW：彼のコメントをどう思いますか？

PI：バカだったと思うけど、プルシェンコは金髪だよね？ただの冗談！私は彼が説明しようとしていたことを見ることができますが、そうすることであまりにも悪いので、彼はブライアン・ジュベールも降ろしました。あなたはそれをしません！私はジュベールの最大のファンではありませんが、アスリートができる最も低いことは、いわば、仲間の競争相手に否定的な光を当てることです。

TW：ジョセフ・インマンからフランスのメディアから発行された元のメールを受信しましたか。もしそうであれば、バンクーバーでの男子コンテストの審査方法に影響があったと思いますか？それは個人的にあなたの判断に影響を与えましたか？

PI：はい、私は電子メールを受け取りましたが、私を知っている人なら誰でも、私がそのようなことに影響を受けないことを知っています。このインマン電子メールは、送信された種類の2番目でした。最初に言ったように、私は本当の裁判官であり、私のメールアドレスやコメントを気にする必要はありません！私はそれを自分のために完全に行うことができます。私はそれが誰であるかに関係なく私が見るものを判断します！

TW：あなたはショートプログラムのパネルにいました。プルシェンコとリサチェクはその特定の部分でどのように滑走したと思いましたか？また、上位3つがコンペティションのその部分でポイント未満で区切られていたので、高橋大輔をどのように獲得しましたか？あなたが判断した方法とはまったく異なる、あなたが思った誰かが上下に押さえつけられましたか？

PI：ショートプログラムの終了直後、上位3名のスケーターが1ポイントも離れていないことがわかりました。パネルが素晴らしい仕事をしたことを私は知っていました！そのセグメントの間、これら3つは同じように優れていて、すべて異なる理由でした。しかし、私の意見では、高橋はショートプログラムに勝ったと思いましたが、審査員としてレベルとダウングレードに関してテクニカルパネルが何を決定したかわからないので、今は何でも可能です。以前のマークもわからないので、他のスケーターに誤って高いマークを付けた可能性がありますが、優れた審査員としてそれを回避する方法があります。

TW：最後の部分について詳しく説明してください。

PI：たとえば、スケーターAに7.25を与えた場合、スケーターBは10人のスケーターになり、私の意見では彼はより優れていますが、7.00を与えたので、そのコンポーネントの間違ったスケーターに誤って「1位」を与えました。

しかし、良い判断者として、最初のスケーターのすべてのコンポーネントを合計します。これにより、平均が7.00になります。次に、そのことを思い出し、次のスケーターが来たときにも同じことを行います。彼がすべてのコンポーネントでより優れている場合、あなたの平均は明らかに7.00より高いはずです！

TW：しかし、このシステムは、10点満点でスケーターをマークするように作られています。スケーターをお互いにぶつけないでください。一度に起こっていることが多すぎるからだと思いますか？

PI：はい、いいえ。私が言ったように、私たちのスポーツは「スポーツ」ではありません。スポーツとは、1つが2つよりも3つよりも優れていることです。あなたは比較することによってのみその結果を思いつくことができます。スピードスケートと呼べば、時計はありますが、スキルと芸術性の点はありません。高速でなければなりません。競技では10のスケールでのみ採点することは不可能です。

TW：あなたや他の多くの裁判官は、スケーターのランスルーを含め、プラクティスに目を通し、コンポーネントマークに対する彼らの能力を理解していますか？

PI：個人的には、自分がしなければならない仕事に集中できるように、気分を正すために1つの練習に行きます。私はスケーターがしていることの正確な詳細を見ることはありません。とにかく、彼らは競争のストレス下にあるとき、異なることをします！しかし、すべての練習を見守る裁判官もいます。理由を聞かないでください。細かいところまで見たことが無いので、練習中にも感じられません。

リアルタイムでは、非常に多くの異なることに集中しなければならないことが時々困難です。要素、ミス、ルール、そして5つのコンポーネントよりも。これは時々難しいことですが、優れた裁判官は自分自身を訓練して、彼の心が小さなすべてのものに気づき、プログラム中に重要なことに集中できるようにすることができます。

TW：コンポーネントは公平に判断されると思いますか、それとも型の「プレースホルダー」として使用されていると思いますか？あなたは前に一日の終わりにそれが基本的にスケーターを比較する方法で使われると述べました。

PI：コンポーネントが公正に判断されたのではなく、それらがプレースホルダーであるためではなく、一部のジャッジが芸術的背景を欠いているだけでなく、平均的なマークのその愚かな回廊に留まることさえあるからです。

TW：では、一部の審査員は5つの要素をほぼ同じレベルでマークし、審査の「廊下」の外に出て、コンテストの終了時に評価を受け、将来の割り当てにつながらない可能性があると思いますか？

PI：確かにそうだ!!!

TW：男子オリンピックのチャンピオンは誰だったと思いますか、そしてその理由は？

PI：私にとって、オリンピックチャンピオンは高橋だったはずだ。彼はそれをすべて持っています！スケートのスキル、カリスマ性、テクニック。彼はジャンプのいくつかに問題があり、一度落ちました！

TW：わかりました、そして彼は彼のベストを尽くすことができなかったので– LysacekまたはPlushenko ?!

PI：高橋！

TW：私はそれについてあなたから回答を得るつもりはないようです。

PI：ああ、リザチェク！それがこれらの2つに来るとき、それについて疑いの余地はありません。でも、その夜、フィールド全体でスケートができたら、高橋。

TW：バンクーバーにいる間に他の試合を見ましたか？2014年のソチ大会に向けて、今後4年間で誰に注目するでしょうか。

PI：私はペアイベントとメンズロングプログラムだけを見ていました。

パトリックチャン、デニステン、フローレントアモディオにご注目ください。ハビエルフェルナンデスも忘れずに！

TW：審査をやめることにした今、あなたにとって今後の予定は？

PI：私はまだ決めていませんが、おそらく全国的または国際的にスケーターにアドバイスをしています。私はまた、新しい裁判官が最初に裁判を始めたときに何を探すべきかを学ぶための裁判官マニュアルをまとめました。100ページ以上あり、審査のさまざまな側面を処理します。また、ベルギーテレビの次の世界選手権でも解説を行います。

TW：お時間ありがとうございます！

Tony Wheeler's Thoughts in the World of Figure Skating

Patrick Ibens Interview

NOTE: This interview as well as many of my other thoughts can now be found at http://tony-wheeler.blogspot.com

Former ISU official Sonia Bianchetti has also submitted her thoughts to me on the below interview, and can be read here: http://tony-wheeler.blogspot.com/2010/03/sonia-bianchetti-role-of-judge-now-is.html

Thank You!!!

“I Would Say 10% of Judges Are Completely Honest”
A Chat with Olympic-Level Figure Skating Judge, Patrick Ibens

Tony Wheeler: Hello Patrick, thanks for taking the time to answer my questions. Can you start by telling me about your own figure skating background, and how long you have been involved in the sport?

Patrick Ibens: This is my 40th year in figure skating. I started skating when I was 5 years old, and was done by the time I was 18 due to a back injury. There were only a few other male skaters here in Belgium: Eric Kroll, Tom Dujardin, Carl Dujardin, Hendrick Sassen, Danny Dillen, Patrick Van Reeth, and myself. Eric Kroll managed something like 24th at Europeans, which was the biggest success any of us had achieved during the time!

TW: What are some of the international competitions you have been able to judge throughout the years?

PI: I’ve judged several European, Four Continents, and World Championships, as well as many junior and senior Grand Prix events. I also judged the men’s event in both the Torino 2006 and Vancouver 2010 Winter Olympics. Too bad I only was assigned to judge the short program in Vancouver. [Judging panels are now switched up between portions of competitions. Five of the nine judges from the short program were randomly selected to also judge the long program– Ibens was not chosen.]

TW: Do you judge all four disciplines?

PI: I only judge singles and pairs. No ice dance for me. I’m a REAL judge!

TW: Which means?

PI: That’s an inside joke in figure skating. In ice dance, everyone knows all the results before they even judge, yet they still sit through every practice session from five in the morning until late at night.

TW: So who DOES determine the ice dance results, then?

PI: The previous competitions and the country you’re from play a huge part. And I guess from time-to-time, the quality of skating matters. But that’s not my department..

TW: Do you think that judging has become more or less honest since the 2002 Salt Lake City scandal in the pairs competition, when the French judge admitted she voted a certain way after pressure from her federation?

PI: I think it’s exactly the same as before! No matter what system you use, people will always find a way to cheat. That is why I was glad to have such a great panel for the men’s short program in Vancouver. No controversy there!

TW: What percent of judges do you feel are/have been completely honest?

PI: Completely honest? I would say 10%.

TW: Really, that low? Why is that?

PI: Yes, but for different reasons. Judges still afraid of their federations even though they are scoring anonymously now, defending their own skaters (national bias) against skaters ranked close to their own, pushing their favorite skaters, judges afraid of being outside the corridor, trying to push a skater from a country to get invitations to act as a referee in that country, or just simply they don’t know what they are doing!

TW: Have you personally ever been asked to judge a certain way or hold a skater up or down by other judges or federations?

PI: Only once, but it wasn’t at a high-level competition. In fact, it happened at a national championship where a certain skater needed to win to be sent to the Europeans. Even then, I didn’t give in. As a result, I was never invited to judge there again!

TW: And as a retiring judge, you are allowed to share your opinions freely?

PI: I spoke freely while I was judging. It is against the rules to talk about anything regarding the specific event that you are judging while it is happening, but once the event review meeting is complete, you can comment on everything you see to whoever wants to listen. We live in a free world!

TW: Over the years that you have judged, who are your favorite skaters?

PI: Ooh, this is a tough one! Michelle Kwan, Michael Weiss, Jeffrey Buttle, Alexei Yagudin, Jamie Sale/David Pelletier.

And in recent years, Aliona Savchenko/Robin Szolkowy, Patrick Chan, Florent Amodio, Denis Ten, Javier Fernandez, Yannick Ponsero, and Joannie Rochette.

TW: If you are judging those particular skaters in competition, do you find it hard to judge them, or possibly are you more critical of their skating?

PI: To be honest I never had any problems judging them and the only reason is that I know them personally and they know me. They know that they can come up to me with any questions and that I will give them the correct answer and help them wherever I can. However, in the moment that I become the skating judge, all personal relationships vanish and they become the skaters I have to judge and they will receive the marks they deserve. Good or bad!

That is the only way it works, and they will respect you even more as a judge and person.

TW: What is the single best performance you have ever been on the judging panel for?

PI: “Love Story” by Sale and Pelletier at the 1999 Skate America. It brought me to tears while judging!

TW: So let me ask some questions about the judging itself. What kind of training do/did the judges receive when it came to the new system, which began during select international competitions in the fall of 2003?

PI: We did and still do have ISU seminars where they explain the most recent rules and changes to old rules, as well as what to look for while judging. This includes obvious and subtle errors, downgrades, transitions, and footwork into the solo jump in the singles discipline short programs, plus many other things. They also explain the differences between the short and long programs, and what to look for while judging the components. It is the responsibility of the national federations and the judge him or herself to read about the rule changes, though. They are published on the International Skating Union website in the form of communications, open for everyone to see. At every competition, the judges have an initial meeting where we go over the general rules. An hour before every segment of the competition, we again go over all the rules and the basics about program elements and components.

TW: Can you quickly summarize your own definitions of the five program components in singles and pairs skating? If you know the actual definitions by heart, feel free to use them, but try to put them in your own words rather than cheating! [Note: these answers were given immediately and haven’t been edited from the original quote. I thought it would be more interesting to get immediate definitions rather than giving Ibens time to think about the answers.]

PI: Skating Skills:
1. Flow and effortless glide with deep edges of steps and turns
2. Variety of speed and acceleration
3. Multi directional skating

Transitions:
That there are transitional moves and that there is variety in them. Not always the same movement. A good example is the long program of Stephane Lambiel. He is always doing the same upper body movements as his transitions, even if he has many.

Performance/Execution:
1. The skater gives you the “I am and I am going to be…” feeling.
2. Personality (if you can’t remember a skater’s performance after 5 minutes… he/she doesn’t have any personality).
3. Projection
a) gives you the feeling that he/she jumps into the judges stand/audience.
b) takes you with him/her into his/her own little world.
4. Quality of each movement. Each movement should be done to the end instead of cutting the movement short halfway.

Choreography:
1. Nice programs with beautiful choreography and good lay-out of the entire program.
2. Good use of the music.

Interpretation:
1. If most of the notes are used by the skater.
2. If the music goes up the moves should also lift upward and if the music goes down… the moves should be done downward.
3. When a skater becomes the character.
4. That the skater is interpreting the music instead of putting on a show program.

TW: Based on your own definitions and since you were on the judging panel for the men’s competition, which man would you consider the strongest on each of the five?

PI: Skating skills: Takahashi, Transitions and choreography: Chan, Performance: Lysacek, Interpretation: Abbott.

TW: When it comes to Lysacek vs. Plushenko, what do you personally see as the strengths and weaknesses of each skater?

PI: Plushenko is very confident of himself and really believes in what he is doing. His weakness is the fact that he still comes from the 6.0 system and doesn’t calculate every element to be of great importance, but that is the way it’s done now.

Lysacek is a fighter, a hard worker. I remember times when I thought… “oh my God, this boy will never make it.. he doesn’t really have it!” So his devotion to do his absolute best really worked in his favor. His weakness? Let me think. Probably his cheating on the triple Axel take-off. Sometimes his skid [or pre-rotation] on take-off is rotated for more than half of a turn making it a triple Salchow, so to speak. But that is the only minor issue I can find with him, and it doesn’t happen all the time.

TW: There have been many comments complaining that Plushenko was not properly rewarded for being able to do the quadruple jump (toe loop) successfully in both portions of the competition, including Evgeny himself. You mentioned that you feel he does not look at the “whole picture” in your last response, so what do you feel about this, and what are your general thoughts about this system?

PI: He did get rewarded for the quadruple toe loop by attempting and landing it, but then he also lost points for the bad landings on the other jumps. If in his opinion only the quadruple jump should have made the difference between first and second place, then I think it’s time that Sweden starts complaining as well because their skater, Adrian Schultheiss, had the best quadruple toe loop of the evening and didn’t even make it to the top five! [Schultheiss was 13th in the free skate]

On the other hand, in most languages figure skating has the word “ART” in it. This means that there has to be something more than just jumping around the ice.

Remember when Katarina Witt won with Carmen? She only displayed two different triples, being a toe loop and Salchow, while others had more difficult jumps. But Katarina was art on ice!

TW: So do you like this system better than the 6.0 system? Explain.

PI: Well, I have mixed feelings here cause it has both good and bad aspects.

Some of the good aspects are that skaters finally had to work their footwork. Back to clean edges, etc. Also, as I mentioned, they get rewarded for every element and not only for difficult jumps! The negatives are that all elements look alike, especially the spins and steps. There is also not much time left for much creativity.

What I hate the most about this system is that it is made to save the “not-so-good” judges, while the really good judges who are marking the way it’s meant to be (every component separately) risk the chance of being out of the corridor of average marks, and risk getting some assessments. A judge who basically does not know anything can give all the wrong marks or completely guess and their marks fall into an average! But someone who wants to have wide margins between components might be singled out for doing so. For example, when scoring the first three groups at the World Championships, you give between 5.50 and 7.00 and you are in the safe corridor. When the last groups come on the ice, give between 7.00 and 8.50 and you’re safe again!

It also, in a way, took the word sport out of the sport itself! Remember the Battle of the Brians? Well, Boitano added a second triple Axel towards the end of his program which landed him the gold that day. If today’s skaters add something extra, they don’t get any extra points for it cause it will be an element with no value since there is a maximum number of jumps and spins allowed to be completed.

TW: Back to Evgeny Plushenko. There was a big controversy started when Plushenko apparently commented that he and a fellow competitor (Brian Joubert) did not have “any transitions” because they were too focused on the jumps. Were you in attendance at Europeans for this particular press conference after the event?

PI: No I wasn’t at Europeans, but I heard about it!

TW: What do you think of his comments?

PI: I think it was stupid to do but Plushenko is blond isn’t he? Just joking! I can see what he was trying to explain, but too bad that in doing so, he took Brian Joubert down as well. You don’t do that! I am not Joubert’s biggest fan, but I think the lowest thing an athlete can do is to try to put fellow competitors in a negative light, so to speak.

TW: Did you receive the original e-mail from Joseph Inman that was published by the French media, and if so, do you think that it had any effect on the way the men’s competition was judged in Vancouver? Did it personally affect your judging?

PI: Yes, I did receive the e-mail, but everybody who knows me also knows that I won’t be influenced by such things. This Inman e-mail was the second of the kind sent. Like I said at the beginning, I’m a real judge and I don’t need anybody’s e-mails or comments to make up my mind! I’m perfectly capable of doing that for myself. I judge what I see no matter who it is!

TW: You were on the panel in the short program. How did you think Plushenko and Lysacek skated in that particular portion? Also, how would you have scored Daisuke Takahashi, as the top three were separated by less than a point in that portion of the competition? Anyone that you thought was held up or down, way different from how you judged?

PI: Right after the completion of the short program I could see that the top three skaters were less than a point apart. I knew that the panel had done a great job! During that segment, those three were equally good and all for different reasons. However, in my opinion, I thought Takahashi would have won the short program but anything is possible now since we as judges don’t know what the technical panel has decided on as far as levels and downgrades. We also do not know our previous marks so it is possible that you accidentally gave the higher mark to the other skater although as a good judge you have your ways to get around that!

TW: Please explain the last part in more detail.

PI: If I gave, say, a 7.25 to skater A, and then skater B comes ten skaters later and is better in my opinion, but I gave him a 7.00, then I accidentally gave “first place” to the wrong skater for that component.

But, as a good judge you add up all of your components for the first skater, which we will say makes an average of 7.00. Then you remember that and when the next skater comes on you do the same. If he is better all-around on the components, your average should obviously be higher than 7.00!

TW: But this system is supposed to make it so that you are marking a skater against a 10-point scale, not pitting the skaters against each other. Do you think the reason that happens is because there is too much going on at once?

PI: Yes and no. As I said, our sport is out of the “sport”. Sport is that one is better than two is better than three. You can only come up with that result by comparing. If you call it speed skating, then there’s a clock but of course there are no marks for skills and artistry– they only have to be fast! Scoring only on a scale of 10 is impossible in a competition.

TW: Do you or many of the other judges watch practices, including skater run-throughs to get an idea of their abilities on the components marks?

PI: I personally go to one practice just to get myself in the right mood to focus on the job that I have to do. I never watch the exact details of what the skaters are doing. They do different things when they are under stress of a competition, anyways! But there are some judges that go watch every practice. Don’t ask me why but they do. Since I never watch in detail I can’t get a feeling for the components either during a practice session.

In real time it’s sometimes hard cause you have to focus on so many different things. The elements, the mistakes, the rules and then the five components. This is sometimes hard to do but a good judge can train himself so that his mind picks up on all the smaller things so he can focus on the important things during a program.

TW: Do you think that the components are judged fairly or used as a “place-holder” of types? You mentioned before that at the end of the day it is basically used in a way to compare the skaters.

PI: I don’t think that the components are judged fairly, and not because they are a place-holder, but because some judges not only lack an artistic background but, even more, want to stay in that stupid corridor of average marks.

TW: So you feel that some judges mark their five components on a generally similar level so that they won’t be outside of the judging “corridor” and have an assessment at the end of the competition, possibly leading to no future assignments?

PI: They certainly do!!!

TW: Who do you think the Olympic men’s champion should have been and why?

PI: For me the Olympic champion should have been Takahashi. He has it all! The skating skills, the charisma, the technique. Too bad he had problems with some of his jumps and fell once!

TW: Okay, and since he wasn’t able to perform at his best– Lysacek or Plushenko?!

PI: Takahashi!

TW: I see that I’m not going to get an answer out of you on that one.

PI: Oh, Lysacek! There’s no doubt about it when it comes down to those two. But if it came to the whole field skating great that night.. Takahashi.

TW: Did you watch any of the other competitions while in Vancouver? Who do you think to look out for in the next four years leading up to the 2014 Sochi Games?

PI: I only watched the pairs event and the men’s long program, since I was not assigned to judge that.

Look out for Patrick Chan, Denis Ten and Florent Amodio. And don’t forget Javier Fernandez, either!

TW: What is in the future for you now that you’ve decided to stop judging?

PI: I haven’t decided yet, but probably giving advice to skaters nationally/internationally. I also put together a judges manual for new judges to learn what to look for when they first start to judge. It contains more than 100 pages and handles all different aspects of judging. I will also be doing commentary at the upcoming World Championships for Belgian television.

TW: Thank you for your time!

2020-04-07

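Gender Crossing in Men's Singles Figure Skating

Olympics / Questions about judging / Thinking about artistry

Gender Crossing in Men's Singles Figure Skating

From Performances at the Olympic Games in the Early 21st Century


相原夕佳
Graduate School of Social and Cultural Studies, Nihon University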

https://t.co/tanXQNfSOV

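This paper examines judges' decisions, in particular from the viewpoint of gender-related bias.

"Figure skating, a fusion of art and sport, has moved many spectators and influenced society through skaters' outstanding physical ability as athletes and their highly artistic expressiveness. It is also true, however, that various ideological biases have stood in the way of free expression. Among them, the one that has exerted especially great pressure is probably gender bias, which demands that skaters express the masculinity and femininity prescribed by society."

"The figure skating world remains conservative, but real change is under way. It was triggered by two skaters who had an overwhelming presence in the early 21st century: Evgeni Plushenko (Russia) and Johnny Weir (USA). Focusing on the Olympic performances of these and other men's singles skaters, this paper considers the significance and challenges of gender crossing in 21st-century figure skating from the standpoint of performance studies."

The paper raises this question and gives prominent treatment to Plushenko and Johnny Weir as key figures. It rates both highly as skaters who changed gender-bound values: Plushenko by bringing into his technique the flexibility regarded as feminine, and Weir by raising artistry beyond male and female. Near the end of the paper, on page 123, it also touches on Hanyu's Romeo and Juliet performance at the Sochi Olympics, presenting him as their successor.

It is often said that North America in particular prizes masculinity and has kept Johnny Weir's marks unfairly low. I find the paper interesting because it treats this issue in light of its historical background and also considers how the understanding of gender in men's singles has changed.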

2020-04-01

About the ISU referee training

Questions about judging

From the blog ロンドンつれづれ

https://ameblo.jp/popular2/entry-12491026428.html?frm=theme

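ポプラ of ロンドンつれづれ translated and introduced the contents of a training video made for referees at the start of the season. I had hoped it would have a good effect on this season's scoring trends, but...

◇Referee training

July 7, 2019
Theme: Skating

The ISU appears to have made a training video for referees. It is not meant for judges, but I would very much like to see a similar video made for judges too.

In it, the following points are listed as "common scoring mistakes" in awarding PCS. The speaker says that because the referee can hold a meeting of about ten minutes on the judges' scoring, these points can be checked there. The list below itemizes mistakes that must not be made. I cannot write them all down, so here are some of them.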

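Setting the "average" on the basis of a single program (the one you like best or remember best)
Raising PCS because of the number of quadruple jumps in the program
Assessing PCS by starting order
Assessing the next skater's PCS by the marks of the skater who skated before
Forgetting to assess direction of skating and one-foot skating
Giving high marks only for skating speed and power
Assessing deep edges, steps and turns only in the step sequence in the second half of the performance. These should be seen throughout the whole performance!
Overlooking that part of the program has no transitions, or almost none (for example, before a quad)
Overlooking a lack of variety in skating and body movement
Overlooking the difficulty of body movement (not only difficult variations and turns)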

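The speaker also explains which bullet point belongs to which PCS category. For example, the guidance is not to mix them up: involvement is assessed under Performance, and musical sensitivity under Interpretation. The aim is to be able to explain this clearly to the judging panel. We are a group of people from different countries, cultures and specialties, the speaker says, so please discuss how you arrived at your assessments. You will gain something from one another.

"Veteran judges, too, should have something to gain from hearing the views of new judges. As a team, to provide most honest, correct and accurate result, with integrity with your knowledge... (that is, as a team you must deliver an honest, correct and accurate result, with knowledge and integrity). Judges have to assess in 90 seconds. When the marks come out, we need to be able to trust that the judging panel, on the basis of correct knowledge and expertise, made the right decision by awarding accurate marks that match the performance," the speaker said.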

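The points raised here are also the points that frustrated fans in the judging at this year's competitions. That means the ISU was in fact aware that something was off in the judges' PCS marking. With inconsistent judging that leaves fans asking "Is this judging all right?" and unable to trust it, one cannot help doubting the judges' "knowledge" and "honesty and integrity"... In a sense, doesn't this amount to the ISU admitting as much?

Still, the fact that the ISU makes a training video like this and itemizes the judges' "common mistakes" to call attention to them is probably because it worries that, detailed as its scoring rules are, the judges have not absorbed them. Whether these are really "mistakes" is debatable, but fans too would like to acknowledge the ISU's effort.

However, this video is for referees, not judges. Referees do not do the scoring, and once a competition has begun the marks are given in 90 seconds, so in practice there is probably little room for the referee to step in. That is why the same itemized list needs to be shown to the judges so that they drill it into their heads.

And not only for PCS: for TES as well, I would like the scoring bullet points to be made clear and the "mistake points" extracted as in this video, to call attention to them.

A judge who still scores strangely after that is no longer making a mistake. It has to be deliberate...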

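In this video too, the speaker uses the words "honest" and "trust". That presumably means the ISU thinks some judges are deliberately giving marks that look improper. Or perhaps the fans' voices are reaching it. In a judged sport, if the judging is thought to be dishonest, fans will leave. If that happens, the numbers of spectators and sponsors will fall.

Last season's judging, which followed a major rule change, was inconsistent: it began with an early season that was excessively strict on under-rotation and ended with a World Team Trophy where even downgraded jumps were "overlooked" when GOE was awarded.

Judging from the video, the ISU does at least appear to be making an effort to raise the quality of judging, so I want to pray that the scoring will not betray our expectations...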

2020-04-01

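[Translation] Figure Skating Judging Review, Part 1: National Bias

Questions about judging / Data-based analysis

Figure Skating Judging Review

Biased Judging and Figure Skating: Part 1: National Bias

Posted by: FS Judging Review

Translator: Translation team MK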

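With the senior Grand Prix season about to begin, I thought this was a good time to take a systematic look at judges' records. It is well known that figure skating judges bear a heavy responsibility in determining the future careers of young athletes, yet for the most part do not live up to that responsibility. Although the judging criteria are thoroughly documented and explained, biased judging is rampant and only the most egregious cases are punished. Nor does the International Skating Union (ISU) appear to keep detailed track of judges' scoring records. As a result, judges slip past scrutiny by showing no extreme bias in any single competition while keeping their bias within unremarkable limits across all competitions.

So, to improve judge accountability, I decided to collect and analyze the data myself. This season I intend to look at several issues. 1. National bias. 2. Bloc judging: judges scoring with a bias in favor of skaters from a particular country, for example judges from former Soviet countries scoring in favor of Russian skaters. 3. Judges from rival countries marking down top skaters. For example, do US judges give Yuzuru Hanyu low marks? What about Japanese judges and Nathan Chen? Russian judges and Rika Kihira? And so on. Beyond holding judges accountable, my hope is to settle suspicions that fans have long held without clear evidence.

First, I will look at national bias. I explain which judges show statistical evidence of national bias and how that is determined, and then discuss concerns and limitations.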

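Part 1: National Bias

First, using data available from skatingscores.com and processing it with formulas to check how each judge scored each skater, I compiled spreadsheets of the scoring records of senior international-level judges who judged the ISU Challenger Series, the Grand Prix series and the championships. I also added two other competitions, the 2019 Challenge Cup and the World Team Trophy. (Note: after this post was published, I decided to remove those two in the next update to make the inclusion criteria more consistent. I had misjudged the importance of these two competitions, so they are included in this version.) Their inclusion may have caused some judges to be flagged.

The database currently contains 312 judges, of whom 177 were examined for national bias. (The other judges did not have enough of a record to assess.) Of those 177, 92 showed statistically significant evidence of national bias, and 74 of those 92 showed strong evidence. By and large, judges scored skaters from their own country quite differently from skaters from other countries. For example, charting the average Z-scores (explained below) that judges gave to home skaters and to skaters from other countries yields the following.

As you can see, there are two quite different patterns.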

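Who is scoring this way? In the list below, judges are grouped by federation. Every judge listed meets the basic statistical threshold (p < 0.05); those who meet a stricter threshold (p < 0.01) are in bold, and those who meet a still stricter one (p < 0.001) are in bold and underlined. The number in parentheses next to each federation's name is the number of judges "examined" (not the total number of registered judges; some were not examined for lack of data). Federations with no judge showing clear evidence of biased scoring are not listed.

Judges with a record of biased scoring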

Austria (3)	Adrienn Schadenbauer
Canada (23)	Andre-Marc Allain, Cynthia Benson, Leanne Caron, Reaghan Fawcett, Karen Howard, Leslie Keen, Patty Klein, Nicole Leblanc-Richard, Erica Topolski
China (5)	Dan Fang, Shi Wei, Fan Yang
Czech Republic (7)	Frantisek Baudys, Jana Baudysova
Spain (2)	David Munoz
Finland (7)	Merja Kosonen, Virpi Kunnas-Helminen, Leo Lenkola
France (8)	Ronald Beau, Jezabel Dabouis, Elisabeth Louesdon, Philippe Meriguet, David Molina, Florence Vuylsteker
Great Britain (5)	Christopher Buchanan, Stephen Fernandez, Sarah Hanrahan, Nicholas Russell
Georgia (1)	Salome Chigogidze
Germany (13)	Christian Baumann, Ulla Faig, Uta Limpert, Claudia Stahnke, Elke Treitz, Ekaterina Zabolotnaya
Hungary (2)	Attila Soos, Gyula Szombathelyi
Israel (2)	Anna Kantor, Albert Zaydman
Italy (11)	Matteo Bonfa, Rossella Ceccattini, Raffaella Locatelli, Isabella Micheli, Tiziana Miorini, Miriam Palange, Walter Toigo
Japan (16)	Miwako Ando, Tomie Fukudome, Ritsuko Horiuchi, Akiko Kobayashi, Takeo Kuno, Kaoru Takino, Sakae Yamamoto, Nobuhiko Yoshioka
Kazakhstan (2)	Yuriy Guskov, Nadezhda Paretskaia
South Korea (2)	Sung-hee Koh, Jung Sue Lee
Latvia (1)	Agita Abele
Lithuania (1)	Laimute Krauziene
Mexico (1)	Sasha Martinez
Poland (3)	Malgorzata Grajcar, Malgorzata Sobkow
Russia (14)	Maira Abasova, Julia Andreeva, Sviatoslav Babenko, Igor Dolgushin, Elena Fomina, Maria Gribonosova-Grebneva, Natalia Kitaeva, Olga Kozhemyakina, Lolita Labunskaiya, Igor Obraztsov, Tatiana Sharkina, Alla Shekovtsova
Switzerland (3)	Bettina Meier
Sweden (4)	Inger Andersson, Kristina Houwing
Ukraine (2)	Yury Balkov, Anastassiya Makarova
USA (21)	Samuel Auxier, Richard Dalley, Janis Engel, Kathleen Harmon, Taffy Holliday, Laurie Johnson, Hal Marron, Jennifer Mast, John Millier, Sharon Ro

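Note that many judges are not in bold. I expect they will be flagged once more data comes in and the evidence grows. There are also many judges who will be flagged as soon as they fully meet the minimum threshold for examination. On the other hand, because so many judges were examined, some may be on this list through bad luck (mainly those not in bold; this is extremely unlikely for those in bold or in bold and underlined). Even so, the chance that a judge on this list is actually impartial is fairly low. Generally speaking, the probability that I missed a judge who should have been flagged is considerably higher than the probability that I flagged one who should not have been.

Please be aware that this determination is based on averages. Those marked here as biased overscore their home skaters to a degree that is hard to call chance, but that does not mean they overscore every one of their home skaters at every competition. Individual marks are affected by many factors and vary considerably from judge to judge. After all, judges are not clairvoyant and cannot predict how the other judges will score. (Conversely, when examining a scoring record, keep in mind that a single mark may differ from the other judges and appear consistent with national bias. That alone does not establish that the judge is always biased or that the mark was due to national bias.)

Federations can also be compared with one another. Here I consider federations with at least 10 judges and look at the degree to which their judges have been biased in competitions so far (ZDifference). The bar chart below shows what percentage of each federation's judges fall into four categories: no bias (0 or below), low bias (0-0.5), medium bias (0.5-1) and high bias (1 or above). Note that this chart includes judges with little data.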
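The four categories above can be sketched in a few lines of Python. This is only an illustration of the bucketing, not the author's spreadsheet formula: the function names are hypothetical, the example values are invented, and the handling of the exact boundary 0.5 (counted here as medium) is an assumption, since the post gives the ranges only as 0-0.5 and 0.5-1.

```python
from collections import Counter

CATEGORIES = ["no bias", "low bias", "medium bias", "high bias"]

def bias_category(z_difference):
    """Map a judge's ZDifference to one of the four chart categories."""
    if z_difference <= 0:      # "0 or below"
        return "no bias"
    if z_difference < 0.5:     # assumption: exactly 0.5 counts as medium
        return "low bias"
    if z_difference < 1:       # "1 or above" is high, so exactly 1 is high
        return "medium bias"
    return "high bias"

def category_shares(z_differences):
    """Percentage of judges in each category, as a stacked bar would show it."""
    counts = Counter(bias_category(z) for z in z_differences)
    total = len(z_differences)
    return {c: 100 * counts[c] / total for c in CATEGORIES}

# Hypothetical ZDifference values for one federation's judges (not real data).
example = [-0.1, 0.2, 0.7, 1.3]
shares = category_shares(example)
print(shares)
```

With one invented judge in each bucket, every category comes out at 25 percent.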

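As you can see, there are federations such as Canada and others with a similar tendency, but in other federations (mainly Russia) every judge favors home skaters. Russia shows the highest bias among the federations examined, while in some other federations a larger share of judges fall into the high-bias category.

Of these large federations, only Australia appears impartial on every measure. This becomes clearer on a close look at the next chart, which shows the distribution of ZDifference among each federation's judges in a slightly different way from the previous one. For anyone who has not seen this kind of chart before: the box shows the middle 50% of each federation's judges, the lines extend over the full range of each federation's judges, and the points are outliers within the federation.

Of course, these figures will change as the season progresses and more competitions are added to the data. I intend to update this post over time. (The figures are as of October 17, 2019.) To make clear how I reached my conclusions (and how to read the data), I describe the methodology below. Because the discussion involves statistical concepts that are probably unfamiliar to many people, I will add explanations and provide links to the concepts used. These are standard statistical tools used to analyze all kinds of data; I did not devise them myself.

Methodology

The basic idea for determining a judge's bias is to take each judge, compare the score given to each skater with the other judges' scores, quantify how far it departs from the panel as a whole, and collect all the numbers in one place. That makes it possible to see how a particular judge scored home skaters relative to the other judges. I then ran a standard statistical test on that difference to find out how clear the evidence of bias in the judge's record is.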
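The post does not say which statistical test was used. As a stand-in, the sketch below uses an exact one-sided permutation test on the difference in mean deviation between home and non-home skaters. It is one possible way to put a p-value on such a difference, not necessarily the author's method. The function name and all numbers are hypothetical, and the full enumeration is only practical for small records.

```python
from itertools import combinations

def permutation_p_value(home, other):
    """Exact one-sided permutation test.

    Returns the share of all ways of relabelling the pooled values as
    'home' whose mean(home) - mean(other) is at least as large as the
    difference actually observed.
    """
    pooled = list(home) + list(other)
    n, k = len(pooled), len(home)
    total = sum(pooled)

    def mean_diff(home_sum):
        return home_sum / k - (total - home_sum) / (n - k)

    observed = mean_diff(sum(pooled[:k]))
    hits = count = 0
    for idx in combinations(range(n), k):
        count += 1
        if mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-9:
            hits += 1
    return hits / count

# Hypothetical standardized deviations for one judge (not real data).
home_scores = [1.2, 0.9, 1.5]
other_scores = [-0.3, 0.1, -0.8, 0.4, -0.2, 0.0, -0.5, 0.3, -0.1, 0.2]
p = permutation_p_value(home_scores, other_scores)
print(p)
```

In this invented record every home mark is higher than every other mark, so only one of the 286 possible relabellings is as extreme as the observed one. A small p-value of this kind is what would put a judge under the p < 0.01 threshold mentioned earlier.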

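To do this, I started by recording the scores given by every judge at every Challenger Series, Grand Prix and ISU Championship event since the scoring system was changed in 2018. I also included the Challenge Cup and the World Team Trophy. (Note: as mentioned above, I intend to remove these in an update to make the criteria more consistent.) I looked up each judge's scores on skatingscores.com and entered them into spreadsheets by hand. The competition spreadsheets can be viewed here.

When you open a spreadsheet, it looks like this: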

		Odhran Allen	Doug Williams	Maria Fortescue	Veronique Verrue	Andreas Waldeck	Lorna Schroder	Miwako Ando	Mean
		IRL	USA	ISL	FRA	GER	CAN	JPN
Yuzuru Hanyu	JPN	164.01	157.34	172.44	161.55	166.7	167.4	170.74	165.74

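(2018 ACI men's free skate. This example was chosen entirely at random.)

As you can see, it shows the judges, the judges' nationalities, the skater, the skater's nationality, the score each judge gave, and the mean of all the judges' scores. (If you want to check the numbers, you can get them from skatingscores.com. Because this website exists, I did not have to calculate them myself, which made the whole process much faster!)

From this data, I first measured how high or low each judge scored the skater relative to the other judges by subtracting the mean of all the scores from each judge's score. This produces what can be called a score deviation. Taking Yuzu as the example, the result is as follows.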

		Odhran Allen	Doug Williams	Maria Fortescue	Veronique Verrue	Andreas Waldeck	Lorna Schroder	Miwako Ando
		IRL	USA	ISL	FRA	GER	CAN	JPN
Yuzuru Hanyu	JPN	-1.73	-8.4	6.7	-4.19	0.96	1.66	5

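So Odhran Allen scored 1.73 points below the other judges, Doug Williams 8.4 points below, and so on. This is shown in the second block of the competition sheet. Unfortunately we cannot rely on this data alone, because to get the most robust data we want to compare how judges score skaters across competitions, segments, and disciplines. A gap of -8.4 is large even in the men's free skate, but it would be enormous in the short program, and it would be larger still in ladies or pairs than in men's. To make the score deviations comparable, I therefore had to standardize them into z-scores, which is a common way of standardizing data. Here is how they are calculated. First, the standard deviation of the judges' scores has to be determined. The standard deviation is one of the most common statistical measures, and it shows how spread out a set of numbers is around its mean. If the judges' scores are widely spread, the standard deviation is relatively high; if the scores largely agree, it is low. In this case, the standard deviation of Yuzu's scores was 4.85, which is fairly typical for men's.

To calculate the z-scores, each score deviation is simply divided by the standard deviation of Yuzu's scores. Put simply, a z-score is the number of standard deviations by which a judge scored a skater above or below the mean. Here is what happens to Yuzu's score deviations once they are converted to z-scores: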

		Odhran Allen	Doug Williams	Maria Fortescue	Veronique Verrue	Andreas Waldeck	Lorna Schroder	Miwako Ando
		IRL	USA	ISL	FRA	GER	CAN	JPN
Yuzuru Hanyu	JPN	-0.36	-1.73	1.38	-0.86	0.2	0.34	1.03

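Z-scores usually fall between -2 and 2, although numbers outside that range appear when a judge *really* disagrees with the other judges. (This happens roughly 5% of the time.) Underscoring (scoring below the other judges) produces a negative z-score, and overscoring (scoring above the other judges) produces a positive one. Using z-scores actually builds in a measure of leniency toward the judges. When the panel disagrees about a skate, the z-score is less extreme than the raw score difference, so a large gap from the mean "counts" for less. On the other hand, when the panel is largely in agreement, a lone outlier may end up with a z-score that is more extreme than the raw score difference. Overall, though, z-scores make it a little easier for biased judges to hide. Not that they hide very well in the first place.

Once the z-scores have been calculated for each judge, competition, and segment, all of a given judge's z-scores are collected on one sheet, the individual judge's sheet. These are in the big judges database. Clicking a judge's name on the sheet labeled "Judges" takes you to that judge's individual sheet, where you can see their z-scores for every competition they have judged.

Summary statistics appear on the left and z-scores on the right. As you can see, the z-scores are labeled by skater and nationality, and at the top there is a code showing which competition and segment they belong to. The code is made up of [Year] [Competition Code] [Segment Code]. The key to the competition codes is in the "Checklist" part of the Judges sheet, which also lists the competitions included in the database.

A formula splits this data into two groups: z-scores for home-country skaters and z-scores for other skaters. The z-scores for home-country skaters are averaged to produce Z-home on the left. The same is done for the other skaters' z-scores to produce Z-other. The difference between the two, which is what we are ultimately interested in, is calculated as ZDifference. (These metrics also appear in the overall judge summary.)

ZDifference can be thought of as the degree of bias a judge has shown toward home-country skaters. As a rule of thumb, in raw-score terms a ZDifference of 1 corresponds to roughly 7-8 points in men's, about 6 in ladies and pairs, and 6-7 in ice dance over the course of a competition. In other words, a judge with a ZDifference of 1 gives, on average, a bonus of 7-8 points in men's (and so on) compared with what they would give a skater who is not from their country.

Of course, the degree of bias shown is not the only thing that matters when assessing how biased a judge is. If a judge has a ZDifference of 1 but has scored home-country skaters only a few times, the ZDifference may simply be due to chance or to other factors. If the ZDifference persists across many competitions, on the other hand, we can be much more confident that the judge is biased.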

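This is where the metric p comes in. p is another standard statistical measure. In the context of our data, it is the probability (out of 1) that an unbiased judge, that is, one who scores home-country skaters no differently from other skaters, would end up purely by chance with a record showing bias equal to or greater than the actual judge's record. In other words, the lower the p value, the stronger the evidence of some systematic difference between how the judge scores home-country skaters and other skaters.

By convention, a p value below 0.05 is considered statistically significant. That is the standard I use to flag judges, although in many cases p falls far below that threshold. For the Russian judge Olga Kozhemyakina, for example, p = 0.000000000000003. (Note that this means there is a 0.0000000000003% chance that an unbiased judge would produce a record equal to or worse than hers.)

However, it is better not to treat statistical significance as a binary thing. The lower the p value, the more suspicious you should be of the judge. Note that many judges whose records are not flagged still have fairly low p values. I suspect many of them will be flagged once more of their scores are entered.

Considering ZDifference and the p value together gives a full assessment of a judge's record. ZDifference indicates the severity of the past bias, and the p value indicates how likely an unbiased judge would be to produce a record at least that extreme by chance.

To calculate p, I used a standard statistical test for the difference between two means, Welch's t-test. I used the one-tailed version because we are only looking for bias in one direction. I used Welch's rather than Student's t-test because I noticed that judges with extensive records tended to have different variances in their scores for home-country skaters and for other skaters. (If you did not follow this paragraph, that is fine. Unfortunately, explaining in depth how the p calculation works would take a great deal more space, so I will have to pass on that. If you want to learn more, I recommend taking an introductory statistics class.)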

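Discussion

First of all, what does bias mean? I have been using the word for a demonstrable, mathematical difference between how a judge scores home-country skaters and how they score other skaters. By calling a judge "biased," I do not mean to imply anything about their psychology, nor am I making any claim about the reasons for the bias. It may be conscious or unconscious. It may be a deliberate attempt to manipulate results, or simply the same lack of objectivity that fans often show about their favorite skaters. Personally, I am less concerned with the cause of the bias than with the fact that it exists. Remember that judges' scores determine young athletes' futures. I don't know about you, but I do not want the futures of these young, incredibly hard-working people to be decided by a group of people who are not objective, whether that lack of objectivity comes from corrupt intent or merely clouded judgment.

However, let me address some specific explanations for the bias that I believe are untrue or at least problematic, as well as other attempts to defend the judges from criticism.

1. The bias is simply due to cultural preferences. People tend to look more favorably on programs that are culturally familiar to them and score them higher, and a judge and a skater from the same country are obviously better able to understand each other culturally.

First, skaters from the same country often have completely different skating styles and skate completely different kinds of programs. It therefore strains credulity to suppose that there is something quintessentially "Russian" or "Canadian" or whatever about all skaters of the same nationality. The programs of Sasha Trusova, Alena Kostornaia, and Alina Zagitova are completely different from one another, and they are even coached and choreographed by the same people!

Second, if this were true, we would expect judges from culturally similar countries to score each other's skaters higher. Canada and the US, for example, are culturally very similar, so Canadian judges should overscore US skaters and vice versa. Fortunately, the sheet makes this easy to test (just change the country code on each judge's individual page to see how they score a specific other country's skaters), and in fact we do not see it. The vast majority of Canadian judges score US skaters fairly. (You can download this sheet, in which each judge's nationality has been switched to that of a country that is at least somewhat culturally similar or geographically close. Or you can switch the judges' nationalities yourself on the original sheet, and all the statistics for each judge will be recalculated. You have to do this inside each judge's individual sheet, however, not on the summary sheet.)

There are a few exceptions. Judges from former USSR countries tend to score Russian skaters higher, although the degree of bias is not as severe as for their own skaters. I also believe South Korean judges score North Korean skaters higher, but whether the two countries share a similar culture is quite debatable. We will look at this in more detail in a future post, but I believe there is a better explanation for the exceptions. In general, culturally similar countries do *not* score each other's skaters higher.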

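2. It is just human nature to be biased. We need to understand that judges are human beings, not robots.

Judges are not all the same. Not every judge shows evidence of bias. For instance, Glenn Fortin (CAN), Katharina Heusinger (GER), Andreas Waldeck (GER), Ayumi Kozuka (JPN), Shizuko Ugaki (JPN), and Linda Leaver (USA) all have reasonably substantial judging records that show no substantial evidence of bias. This makes it clear that judges can be unbiased, at least where nationalistic bias is concerned. Not everyone has this particular flaw. Even among the judges who are biased, the degree of bias varies considerably. The worst offenders, for example Salome Chigogidze (GEO), Nicholas Russell (GBR), and Elena Fomina (RUS), have ZDifferences of 1.5-2, whereas the lowest statistically significant values are around 0.5. This shows that the overall level of bias among judges could certainly be reduced by removing the worst offenders and replacing them with less biased judges, even if a low level of bias (say, below 0.5) is hard to eliminate and may not be worth the effort in practice.

3. Your metric measures a judge's bias by comparing them with the mean of the panel. Doesn't that assume the panel mean is right? Sometimes the outlier judge is right and the other judges are wrong.

It may well be true that the outlier judge is actually "right," but I avoided assessing what a skater "objectively" should have scored, because assessments of that kind lead to unproductive fan wars and involve a level of personal judgment that I did not want to bring into this study. Note, however, that outlier scores only "count against" a judge when they match the expected pattern of nationalistic bias. If a Japanese judge scores a Filipino skater well above the mean because that judge alone was being objective and the others underscored the skater out of some other bias (reputation, small country, and so on), that actually counts slightly in the Japanese judge's favor. Only when a Japanese judge scores a Japanese skater above the mean does it count "against" that judge. Even then, there are at least three other data points to consider (I only start calculating p once a judge has scored home-country skaters at least four times), and if a judge shows a pattern of "correcting" scores only for home-country skaters, one has to wonder whether they are truly being objective. To repeat, a judge is not labeled "biased" for giving scores that deviate from the mean. A judge is labeled "biased" when there is a difference between how they score home-country skaters and other skaters. A judge who scores both groups two standard deviations above the mean will not be flagged, however far out of line with the panel those scores are. All that matters is the difference between the judge's own scores for home-country skaters and for other skaters.

The only situation in which this is a substantial concern is that of judges from small federations who have scored only one or two different home-country skaters. In that case, a personal preference, or a sincere belief that a particular skater is being underscored, may happen to coincide with nationality and be "wrongly" flagged as nationalistic bias. With that in mind, we should perhaps be a little more lenient toward judges from small countries. This defense hardly applies, however, to judges from large and powerful federations such as Russia and the United States. Over their judging careers they score many different skaters from their own country, and those skaters cannot credibly claim to be underscored because of their nationality.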

Limitations/Other Considerations

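Despite the fact that so many judges show evidence of nationalistic bias under the method used here, I think the method is, if anything, somewhat lacking in its ability to capture nationalistic bias. (That should give you an idea of how bad the problem is.)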

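First, judges can easily evade detection by being biased only "when it counts," that is, only at competitions where the scoring makes a difference to medals or placements. That bias gets lost in the average of all the other competitions where the judge was not biased. Bias of this type is detectable only if the judge's record is extreme.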

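Another type of nationalistic bias that this metric struggles to capture is a judge singling out particular skaters to underscore while scoring everyone else normally. Because all non-home skaters are averaged together, this type of bias has almost no effect on the overall ZDifference and is therefore very hard to detect. In the future I plan to examine whether judges from federations with top skaters underscore those skaters' direct rivals. Stay tuned.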
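To see why targeted underscoring is so hard to detect, here is a small sketch with entirely hypothetical numbers. It assumes a judge who scores home-country skaters and 95 other performances exactly at the panel mean, but underscores one rival by 1.5 standard deviations on 5 occasions. None of these figures come from the actual database.

```python
from statistics import mean

# Hypothetical z-scores, invented for illustration only.
z_home = [0.0] * 10           # home-country skaters scored exactly at the panel mean
z_other_fair = [0.0] * 95     # most other skaters also scored at the panel mean
z_other_rival = [-1.5] * 5    # one rival consistently underscored

z_home_mean = mean(z_home)
z_other_mean = mean(z_other_fair + z_other_rival)
z_difference = z_home_mean - z_other_mean

print(f"Z-home = {z_home_mean:.3f}")
print(f"Z-other = {z_other_mean:.3f}")
print(f"ZDifference = {z_difference:.3f}")
```

Under these assumptions the heavy underscoring of the rival moves ZDifference by less than 0.1, which is well inside the "low bias" band.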

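Third, if bloc judging, score trading, or collusion is used to raise a skater's scores, the evidence of bias is weakened, because the apparent gap between the home-country judge and the rest of the panel shrinks. Conversely, if there is trading or collusion to lower a skater's scores, that skater's home-country judge will wrongly appear conspicuously biased. (Even so, I am not very worried about wrongly flagging judges for this reason: unless there is a large-scale conspiracy against all the skaters of a particular federation, a single instance is averaged away among all the other scores the judge has given.) This data set can, however, be used to look at bloc judging at least in part. I will cover that in a future post, so stay tuned.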

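Finally, judges could also "game the system" by overscoring skaters who are not directly relevant, which inflates the "Z-other" part of the calculation. I do not think this is a major concern at present, but it could become one in the future if, for some reason, this turned into judges' main way of exercising bias. (Judges would mostly use it to overscore lower-ranked skaters, which might actually be a good thing as a counterweight to reputation bias.)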

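Conclusion

There is more I could say, but I do not want to go on forever or make this so long that nobody wants to read it, so I will stop here and take questions if anyone would like more detail. The conclusion is very clear: figure skating judging has a major problem with nationalistic bias, and many judges blatantly favor their own country's skaters.

This raises questions about judges' accountability for their objectivity. We have looked at only one typical kind of bias, and it is certainly not the most serious kind. A lack of objectivity in one respect raises the suspicion of a lack of objectivity in others.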

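What about the other ten kinds of bias, such as reputation bias and big-federation bias? If judges are clearly biased in one direction, it is reasonable to suspect that they are biased in other directions as well.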

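This post has been edited continually since publication to improve confusing or misleading wording, update the graphs, and clarify some points of methodology. I had also forgotten to mention that the underlying judges' scores were taken from skatingscores.com; that has been corrected. Thanks also to Veveco of planethanyu.com for making the graphs!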

2020-04-01

Judging Bias and Figure Skating: Part One – Nationalistic Bias

FS Judging Review | October 17, 2019 | Uncategorized

As the Senior Grand Prix season is about to begin, I thought now would be an appropriate time to take a systematic look at the judging records of our judges. Despite the fact that figure skating judges play a significant role in determining the career trajectory of young athletes, it is well known that they face little accountability. Only the most egregious cases of biased scoring are ever punished, in spite of the fact that biased scoring is rampant and well-documented. Nor does the International Skating Union (ISU) appear to keep any kind of running record of judges’ performances, so judges avoid all scrutiny as long as they keep their bias to a moderate amount at every competition rather than an extreme amount at a single competition.

In order to improve accountability, then, I’ve decided to compile and analyze the data myself. Over the course of this season I plan on looking at a few different issues:

1. Nationalistic bias.
2. “Bloc” judging: judges being systematically biased in favor of skaters from other particular nationalities, typified, for instance, by judges from former USSR countries scoring Russian skaters more favorably.
3. Underscoring of top competitors by rival countries’ judges. For instance, do American judges underscore Yuzuru Hanyu? Japanese judges and Nathan Chen? Russian judges and Rika Kihira? Etc.

My hope is that in addition to holding judges accountable, we may also be able to confirm or dispel some suspicions that fans have long held but don’t necessarily have evidence for.

In the first post, we will look at nationalistic bias. We will discuss which judges show statistically significant evidence of nationalistic bias, how we’re able to determine this, as well as address some concerns and limitations.

Part One: Nationalistic bias

First, using data available from skatingscores.com, I processed each score through a formula that determines how each judge scored each skater relative to the panel. From this I compiled a spreadsheet of relative-score judging records for every senior international-level judge who has judged an ISU Challenger Series, Grand Prix, or Championship event. I also included a couple of other competitions, namely 2019 Challenge Cup and WTT. (Note: since publishing this post I have decided to remove them in the next update so that the criteria for competition inclusion are more consistent. They were included in this version because of a misapprehension about their status. This will probably affect whether some judges are flagged. Stay tuned.) It may be found here. Note that the file is too big to display on Google Docs, so you will have to download it.

First, let’s take a look at the top-line conclusions. The database currently contains 312 judges, 177 of whom I examined for nationalistic bias (the other judges didn’t have an extensive enough judging record). Of those 177, 92 showed statistically significant evidence of nationalistic bias, and of those 92, 74 showed strong evidence. On the whole, judges scored their home skaters quite differently than they scored other skaters. For instance, if we plot the average z-scores (this will be explained in detail later) of judges’ scores for their home skaters versus other skaters, we get the following density plot:

As you can see, this shows two very different patterns of judging!

Who are these judges? Well, here they are, divided by federation. Names are included if they meet the standard benchmark for statistical significance (p<0.05), bolded if they meet a stricter benchmark (p<0.01), and bolded and underlined if they meet an even stricter benchmark (p<0.001). In parentheses next to the federation is the number of judges of that federation *tested* (not the total number of judges recorded, some of whom may not be tested due to insufficient data). Federations that do not have any judges that show statistically significant evidence of biased judging are not listed.

Judges with biased judging records

Austria (3)	Adrienn Schadenbauer
Canada (23)	Andre-Marc Allain, Cynthia Benson, Leanne Caron, Reaghan Fawcett, Karen Howard, Leslie Keen, Patty Klein, Nicole Leblanc-Richard, Erica Topolski
China (5)	Dan Fang, Shi Wei, Fan Yang
Czech Republic (7)	Frantisek Baudys, Jana Baudysova
Spain (2)	David Munoz
Finland (7)	Merja Kosonen, Virpi Kunnas-Helminen, Leo Lenkola
France (8)	Ronald Beau, Jezabel Dabouis, Elisabeth Louesdon, Philippe Meriguet, David Molina, Florence Vuylsteker
Great Britain (5)	Christopher Buchanan, Stephen Fernandez, Sarah Hanrahan, Nicholas Russell
Georgia (1)	Salome Chigogidze
Germany (13)	Christian Baumann, Ulla Faig, Uta Limpert, Claudia Stahnke, Elke Treitz, Ekaterina Zabolotnaya
Hungary (2)	Attila Soos, Gyula Szombathelyi
Israel (2)	Anna Kantor, Albert Zaydman
Italy (11)	Matteo Bonfa, Rossella Ceccattini, Raffaella Locatelli, Isabella Micheli, Tiziana Miorini, Miriam Palange, Walter Toigo
Japan (16)	Miwako Ando, Tomie Fukudome, Ritsuko Horiuchi, Akiko Kobayashi, Takeo Kuno, Kaoru Takino, Sakae Yamamoto, Nobuhiko Yoshioka
Kazakhstan (2)	Yuriy Guskov, Nadezhda Paretskaia
South Korea (2)	Sung-hee Koh, Jung Sue Lee
Latvia (1)	Agita Abele
Lithuania (1)	Laimute Krauziene
Mexico (1)	Sasha Martinez
Poland (3)	Malgorzata Grajcar, Malgorzata Sobkow
Russia (14)	Maira Abasova, Julia Andreeva, Sviatoslav Babenko, Igor Dolgushin, Elena Fomina, Maria Gribonosova-Grebneva, Natalia Kitaeva, Olga Kozhemyakina, Lolita Labunskaiya, Igor Obraztsov, Tatiana Sharkina, Alla Shekovtsova
Switzerland (3)	Bettina Meier
Sweden (4)	Inger Andersson, Kristina Houwing
Ukraine (2)	Yury Balkov, Anastassiya Makarova
USA (21)	Samuel Auxier, Richard Dalley, Janis Engel, Kathleen Harmon, Taffy Holliday, Laurie Johnson, Hal Marron, Jennifer Mast, John Millier, Sharon Rogers, Kevin Rosenstein

Let me note that there are many judges who are just on the right side of the borderline of being flagged, who I suspect will be flagged as more data comes in and evidence mounts. There are also many judges who will almost certainly be flagged as soon as they’ve judged enough to meet my minimum threshold for testing. On the other hand, note that due to the sheer number of judges tested, there may be a few judges (chiefly in the non-bolded, maybe one or two bolded, extremely unlikely for bolded and underlined) who wind up on this list by pure luck (but the odds that any particular judge on this list is actually unbiased and just unlucky are pretty low). Generally speaking, the odds that I have missed a judge who deserves to be flagged are significantly higher than the odds that I have flagged someone who doesn’t deserve to be flagged.
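The "pure luck" estimate can be made concrete with a rough back-of-the-envelope calculation. The sketch below assumes the worst case for the method: that all 177 tested judges are actually unbiased and that their tests are independent, so each p value is equally likely to land anywhere between 0 and 1. The band labels are my own shorthand for the three formatting tiers described above.

```python
# Sketch: expected number of judges flagged by chance alone,
# ASSUMING every tested judge is unbiased and tests are independent.
N_TESTED = 177  # number of judges tested, from the post

# Width of each p-value band = chance an unbiased judge lands in it.
bands = {
    "listed (0.01 <= p < 0.05)": 0.05 - 0.01,
    "bolded (0.001 <= p < 0.01)": 0.01 - 0.001,
    "bolded and underlined (p < 0.001)": 0.001,
}

expected = {name: N_TESTED * width for name, width in bands.items()}

for name, count in expected.items():
    print(f"{name}: about {count:.1f} judges expected by chance")
```

Under those assumptions this works out to roughly seven judges in the plain tier, between one and two in the bolded tier, and well under one in the strictest tier, compared with the 92 judges actually listed.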

It is also important to recognize these determinations are made on the basis of averages. Someone who is marked as biased here does over-score their own skaters to a degree that is highly unlikely to occur by random chance, but that does not mean they will over-score every single one of their own skaters at every competition. Individual scores are influenced by a variety of factors, and therefore exhibit a significant amount of noise, and judges are, after all, not clairvoyant, and cannot necessarily predict how the rest of the panel will score. (Conversely, when reviewing judging records, it’s important to remember that just because a single score is out of whack with the panel and looks consistent with nationalistic bias, does not necessarily mean that the judge who issued it is biased in general or that that score was the result of nationalistic bias.)

We can also compare federations. Here we will examine federations with at least ten judges, looking at the degree (the ZDifference) by which their judges have historically been biased. The following bar chart shows what percentage of each federation’s judges have records that fall into 4 different categories of bias: no bias (anything equal to or lower than 0), low bias (0-0.5), medium bias (0.5-1) and high bias (greater than 1). Note that this includes judges who did not meet the minimum threshold of data quantity to be flagged on an individual basis.
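The four categories can be written down as a small classifier. This is only a sketch: the post gives the ranges but not how values exactly on a boundary (0.5 or 1) are treated, so the choice below to put them in the lower category is an assumption, and the sample ZDifferences are invented.

```python
from collections import Counter


def bias_category(z_difference: float) -> str:
    """Map a ZDifference to one of the four categories used in the bar chart."""
    if z_difference <= 0:
        return "no bias"
    if z_difference <= 0.5:   # boundary handling is an assumption
        return "low bias"
    if z_difference <= 1:     # "high bias" is stated as greater than 1
        return "medium bias"
    return "high bias"


def category_shares(z_differences):
    """Percentage of judges falling into each category."""
    counts = Counter(bias_category(z) for z in z_differences)
    return {cat: 100 * n / len(z_differences) for cat, n in counts.items()}


# Hypothetical federation with four judges, for illustration only.
hypothetical_federation = [-0.2, 0.3, 0.7, 1.4]
print(category_shares(hypothetical_federation))
```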

As you can see, some federations are all over the place like Canada, with some judges falling into each of the bias categories, whereas for others (chiefly Russia), every judge has a record that favors home country skaters. However, though Russia is the most consistently biased of the federations examined, other federations have a larger percentage of judges who have records displaying a high degree of bias.

Of these larger federations, only Australia appears to have a good claim to be unbiased on any kind of systematic basis. This can be seen more clearly by examining the following box plot, which shows the distribution of ZDifferences of judges from each of these federations in a slightly different manner than the previous graph. (If you’ve never had to read a box plot before, the box is where the middle 50% of judges from each federation lie, and the line in the middle of the box is the median–or middle–judge from that federation. The lines extend out to the full range of each federation’s judges, excluding the judges represented by the dots, who are outliers within their federation.)

Of course, these numbers will change as the season progresses and more competitions are added to the data set. I will try to keep this post updated as time passes. (Everything is up to date as of 17 Oct 2019).

In order to understand how I came to these determinations (and how to read the data), let’s take a look at methodology. As this discussion will involve a lot of statistical concepts that are probably unfamiliar to many people reading this, I’ll try to explain and offer links to statistical concepts that will be invoked here. Rest assured I did not make these metrics up–these are standard statistical tools used to analyze all kinds of data.

Methodology

The basic idea behind determining judge bias was to take each judge, compare their scores for each skater to the scores the rest of the panel gave, quantify how far off the panel the judge was, and then put all the numbers together in one place so I could compare how a given judge scored their own skaters in comparison to the rest of the panel, versus how they scored other skaters. I then ran a standard statistical test on the difference in order to figure out the strength of the evidence the judging record provides of bias.

In order to do all this, I started by recording all the scores given by all the judges in all Challenger Series, Grand Prix Series, and ISU Championship competitions since the scoring system changed in 2018. I also included Challenge Cup and World Team Trophy (Note: as I mentioned earlier these will be removed in an update in order to make the criteria for competition inclusion more consistent). This was done by looking up the scores for each judge on skatingscores.com and manually inputting them into a pre-formatted competition spreadsheet. You can find the competition spreadsheets here.

If you open these spreadsheets, you’ll notice they look something like this:

		Odhran Allen	Doug Williams	Maria Fortescue	Veronique Verrue	Andreas Waldeck	Lorna Schroder	Miwako Ando	Mean
		IRL	USA	ISL	FRA	GER	CAN	JPN
Yuzuru Hanyu	JPN	164.01	157.34	172.44	161.55	166.7	167.4	170.74	165.74

(This example from the 2018 ACI men’s free skate is completely random, obviously.)

As you can see, it lists judges, judge nationality codes, skaters, the skater’s nationality code, and the scores given by each judge (these are pulled from skatingscores.com, if you want to verify the numbers–I’m very happy for the existence of this website, as it made this whole process a lot quicker as I didn’t have to run the calculations myself!), as well as the average of all the judges’ scores.

From this data, I first determined how much higher or lower each judge scored each skater compared to the rest of the panel by subtracting the mean of all the scores from the individual judge’s score. This produced what I called the score deviation. Using our friend Yuzu as an example here, this produces:

		Odhran Allen	Doug Williams	Maria Fortescue	Veronique Verrue	Andreas Waldeck	Lorna Schroder	Miwako Ando
		IRL	USA	ISL	FRA	GER	CAN	JPN
Yuzuru Hanyu	JPN	-1.73	-8.4	6.7	-4.19	0.96	1.66	5
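As a quick check, the score deviations in the table can be reproduced in a few lines of Python. This is only a sketch of the subtraction described above; the variable names are mine, and the scores are the ones shown in the example sheet.

```python
from statistics import mean

judges = ["Odhran Allen", "Doug Williams", "Maria Fortescue", "Veronique Verrue",
          "Andreas Waldeck", "Lorna Schroder", "Miwako Ando"]
scores = [164.01, 157.34, 172.44, 161.55, 166.70, 167.40, 170.74]

panel_mean = mean(scores)

# Score deviation = individual judge's score minus the panel mean.
deviations = [round(score - panel_mean, 2) for score in scores]

print(f"panel mean: {panel_mean:.2f}")
for judge, deviation in zip(judges, deviations):
    print(f"{judge}: {deviation:+.2f}")
```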

So Odhran Allen scored Yuzu 1.73 points below the other judges, Doug Williams 8.4 points below, etc. This is what’s shown in the second block on the competition sheets. Now, unfortunately, we can’t just leave it at that, because we want to compare how judges score skaters across competitions, segments, and disciplines, in order to get the largest, most robust data sets. -8.4, while already a lot even in the men’s free, would be absolutely massive in the short, and also bigger in ladies or pairs than in men’s. Therefore, in order to make these score deviations more comparable, I had to standardize them into z-scores. Note that this is an extremely common method of standardizing data.

Here’s how it’s calculated: first, I have to determine something called the standard deviation of the judges’ scores. Standard deviation is another one of those common statistical measures, and it quantifies how spread out a set of numbers is from the average of those numbers. So if judges are all over the place on someone’s scores, then the standard deviation will be relatively high, whereas if judges are all more or less in agreement, the standard deviation will be low. In this case, the standard deviation of Yuzu’s scores was a fairly typical (in men’s) 4.85.
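The 4.85 figure can be reproduced as well. The post does not say which standard deviation formula the sheet uses, so the sketch below computes both of the usual ones with Python's `statistics` module. For these seven scores, the value quoted in the text lines up with the population formula (dividing by n) rather than the sample formula (dividing by n - 1).

```python
from statistics import pstdev, stdev

# Yuzuru Hanyu's seven scores from the example sheet.
scores = [164.01, 157.34, 172.44, 161.55, 166.70, 167.40, 170.74]

population_sd = pstdev(scores)  # divides the squared deviations by n
sample_sd = stdev(scores)       # divides the squared deviations by n - 1

print(f"population SD: {population_sd:.2f}")
print(f"sample SD:     {sample_sd:.2f}")
```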

To calculate the z-scores, we just need to divide each of our score deviations by the standard deviation of Yuzu’s scores. One way to think about the z-score, then, is that it tells you how many standard deviations a judge scored a skater above or below average. Here is what happens to Yuzu’s score deviations once they’re converted into z-scores.

		Odhran Allen	Doug Williams	Maria Fortescue	Veronique Verrue	Andreas Waldeck	Lorna Schroder	Miwako Ando
		IRL	USA	ISL	FRA	GER	CAN	JPN
Yuzuru Hanyu	JPN	-0.36	-1.73	1.38	-0.86	0.2	0.34	1.03
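Putting the two steps together, the z-scores in the table can be reproduced as follows. This is a sketch that assumes the population standard deviation is the one used (it is the formula that yields the 4.85 quoted above); the variable names are illustrative.

```python
from statistics import mean, pstdev

judges = ["Odhran Allen", "Doug Williams", "Maria Fortescue", "Veronique Verrue",
          "Andreas Waldeck", "Lorna Schroder", "Miwako Ando"]
scores = [164.01, 157.34, 172.44, 161.55, 166.70, 167.40, 170.74]

panel_mean = mean(scores)
panel_sd = pstdev(scores)  # assumption: population standard deviation

# z-score = (judge's score - panel mean) / panel standard deviation
z_scores = [round((score - panel_mean) / panel_sd, 2) for score in scores]

for judge, z in zip(judges, z_scores):
    print(f"{judge}: {z:+.2f}")
```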

Z-scores typically range from -2 to 2, though occasionally you’ll see numbers outside that if a judge *really* disagrees with the other judges (this occurs at roughly a 5% frequency). Underscoring (i.e. scoring below the other judges) turns into a negative z-score, while overscoring (i.e. scoring above the other judges) turns into a positive z-score.

Using z-scores actually builds in a measure of leniency for the judges. If there’s a lot of disagreement within the panel about a skate, then the z-score will be less extreme than the raw score difference, so a big difference with the average will “count” less. On the other hand, it does mean that if there’s a lot of agreement among panelists, someone who is a lone outlier may have a more extreme z-score in comparison to the raw score difference. But overall, the z-score makes it a bit easier for biased judges to hide. Oh well, it’s not like they hide very well in the first place.
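The leniency effect is easy to see with two made-up panels. In the sketch below, the last judge is 6 points above the panel mean in both cases, but the z-score is far less extreme when the rest of the panel disagrees among themselves. The panels are hypothetical, and the population standard deviation is an assumption carried over from the worked example.

```python
from statistics import mean, pstdev


def last_judge_z(scores):
    """Z-score of the last judge on the panel relative to the whole panel."""
    return (scores[-1] - mean(scores)) / pstdev(scores)


# Hypothetical panels; both have a mean of 101, so the last judge is +6 in each.
agreeing_panel = [100, 100, 100, 100, 100, 100, 107]
split_panel = [90, 95, 100, 105, 110, 100, 107]

z_agreeing = last_judge_z(agreeing_panel)
z_split = last_judge_z(split_panel)

print(f"panel in agreement:    z = {z_agreeing:.2f}")
print(f"panel in disagreement: z = {z_split:.2f}")
```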

Once these z-scores are computed for each judge, competition, and segment, all of the z-scores associated with a given judge are collected together into one sheet, the individual judge’s sheet. You can find these in the big judges database. If you click on any judge’s name in the sheet labeled “Judges”, you’ll be taken to the individual sheet for the judge you clicked, where you can see the collected z-scores for all the competitions they’ve judged.

On the left you’ll see a bunch of summary statistics. I’ll explain those in a second. On the right you’ll see z-scores. As you can see, they are labeled by skater and nationality, and at the top there’s a code which tells you which competition and segment is being shown in a given set of columns. This is composed of [Year] [Competition Code] [Segment Code]. The key for competition codes may be found in the “Checklist” portion of the Judges Sheet, which also lists which competitions are included in the database.

Using a formula, all of this data is split into two groups–z-scores for home country skaters and z-scores for other skaters. The z-scores for home country skaters are averaged, producing Z-home on the left. Same thing for the z-scores for other skaters, producing Z-other. The difference between the two, which is what we’re ultimately interested in, is then calculated as ZDifference. (You’ll also see these metrics in the overall judge summary).
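The split into Z-home and Z-other can be sketched as follows. The record layout (a list of skater-country and z-score pairs) and the function name are my own assumptions for illustration, not the spreadsheet's actual formula, and the z-scores are invented.

```python
from statistics import mean


def z_difference(judge_country, records):
    """Return (Z-home, Z-other, ZDifference) for one judge.

    records: list of (skater_country, z_score) pairs for that judge.
    """
    home = [z for country, z in records if country == judge_country]
    other = [z for country, z in records if country != judge_country]
    z_home, z_other = mean(home), mean(other)
    return z_home, z_other, z_home - z_other


# Hypothetical record of a Japanese judge.
records = [("JPN", 1.0), ("JPN", 0.5), ("USA", -0.5),
           ("CAN", 0.0), ("RUS", -0.25), ("FRA", -0.25)]
print(z_difference("JPN", records))

# A judge who scores everyone 2 standard deviations high has a ZDifference of 0.
generous = [("JPN", 2.0), ("JPN", 2.0), ("USA", 2.0), ("CAN", 2.0)]
print(z_difference("JPN", generous))
```

The second call illustrates that a uniformly generous judge is not treated as biased: only the gap between the two groups matters.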

You can think of the ZDifference as representing the degree of bias that a judge has shown to home country skaters. In terms of raw score as a rule of thumb, a ZDifference of 1 represents about 7-8 points in men’s, 6ish in ladies and pairs, and 6-7 in ice dance over the course of a competition. In other words, a judge who has a ZDifference of 1 will give, on average, 7-8 bonus points in men’s, etc., versus what they would typically give a skater who is not from their home country.

Of course, the degree of bias shown is not the only thing that matters when it comes to assessing a judge’s level of bias. If a judge shows a ZDifference of 1 but has only judged their home country skater a couple of times, it’s possible that that ZDifference is simply due to random chance or other factors. On the other hand, if the ZDifference persists across many competitions, we can be much more confident that the judge is biased.

This is where the metric p comes in. p is another standard statistical measurement, and in the context of our data it represents the chance out of 1 that an unbiased judge, i.e. one that scores home country skaters no differently than other skaters, could arrive at a record that evidences equal or greater bias than the actual judge’s record purely by accident. So in other words, the lower p is, the stronger the evidence that there is some kind of systematic difference between how a judge scores home country skaters and other country skaters.

By convention, a p value below 0.05 is considered statistically significant, and that is the standard I will be using to flag judges, though in many cases we’ll see that p will be far below that threshold. For instance, in the case of Russian judge Olga Kozhemyakina, p=0.000000000000003 (note that that’s a 0.0000000000003% chance an unbiased judge would produce a record equal to or worse than hers). It’s better not to see statistical significance as a binary thing, however. Instead, you should become more and more suspicious of a judge as p drops. Notice that many judges whose records were not flagged nonetheless have fairly low p values–I suspect that many of these judges’ records will start getting flagged as more scores start coming in.
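The flagging thresholds can be summarized in a short function. This is a sketch, not the sheet's actual flagging formula: the tier names are my own labels for the three benchmarks used in the list of judges (p<0.05, p<0.01, p<0.001), the minimum of four home-skater scores is taken from the discussion of when p is calculated, and the number of home scores passed in the example call is hypothetical.

```python
def flag_tier(p, n_home_scores, min_home_scores=4):
    """Classify a judge's record by p value (labels are illustrative)."""
    if n_home_scores < min_home_scores:
        return "not tested"            # too little data to calculate p
    if p < 0.001:
        return "bolded and underlined"
    if p < 0.01:
        return "bolded"
    if p < 0.05:
        return "listed"
    return "not flagged"


# The p value quoted for Olga Kozhemyakina, converted to a percentage.
p_kozhemyakina = 0.000000000000003
as_percent = p_kozhemyakina * 100
print(f"{as_percent:.1e} %")

# n_home_scores=10 is a made-up count used only to exercise the function.
print(flag_tier(p_kozhemyakina, n_home_scores=10))
```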

By considering ZDifference and p jointly, we can make a full assessment of a judge’s judging record. ZDifference tells you the severity of the historical bias, whereas p tells you how likely an unbiased judge would be to produce a record at least that extreme by chance.

I used a standard statistical test for the difference between two means, Welch’s t-test, in order to calculate p. I used the one-tailed version of the test, because we’re only looking for bias in one direction. Welch’s rather than Student’s was used because I noticed that judges with extensive judging records tended to have different variances for home country skater scores versus other scores. (If you didn’t understand this paragraph, that’s okay. Unfortunately, it requires a lot more effort to explain how calculating p works in depth, so I will have to pass on doing that. If you would like to learn, I would recommend you take an introductory statistics class.)
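For readers who do want to see the mechanics, here is a minimal, self-contained sketch of a one-tailed Welch's t-test. It is not the spreadsheet's implementation: the function names are mine, the z-scores are invented, and the tail probability is obtained by simple numerical integration of the t density so that nothing beyond the standard library is needed. In practice a statistics library routine (for example SciPy's `ttest_ind` with `equal_var=False`) would normally be used instead.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t), via Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def welch_one_tailed(home, other):
    """Return (t, df, p) for the alternative 'mean(home) > mean(other)'."""
    n1, n2 = len(home), len(other)
    se1, se2 = variance(home) / n1, variance(other) / n2  # squared standard errors
    t = (mean(home) - mean(other)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df, t_upper_tail(t, df)


# Hypothetical z-scores, invented for illustration only.
home_z = [1.0, 0.8, 1.2, 0.9]
other_z = [0.1, -0.2, 0.0, -0.1, 0.2, -0.3, 0.1, 0.2]
t, df, p = welch_one_tailed(home_z, other_z)
print(f"t = {t:.2f}, df = {df:.2f}, p = {p:.6f}")
```

Because the two groups are allowed to have different variances, the degrees of freedom are generally not a whole number, which is the practical difference from Student's t-test.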

Discussion

First, the most obvious and basic question: what does bias mean? Here, I’ve been using bias to mean a demonstrable, mathematical difference between how a judge scores their own skaters and other skaters. By claiming a judge is “biased,” I don’t mean to impute anything in particular about their psychology, nor am I making any claim about the origin of the bias. It may be conscious, or it may be unconscious. It may be a deliberate attempt to manipulate the results, or simply the same kind of lack of objectivity fans often display concerning the skating of their favorite skaters. Personally, I am not overly concerned about the causes of the bias, only that it exists–after all, let us remember that these judges, through their scoring, determine young athletes’ futures. I don’t know about you, but I don’t want the future of these young, incredibly hard-working people to be determined by a group of people who are unable to be objective, whether that lack of objectivity is the result of corrupt intent or simply clouded judgment.

However, let me address some specific explanations for bias which I do not believe are true or at least have problems, as well as other attempts to defend the judges from criticism.

1. The bias is just due to cultural preferences. People tend to look more favorably upon programs they are culturally familiar with and score them higher, and obviously a judge and a skater from the same country are more able to culturally understand each other.

First, skaters from the same country often have very different skating styles and skate very different types of programs, so it strains credulity to believe that there is something quintessentially “Russian” or “Canadian” or whatever about all skaters who skate under the same flag. Sasha Trusova’s programs look completely different from Alina Zagitova’s which are completely different from Alena Kostornaia’s, and they’re even coached and choreographed by the same people!

Second, if this were true, we would expect to see judges from culturally similar countries scoring each other’s skaters higher. For example, Canada and the US are two extremely culturally similar countries, so Canadian judges should overscore US skaters and vice versa. Fortunately, the sheet is built such that it’s easy to test this proposition (just change the country code inside each judge’s individual page to see how they score specific other countries’ skaters) and in fact, we do not see this. The vast majority of Canadian judges score US skaters like an unbiased (or even anti-biased, i.e. biased in the other direction) judge would, and so, too, the other way around. (You can confirm this yourself by downloading this sheet here, which has each judge’s nationality switched to that of a large, at least somewhat culturally similar or at least geographically close figure skating country. Or you can switch around judges’ nationalities yourself on the original sheet, and it will recalculate all the stats for each judge. You have to do this inside each judge’s individual sheet, however, not on the summary sheet.)

There are a few exceptions to this. Former USSR countries’ judges tend to score Russian skaters higher, although the level of bias is not quite as severe as it is for their own skaters. Also, I believe South Korean judges score North Korean skaters higher, but whether those two countries have similar cultures seems quite debatable. We will look at this in more detail in a future post, but I believe there is a better explanation for the exceptions. In general, culturally similar countries do *not* score each other’s skaters higher.

2. It’s just human nature to be biased. We should realize that judges are humans too and not robots.

Judges are not all the same. They do not all show evidence of bias. For instance, Glenn Fortin (CAN), Katharina Heusinger (GER), Andreas Waldeck (GER), Ayumi Kozuka (JPN), Shizuko Ugaki (JPN), and Linda Leaver (USA) all have reasonably substantial judging records that do not evince any substantial evidence of bias. This clearly indicates that it is possible for judges to be unbiased, at least when it comes to nationality-related bias. Evidently, not all humans have this particular human-nature-related flaw. Even among the judges who are biased, there are considerable variations in the degree of bias shown. The worst offenders, for example Salome Chigogidze (GEO), Nicholas Russell (GBR), and Elena Fomina (RUS), have ZDifferences in the range of 1.5-2, whereas the lowest statistically significant differences are in the range of 0.5. This shows that it is certainly possible to reduce the level of bias of the judges overall by getting rid of the worst offenders and replacing them with less biased judges, even if some low level of bias (say, less than 0.5) is difficult to get rid of and may not be practically significant enough to be worth dealing with.

3. Your metric looks at judges’ bias by comparing them to the mean of the judging panel. Doesn’t that assume that the mean of the judging panel is right? But sometimes it’s the outlier judge that is right, and the other judges that are wrong.

It may be true that the outlier judge is indeed “right”–I avoided making any assessment of what a skater “should objectively” have scored because those types of assessments lead to unproductive fan wars and involve a level of personal judgment that I did not want to introduce to this study. However, let me note that outlier scores only “count against” judges if they align with expected patterns of nationalistic bias. If a Japanese judge scores a Filipino skater way above average because only that judge was being objective and the other judges all underscored him due to some other form of bias (reputation, small country, etc.), then that will actually count very mildly in favor of the Japanese judge, since it raises that judge’s Z-other and so lowers the ZDifference. Only if a Japanese judge scores a Japanese skater way above average does it count “against” that judge. But in that case, there are still at least 3 other data points to consider (I only start calculating p if a judge has scored her own skaters at least 4 times), and if the judge only shows a pattern of “correcting” the scores for her own skaters, one has to begin wondering whether she is truly being objective. Again, a judge is not labeled “biased” for having scores that deviate from the mean; a judge is labeled “biased” if there is a difference between how they score their own skaters and other skaters. A judge who scores both groups 2 standard deviations above the mean would not trigger the flagging formula, despite having scores that are way out of whack with the other judges. It is only the difference between the judge’s scores for her own skaters and for other skaters that matters.
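To make the distinction concrete, here is a minimal sketch in Python of the comparison described above. It assumes each score has already been converted to a z-score relative to the panel; the function name `z_difference`, the record layout, and the sample numbers are hypothetical and are not taken from the actual spreadsheet.

```python
from statistics import mean

MIN_HOME_SCORES = 4  # the post only tests judges with at least 4 home-skater scores


def z_difference(records, judge_country):
    """Return (z_home, z_other, z_home - z_other), or None if the record is too thin.

    `records` is a list of (skater_country, z) pairs, where z is how far the
    judge's score sat from the panel mean, in standard deviations.
    """
    home = [z for country, z in records if country == judge_country]
    other = [z for country, z in records if country != judge_country]
    if len(home) < MIN_HOME_SCORES or not other:
        return None
    z_home, z_other = mean(home), mean(other)
    return z_home, z_other, z_home - z_other


# Hypothetical judge who is uniformly generous: 2 SD above the panel for everyone.
generous = [("JPN", 2.0)] * 4 + [("USA", 2.0)] * 6
# Hypothetical judge who is generous only to home skaters.
partial = [("JPN", 1.5)] * 4 + [("USA", 0.0)] * 6

print(z_difference(generous, "JPN"))  # difference is 0.0: an outlier, but not flagged
print(z_difference(partial, "JPN"))   # difference is 1.5: this is what gets flagged
```

The uniformly generous judge is far from the panel on every score, yet the difference between the two groups is zero, so nothing in this calculation would mark that judge as nationalistically biased.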

The only situation in which I think this may be a substantial concern is for judges from a small federation who have only judged one or two unique home country skaters. In that case, it’s possible that a personal (rather than nationalistic) preference, or a genuine belief that a particular skater is underscored, just so happens to coincide with a national flag and gets “wrongly” flagged as nationalistic bias. (This applies less to a belief about a large group of skaters, since that would raise Z-other as well and thereby decrease ZDifference.) This being the case, we may wish to be a little bit more lenient on small-fed judges. However, this defense hardly applies to the judges of large and powerful federations like Russia and the United States, who will judge many different skaters from their own country over their judging careers, and whose skaters cannot credibly claim to be underscored because of their nationality.
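One way to apply this leniency in practice is to count how many distinct home-country skaters sit behind a judge’s flag. The sketch below is only an illustration: the record layout, the function names, the made-up country codes, and the cutoff of two unique skaters (taken from the “one or two” above) are assumptions, not part of the spreadsheet.

```python
def unique_home_skaters(records, judge_country):
    """Return the set of distinct home-country skaters a judge has scored.

    `records` is a list of (skater_name, skater_country) pairs, one per score.
    """
    return {name for name, country in records if country == judge_country}


def deserves_lenient_reading(records, judge_country, max_unique=2):
    """True when the judge's home-skater record rests on only 1..max_unique skaters."""
    n_unique = len(unique_home_skaters(records, judge_country))
    return 1 <= n_unique <= max_unique


# Hypothetical small-federation judge: five home scores, all for one skater.
small_fed = [("Skater A", "AAA")] * 5 + [("Skater B", "BBB"), ("Skater C", "CCC")]
# Hypothetical large-federation judge: five home scores across five skaters.
large_fed = [(f"Skater {i}", "BBB") for i in range(5)] + [("Skater X", "CCC")]

print(deserves_lenient_reading(small_fed, "AAA"))  # True
print(deserves_lenient_reading(large_fed, "BBB"))  # False
```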

Limitations/Other considerations

Despite the fact that so many judges show evidence of nationalistic bias by the methods used here, I actually think that they are somewhat limited in their ability to catch nationalistic bias. (Which ought to indicate how bad the problem is.)

First, a judge can quite easily avoid detection by only being biased “when it counts”, i.e. in only a select number of competitions, when there are medals or spots at stake and a tweak in judging can make the difference. Because this bias will be washed out in the average with all of the other competitions where the judge was not being biased, this type of bias, if detectable at all using these methods, will only be so after a judge has built up an extremely substantial judging record.
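A rough back-of-the-envelope calculation shows why the record has to be so large. The Python sketch below is not the test used in this study; it assumes a simple one-sample normal approximation, z-scores with a spread of 1, and a 5% two-sided cutoff, and it ignores the noise in Z-other. The function name `scores_needed` and its parameters are hypothetical.

```python
import math


def scores_needed(share_biased, boost, sd=1.0, z_crit=1.96):
    """Rough count of home-skater scores needed before diluted bias stands out.

    share_biased: fraction of competitions in which the judge inflates scores
    boost:        size of the inflation, in z-score units, when it happens
    sd:           assumed spread of the judge's z-scores
    z_crit:       two-sided 5% cutoff of the normal distribution
    """
    average_effect = share_biased * boost
    # The standard error of a mean shrinks like sd / sqrt(n), so the average
    # effect clears the cutoff once n >= (z_crit * sd / average_effect) ** 2.
    return math.ceil((z_crit * sd / average_effect) ** 2)


for share in (1.0, 0.5, 0.2):
    print(f"biased in {share:.0%} of events -> about {scores_needed(share, 1.0)} scores")
```

Under these assumptions, a judge who inflates by one standard deviation every time stands out after about 4 home-skater scores, while one who does so in only a fifth of events needs close to a hundred.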

Another related type of nationalistic bias this metric is not good at catching is when judges selectively underscore only direct competitors, but score everyone else normally. Because all non-home-country skaters are averaged together, this type of bias has little impact on the overall ZDifference, and consequently it is very difficult to detect it using the method here. I hope to address this in a future segment which will examine whether judges from federations with top competitors underscore the direct competitors of their skaters. Stay tuned for that.
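The dilution is easy to see with made-up numbers. In the Python sketch below, the record size, the number of direct rivals, and the size of the penalty are all hypothetical, and the helper name `z_other_with_targets` is only for illustration.

```python
from statistics import mean


def z_other_with_targets(n_other, n_targets, penalty):
    """Average z over all non-home skaters when only `n_targets` are marked down."""
    scores = [0.0] * (n_other - n_targets) + [-penalty] * n_targets
    return mean(scores)


# Hypothetical record: 50 non-home scores, 3 of them for direct rivals,
# each marked 1.5 standard deviations below the panel.
z_other = z_other_with_targets(n_other=50, n_targets=3, penalty=1.5)
print(round(z_other, 2))  # -0.09
```

Three heavily underscored rivals move Z-other by less than a tenth of a standard deviation, far below the differences of around 0.5 that reach statistical significance here.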

Thirdly, if there is bloc judging going on, or any other score-trading or collusion scheme to increase a skater’s score, it will function to weaken the evidence for biased judging by reducing a biased judge’s apparent difference with the other judges on the panel when scoring their own skaters. On the flip side, if there is a score-trading or collusion scheme to lower a competitor skater’s score, that may wrongly introduce apparent bias on the part of the judge from the home country of that skater. (Nonetheless, I don’t think this creates a major concern about wrongly flagging judges, because unless there is some grand conspiracy against all of the skaters from a certain federation, one instance of apparently biased judging will be washed out when averaged with the rest of the scores that judge has given.) However, we can use this same dataset to take at least a partial look at bloc judging, and we will do so in a forthcoming post, so stay tuned for that too.
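A small numerical sketch of the first effect, in Python. It assumes a nine-judge panel, compares each judge to the mean of the full panel (including their own score), and looks only at the raw point gap rather than a z-score; the scores and the helper name `deviation_from_panel` are hypothetical.

```python
from statistics import mean


def deviation_from_panel(panel, judge_index):
    """Raw gap, in points, between one judge's score and the panel mean."""
    return panel[judge_index] - mean(panel)


honest = 90.0
# The home judge (index 0) inflates by 3 points and acts alone.
solo = [honest + 3.0] + [honest] * 8
# The home judge inflates by 3 points and two allies do the same.
bloc = [honest + 3.0] * 3 + [honest] * 6

print(round(deviation_from_panel(solo, 0), 2))  # 2.67
print(round(deviation_from_panel(bloc, 0), 2))  # 2.0
```

The home judge gave the same inflated score in both cases, but with two allies pulling the panel mean up, the judge looks less of an outlier.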

Finally, it is also possible for judges to “game the system” by overscoring non-direct competitor skaters, thereby inflating the “Z-other” portion of the calculation. I don’t think this is a major concern now, but if somehow this were adopted as the primary means of tracking judges’ bias, then it would be a concern in the future. (Although they would chiefly do this by overscoring lower ranking competitors, which might actually be a good thing, since it would combat reputation bias.)
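How much overscoring would this take? A minimal Python sketch of the arithmetic, assuming ZDifference is simply Z-home minus Z-other and that a judge leaves direct competitors alone; the function name `boost_needed` and the example numbers are hypothetical.

```python
def boost_needed(home_bias, share_inflatable):
    """Average z-score boost the inflatable skaters need to cancel `home_bias`.

    ZDifference is Z-home minus Z-other, so hiding a home bias of b requires
    raising Z-other by b. If only a share q of non-home scores can be inflated
    (direct competitors are left alone), each of those needs b / q on average.
    """
    if not 0 < share_inflatable <= 1:
        raise ValueError("share_inflatable must be in (0, 1]")
    return home_bias / share_inflatable


# Hiding a home bias of 1.0 when 80% of non-home scores are for non-rivals.
print(boost_needed(home_bias=1.0, share_inflatable=0.8))
```

In this example the judge would have to score every non-rival about 1.25 standard deviations above the panel, which would itself be a conspicuous pattern.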

Conclusions

There is, as always, more to say than I have said, but as I don’t want to spend literally forever on this or produce something so long no one wants to read it, I will end it here and leave it to readers to raise questions if there is a gap they would like me to fill. The overall conclusion is pretty clear: figure skating judging has a massive problem with nationalistic bias, and many judges are extremely blatant in their favoritism for their own countries’ skaters.

This also raises questions about judges’ commitment to objectivity in general. Though we have only looked at a very specific type of bias, and arguably not even the most significant one (just the easiest one to tackle using statistical methods), one might suspect that lack of objectivity in one respect bleeds into lack of objectivity in others. What about other forms of bias that are also often discussed, like reputation bias and big fed bias? If judges are so demonstrably biased in one way, it seems reasonable to suspect that they are also biased in others.

This post has been edited since it was published to improve verbiage, update graphs, and clarify some points of methodology that were initially confusing or misleading. Also, I forgot to credit skatingscores.com for the underlying judge score numbers–this is now fixed. Also, thank you Veveco from planethanyu.com for doing the graphs!

20 thoughts on “Judging Bias and Figure Skating: Part One – Nationalistic Bias”

Patricia Orzechowski

October 17, 2019 at 2:23 pm
your bias judges basically leaves all judges bias. So How biased are you when you came up with this methodology? Do you favor one skater over others, is your personal bias getting in the way of fair nonbias judging? This reeks of bias in of itself.

You have not explained why you personally are involved in this bias judging or the judges?
Are you a figure skater or former figure skater?
Are you a coach of a skater or yourself that has not achieved the PC’s scores or calls of ur’s?
1. the judges mark on how well closely fit the scoring sheet.
  Are you a skater that gets calls that you think should not get
  Are you a coach of a skater that gets calls you think should not get?
  
  are you a judge that is being told this of which fans should be aware.
  
  WHat is in it for you.
  
  1. FS Judging Review
    
    October 17, 2019 at 3:12 pm
    
    Every judge is added, but some were not tested because they didn’t judge enough for the test to be reliable. This was, in fact, to protect judges from being victims of false positives due to a scanty judging record.
    
    Again, please review the methodology. You’ll find that at every step, all judges were treated equally according to objective criteria. I could tell you my favorite skater (Yuzuru Hanyu) and my own nationality (USA), but it’s neither here nor there, as someone who had a different favorite and a different nationality would get the same results, and in fact I welcome anyone who wishes to follow the methodology here or to do a similar study to do so and inform me of the results. Also notice that neither Japan nor the USA was left out of my examination, and plenty of judges from both countries were called out. Many other skaters that I like, for instance Alena Kostornaia, Mikhail Kolyada, Evgenia Medvedeva, and Alexandra Trusova, are Russian, and I certainly did not spare the Russian judges. Nothing is in this for me but the integrity of the sport–it certainly took many, many hours I could have devoted to doing something that might actually earn me money! All I want out of this is to see all our young, hardworking athletes judged fairly, regardless of their nationality.
    
    This project was not about tech calls, and the issue of accurate, inaccurate, or biased tech calling is outside the purview of this project, except insofar as, when an official serves sometimes as a judge and sometimes as a tech caller, their judging record may shed light on whether we can expect them to be a fair tech caller. Of course, I have personal opinions about how scoring should be improved, but at no point did those judgments enter into the calculation.
    
    At this point, it is clear that you are not operating in good faith, as you have not put in any effort to actually understand the methodology of this project. If you would like clarification on any part of the write up, I am happy to provide it, but I will no longer reply to someone who makes baseless insinuations that show no understanding of what I have written.
    
2. Florica
  
  October 17, 2019 at 4:01 pm
  
  This is really interesting. I found that US judges are usually guilty of underscoring direct threats to their skater(s) (Hello, Sharon Rogers :))
  If you are really into statistics – would it be possible to make comparison of PCS for warhorse music against the none often/modern one?
  Thank you!
  
  1. FS Judging Review
    
    October 17, 2019 at 4:50 pm
    
    I will definitely be looking at federations underscoring direct threats to their skaters in future posts. This is pretty common, and unfortunately isn’t well accounted for by my methodology here, since the scores for all non-home-country skaters get lumped together.
    
    The music comparison would be a lot harder, and wouldn’t be possible with this data set, because this data set relies on comparing judges to the rest of the panel. If there’s some kind of effect that would affect all panelists, it’s not something I’ll be able to catch using what I did here. In order to design a study to look at the effect of music, I think you’d have to compare PCS scores of the same skater and how they change season by season in relation to the type of music they use (obviously you’d look at many skaters but you’d compare them with themselves). You’d have to control for their PCS changing for other reasons, like improved reputation/skating skills/etc. though, which I imagine is quite difficult. This would require a lot of work and isn’t very related to the bias issues I’m trying to tackle here, so it’s not something that is on my radar, but I invite anyone who wants to to do such a study. The nice thing about figure skating is that there is a lot of data available, and it’s all public, so in theory you should be able to come up with ways to answer all kinds of questions about how the sport is scored.
    
3. Indigo
  
  October 17, 2019 at 7:48 pm
  
  “I will definitely be looking at federations underscoring direct threats to their skaters in future posts.”
  
  Are you going to use examples other than Yuzuru Hanyu?
  
  Also, in terms of outlier bias, at ACI, while the US judge was indeed lower, he wasn’t exactly super generous to other skaters in that competition, including Jason Brown.
  
  National bias isn’t just about deviation from a mean, but how it’s applied across the field and your analysis, while interesting, fails to exhibit how these judges applied their scoring towards the other 2018 ACI skaters and resultant SDs and z-scores.
  
  1. FS Judging Review
    
    October 17, 2019 at 8:15 pm
    
    Please read more carefully–I’m a bit baffled at how you arrived at the conclusion that I only analyzed Yuzuru Hanyu’s ACI 2018 score. Every single score was analyzed from the beginning of the 2018-2019 season to date. Yuzuru’s score at ACI was only a demonstration to explain how the math works.
    
4. KJM
  
  October 17, 2019 at 8:29 pm
  
  Thank you for this and all of the work that went into it. It needs to be said. Always has. I knew when the new system came in that they could still manipulate their scores and therefore the results. You’re doing great work.
  
5. Kim
  
  October 17, 2019 at 9:20 pm
  
  I don’t have an issue with this kind of research and it could be a good tool for judges training. However I do have a couple of issues with the outline and the way it is written.
  
  It starts off basically accusing judges of bias in a negative context. It would be better to start with explaining what bias is rather than launch into accusations. The problem for me is that the whole premise is not completely unbiased when it comes to methodologies and what is being sought in evaluating skating judging.
  
  It also names and shames, which I don’t think is helpful and could put people on the defensive. I don’t have an issue with countries being analysed, but putting people’s names in there is opening yourself up to legal trouble and could actually go against the ISU code of conduct and member protection.
  
  I do think the intention is good and it could be used as part of judges training however it doesn’t appear entirely objective in what it is trying to achieve.
  
  1. FS Judging Review
    
    October 18, 2019 at 4:25 pm
    
    I’m not a member of the ISU, so I am not bound by its code of conduct. And I am skeptical that any sort of lawsuit would be anything but a waste of time and money on the part of the person attempting to sue.
    
    I do have an objective here, which is to hold judges accountable for their judging records. I freely admit to that objective. This objective is not achievable without naming names, so that is what I have done. If the ISU would like to introduce some kind of accountability mechanism that obviates the need for this relatively public method of achieving accountability, I would of course welcome that, as I understand that it may be unpleasant for these judges to be called out. However, as I have made all the necessary qualifications and elaborations in the body of the text (that “bias” here means a statistically significant difference in how a judge scores home country and other country skaters, that, given the number of judges who are tested, there may turn out to be a few false positives, that confidence in a finding should be modulated according to how low the p value is, etc.), I do not feel squeamish about doing so. I will be updating the list as more data comes in, so the fact that a judge is on it right now does not mean they will necessarily be on it forever. If the evidence starts to weaken, I will remove them.
    
    As for method of presentation, I wrote this article with the aim of maximizing its readability to a lay audience and its ability to hold interest. While it would be nice to start off with the methodology, I’m afraid that if I did that no one would read it as it would be excessively confusing and a bit boring. The article makes abundantly clear what “bias” means, and after all “bias” does not in ordinary language mean anything much different from the way I have used it. It just means lacking in objectivity, partiality.
    
    Kim
    
    October 18, 2019 at 9:29 pm
    
    I totally disagree. If you want to be taken seriously, then you do need to present the “research” a bit more objectively and professionally and that comes from the first impression. It seems to be preaching to the converted (ie those that already think judges are corrupt) rather than trying to make a convincing argument to those that might not see it that way. Just saying.
    
6. Kim
  
  October 17, 2019 at 10:02 pm
  
  Also I am wondering why you haven’t revealed your name and credentials. You have suggested that you want to bring accountability to the sport. Remaining anonymous kind of reduces your level of accountability.
  
  1. FS Judging Review
    
    October 18, 2019 at 4:40 pm
    
    If I were at all opaque about my methodology, then you might have a credible complaint. However, I have fully explained every single step I took to get the numbers I presented in this post. At no point have I hidden anything about how I got these numbers or the conclusions, and every single document I used is linked. My identity is irrelevant–anyone who follows my methodology as I have outlined it will get the same numbers, and any attempt to attack my results on the basis of my identity is ad hominem and an attempt to avoid engaging the actual evidence. If you think that a particular conclusion is baseless, please indicate what particular problem you have with either the methodology or the argument. I certainly do not expect anyone to accept my arguments on authority–this is why I have painstakingly presented every step, with accompanying reasoning, on the way from the scores to the conclusions.
    
    Kim
    
    October 18, 2019 at 9:23 pm
    
    I disagree. I know you will defend it but people do want to know what your involvement in the sport is and your qualifications. Unfortunately my impression is that you do come across as a person with a bit of axe to grind and are not purely objective in what you are trying to achieve which is indicative of your first paragraph. I would assume that you were one of the people that hated anonymous judging. So is there a reason why you can’t share the same accountability?
    
7. Kim
  
  October 18, 2019 at 1:53 am
  
  Actually I am also wondering why you are just basing this on total score? To get an accurate reflection wouldn’t it be more effective to look at individual GOEs and Component Score against the average GOE or Component Score for each skater? The sum of a score doesn’t necessarily tell the story.
  
  1. FS Judging Review
    
    October 18, 2019 at 4:00 pm
    
    It’s not based on total score, but the difference between each judge and the average total score, which is a product of differences in PCS and GOE scoring. This method automatically accounts for the different amount of GOE available per element, versus adding up raw GOEs, and automatically weighs the impact of PCS vs GOE in accordance with how they influence the final score.
    
8. FS Peer Review
  
  October 18, 2019 at 10:13 am
  
  Who is the author of this study? Is there a way to contact you?
  
  1. FS Judging Review
    
    October 18, 2019 at 3:55 pm
    
    You can message me through this site. I might not respond immediately, but I’ll try to check once a day.
    
9. VW
  
  October 18, 2019 at 2:12 pm
  
  The International Skating Union has set up a special commission to evaluate judges (Article 23 ISU Constitution). This commission in turn consists of judges and evaluates their performance as subjectively as the judges in the competition. What is lacking is a scientifically sound analysis of the scores with comprehensible statistical methods. Your method seems to be a promising approach. Submit your work to the USFSA for review and ask them to forward it to the ISU. It may be that the ISU is interested in a scientifically based method to analyze and evaluate the judges’ scores.