Wednesday, January 21, 2009

How Much Do Sprint Qualifiers Matter?

The Andy Newell article piqued my interest -- just how much does sprint qualifying have to do with the final results?

Anyone who qualifies 31st will tell you, "a hell of a lot," but that's not really what I'm talking about. Obviously, you have to ski in the top 30. But Jens Arne Svartedal won Stockholm last year with bib 29, and Otepaa before that with bib 30 -- so qualifying high clearly isn't a requirement for success.

In lieu of further hand-wringing, let's look at some real data. The data set is 12 World Cup sprint races from 2007-2009, so we have 355 data points (12 races of 30 is 360, but apparently 5 people got DQ'ed during that time -- Ivanov in Whistler 2009, Rotchev in Stockholm 2008, Kershaw in Kuusamo 2008, Kruikov in Drammen 2008, and someone else I can't seem to find).

The coefficient of correlation between qualifying position and final position is 0.466. Since 1 would be "perfect correlation" and zero would be "no correlation at all," we're actually closer to qualifying position meaning nothing than we are to it meaning everything. Combine that with the fact that the lower places (12-30) are partially ordered by qualifying position and a 0.466 is even less impressive.
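For anyone who wants to run the same kind of number on their own results, here is a minimal sketch of a coefficient of correlation in Python. It assumes the figure above is the usual Pearson coefficient, which the post doesn't actually say; the function name `pearson` and the six-racer toy data are made up for illustration and are not the World Cup data set.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, each left un-normalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up toy results for illustration only, NOT the real race data:
# qualifying place and final place for six hypothetical racers.
qual = [1, 2, 3, 4, 5, 6]
finish = [2, 1, 6, 3, 5, 4]

r = pearson(qual, finish)
print(round(r, 3))
```

A value near 1 means final places track qualifying places closely, and a value near 0 means they barely track at all. Feeding in all 355 (qualifying place, final place) pairs would be the way to check the 0.466 figure.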

Here's the master graph and table -- it's a lot to look at. You can come back later.
In the table, rows are qualifying position and columns are finish position. So we can see from the top left that guys who qualified in the top 5 went on to finish in the top 5 a total of 26 times (out of 60 attempts). This is actually the strongest correlation on the entire chart -- 43% of the time, a top-five qualification is converted into a top-five finish.

At the other corner we see that the bottom 5 qualifiers finished in the bottom five 16 times, only 29% of the time. So it's more likely that you'll qualify well and finish well than that you'll qualify poorly and finish poorly.
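Here is a rough sketch of how a table like this can be tallied: put every place into a five-place bin, then count (qualifying bin, finish bin) pairs. The helper names `bin_index`, `crosstab`, and `rate` are made up for this example, and the toy results are invented, not the real data. The two rates at the bottom use the counts quoted above. The denominator of 55 for the bottom corner is an assumption: 16 out of 60 would be about 27%, while 16 out of 55 gives the 29% quoted, which would fit the five DQ'ed racers leaving five empty slots in the 26th-30th finish column.

```python
from collections import Counter

def bin_index(position):
    """Map a place 1-30 to a five-place bin: 0 for 1-5, 1 for 6-10, ..., 5 for 26-30."""
    return (position - 1) // 5

def crosstab(results):
    """Count (qualifying bin, finish bin) pairs from (qual, finish) tuples."""
    return Counter((bin_index(q), bin_index(f)) for q, f in results)

def rate(count, attempts):
    """Percentage of attempts that landed in a given cell."""
    return 100.0 * count / attempts

# Invented toy results for illustration only, NOT the real race data.
toy = [(1, 3), (2, 7), (28, 4), (30, 27), (8, 26)]
table = crosstab(toy)
print(table[(0, 0)])  # top-5 qualifiers who finished top 5 in the toy data

# Rates from the counts quoted in the post.
top_corner = rate(26, 60)     # top-5 qualifiers finishing top 5
bottom_corner = rate(16, 55)  # assumes 55 bottom-five finishers (5 DQs removed)
print(round(top_corner), round(bottom_corner))
```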

Let's look at where the top five and bottom five come from:

Note that 71.6% of the top five finishers qualified 10th or better, and nearly 87% qualified in the top 15. Our dataset has only one winner from outside 15th place (Svartedal in Stockholm), two 2nd places (Hetland in Kuusamo, Kjoelstad in Davos), and two 3rd places (Pasini in Dusseldorf, Larsson in Kuusamo). It's safe to say that moving onto the podium from outside the top 15 is very hard.

On the flip side, though, the bottom five distribution is more spread out. The 26th-30th qualifiers are the most prevalent, but after that everyone from 5th-25th is consistently represented. Remember that 26th-30th places are the five people who lost quarterfinals -- in 12 races, someone who qualified 6th-10th straight-up lost their quarterfinal 9 times.

In other words, you can expect someone who should advance to crash out or otherwise botch things completely about 75% of the time (9 races out of 12). I wonder why this is -- the 6 thru 10 qualifiers are the guys who should theoretically get the second automatically advancing spot from quarters, so perhaps they feel the pressure of being a guy who SHOULD advance, without actually having the confidence of knowing they're the fastest in the heat? The 6th-10th qualifiers lose the heat more often than the 21st-25th qualifiers, probably because they are more likely to crash or blow up trying to hold onto the front than to just try to pick up a few places like the later qualifiers.

The one group that seems to be immune from losing quarterfinals is the top five qualifiers -- in 12 races, this only happened once. One guess who it was.

For the sake of space we'll look at the 6th through 25th spots as line graphs. First up are the distributions for the 6-10 and 21-25 finish spots:
The interesting thing here is that the 6th-10th places are more likely to be occupied by someone who qualified in the top 5 than in the 6-10th slots! This goes along nicely with the "6-10 seeds crash out a lot" theory. And we stay sane by noting that the 21st-25th places are most often occupied by someone who qualified in them.

The least likely occurrence here is a top-5 seed falling to 21st-25th (the lowest red point), which only happened twice. This is finishing second-to-last in your quarterfinal after having the top seed, an ignominy reserved for Dusan Kozisek in Canmore and Tore Asle Gjerdalen in Lahti. In Kozisek's defense, he also has one of the four "qualify 21-25, finish 6-10" data points, where he qualified 22nd and finished 6th at Lahti.

Finally, the midsection -- 11th through 20th:

The 16-20 graph spikes at 16-20, letting us know that not all is crazy in the world, although this is the lowest spot where we see a decent number of top five qualifiers finishing -- 7 out of 60 top seeds finished fourth in their quarterfinal.

The 11-15 graph is the flattest of them all -- there are as many top 5 seeds ending up here as bottom five! These are the guys who lost the small final, or finished third in their quarterfinal. Finishing here is a good goal for the 26th-30th qualifiers, who got this high 9 times -- but only broke into 6-10th four times and 1st-5th three times.

There's not too much concrete information to take away here -- the bottom line seems to be, qualifying doesn't mean too much. We can say that qualifying in the top five usually leads to a top 10 finish; qualifying outside the top 15 almost never ends on the podium. But a bad qualifying spot doesn't doom you to a bad finish -- at least on the men's side.

We'll run these numbers again, for women, in the near future -- I'm expecting a much higher coefficient of correlation and more absolutes, like "no one has ever made the final with a bib higher than X." Stay tuned.


Christopher Tassava said...

Faaaascinating, especially the bit about the racers with the 6-10 bibs. I wonder, given the depth of men's sprinting - yeah, the Norwegians and Swedes win a lot, but it *seems* that, at least compared to women's sprinting, many different racers, from many different teams, have captured podium spots in this '07-'09 period - if the racers in those spots tend to be the same racers, at least over periods of time - say, half a season or so. Is there some sort of cohort effect, i.e., that these are all racers "on the verge," and presumably moving up (getting faster) or down (getting slower), and thus tend to have similar (good but not winning) results?

And what about factoring time into the analysis? Surely, some qualifications had big time spreads from #1 to #30, others had very small ones. Do outcomes change if you factor in this time spread, given that a tight field presumably means that racers are more evenly matched? A quick check of the '08 Stockholm and Otepaa races says no: Svartedal was 6.27s behind #1 Boerre Naess at Stockholm and 4.70s behind #1 Petter Mylback (who?) at Otepaa, yet won the events. Is this a lot of time, or a little? I could check, but numbers make my head hurt right now.

I can't wait to see the analysis of the women's sprints.

Ari said...

Garrott Kuzzy was also DQ'ed in the sprint at Whistler 2009 I believe...

Colin R said...

I think technically Kuzzy was only DQ'ed from the heat, not DQ'ed from the event -- he still shows up on the results in 29th. Meanwhile Ivanov was DQ'ed entirely, as he's not on the results anywhere (which is why Kuzzy is 29th and not 30th).