Top-300 Recruits and the NFL Draft

Submitted by Blue@LSU on April 21st, 2023 at 9:58 PM

What is this diary all about? Why are we here? What truly is the meaning of life?

Well, I may have gotten a bit carried away with myself. But if you want answers to the last two questions, I’d recommend you look to the great philosophers…that is, if you can pull them away from their little game of football (I’m definitely on the side of Germany in this one).

Anyway, some people asked some good questions in my last diary (right below this one, or here) about whether there might be some bias in how recruits are evaluated. Specifically:

i'd be curious about what percentage from a given state actually panned out to their rating, admittedly a tough and subjective data set.  and by that I mean take for instance, say, texas with it's 3 million 5*'s.  are they panning out at the same percentage as other states 5*'s or is there some over-rating bias in their rankings? - XM - Mt 1822

It would be interesting to see a graphic for NFL picks, by home state of the player, to see if there is a geographical bias in 247’s ranking of recruits - in other words, are Texas, Florida, Georgia, etc. as dominant in producing NFL picks as they are in producing top 100/200 recruits? - First And Shuttlesworth

Do the Southern states really have that big of a talent advantage or is there a bias in the rankings?  Seems like the number of guys going pro would be fairly evenly distributed across geography unless there really is an advantage to playing in states like Texas where they play football year round (or at least train for it.) - WestQuad

So I’m going to try to answer these questions with statistics. Specifically, I’m going to analyze whether recruits from some of the hotbed Southern recruiting grounds are drafted at the same rate as those from other states.

Anyone that’s not interested in the details of the data can skip down to the figures. (Sorry, I can’t apply one of those nifty “jump” features from the front page.)

THE DATA, THE DATA, THE DATA

Luckily, the good folks at CFB Data have already done much of the legwork for this dataset. After some fairly extensive data cleanup (identifying mismatched player id #s, removing repeat observations, etc.) I was able to come up with a dataset of top-300 recruits from 2005 to 2017 in the 247 Sports Composite, and all top-300 recruits from 2006 to 2017 in the ESPN rankings (my own addition). All top-300 recruits in 247 and ESPN were then matched with draft picks from 2008 to 2022, also provided by CFB Data.

The start dates are determined by data availability. The data on 247 recruits gets pretty spotty before 2005, and ESPN doesn't have data before 2006 (ESPN 150 from 2006-2012, then ESPN 300 after). To make sure I am making direct comparisons, I only look at the years where I have data from both 247 and ESPN, that is, from 2006 to 2017.
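For anyone who wants to try the cleanup at home, here is a rough Python sketch of the steps described above: drop repeat observations, keep only the classes covered by both services, and flag each recruit as drafted or not. It is not the code behind this diary. The field names (player_id, class_year, and so on) and the sample rows are made up for illustration and are not CFB Data's actual schema.

```python
# Hypothetical records; field names are illustrative, not CFB Data's schema.
recruits = [
    {"player_id": 1, "name": "A", "class_year": 2005, "state": "TX", "rank": 12},
    {"player_id": 2, "name": "B", "class_year": 2010, "state": "FL", "rank": 40},
    {"player_id": 2, "name": "B", "class_year": 2010, "state": "FL", "rank": 40},  # repeat row
    {"player_id": 3, "name": "C", "class_year": 2017, "state": "MI", "rank": 250},
    {"player_id": 4, "name": "D", "class_year": 2018, "state": "GA", "rank": 5},
]
draft_picks = [
    {"player_id": 2, "draft_year": 2014},
    {"player_id": 4, "draft_year": 2021},
]

FIRST_CLASS, LAST_CLASS = 2006, 2017  # classes with both 247 and ESPN data


def build_dataset(recruits, draft_picks):
    """Deduplicate recruits, apply the year window, and add a 0/1 drafted flag."""
    drafted_ids = {pick["player_id"] for pick in draft_picks}
    seen = set()
    rows = []
    for r in recruits:
        if not FIRST_CLASS <= r["class_year"] <= LAST_CLASS:
            continue  # outside the common 247/ESPN window
        if r["player_id"] in seen:
            continue  # repeat observation
        seen.add(r["player_id"])
        rows.append({**r, "drafted": int(r["player_id"] in drafted_ids)})
    return rows


dataset = build_dataset(recruits, draft_picks)
for row in dataset:
    print(row["name"], row["class_year"], row["drafted"])
```

Matching on a player id only works once the mismatched ids have been fixed by hand, which is the part of the cleanup a sketch like this cannot do for you.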

The end date of the 2017 recruiting class was chosen for more practical reasons. It was important to choose an end date where the recruits have (mostly) run out of eligibility. Otherwise, we don't know whether a top-300 recruit hasn’t been drafted, or simply hasn’t been drafted yet. By my count, the members of the 2017 recruiting class, or most of them anyway, should have run out of eligibility by the time the last NFL draft came around in 2022. Except for THIS GUY that is.
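The eligibility count can be written out as a quick check. This is only a sketch, and it assumes a recruit spends at most five years on campus (four seasons plus one redshirt year); that five-year cap is my assumption for illustration, and longer careers like the exception above are not captured.

```python
MAX_YEARS_ON_CAMPUS = 5    # assumption: four seasons plus one redshirt year
LAST_DRAFT_IN_DATA = 2022  # most recent draft in the dataset


def last_expected_draft_year(class_year, years=MAX_YEARS_ON_CAMPUS):
    """A recruit who enrolls in the fall of class_year and stays `years`
    seasons plays his last season in class_year + years - 1 and enters
    the draft the following spring."""
    return class_year + years


def outcome_is_settled(class_year):
    """True if 'not drafted' can (mostly) be read as final for this class."""
    return last_expected_draft_year(class_year) <= LAST_DRAFT_IN_DATA


for year in (2016, 2017, 2018):
    print(year, last_expected_draft_year(year), outcome_is_settled(year))
```

Under that assumption, 2017 is the last class whose members should have been through their final draft by 2022.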

The data are attached at the bottom of this diary for anyone interested. As always, if you find any mistakes, please let me know so I can correct them for future use.

All tables and figures can also be accessed HERE.

WHERE DO THE TOP-300 RECRUITS COME FROM?

I first mapped the data to show where the top-300 recruits come from. The first two maps show the distribution of 247 and ESPN top-300 by the recruit’s home county.

 

Keep in mind that ESPN only recorded the top-150 for the first 7 years of these data, so the numbers are not directly comparable. In any case, the coverage is pretty similar. Both recruiting services have a heavy emphasis on Southern athletes. The main difference is that 247 seems to do a better job of identifying recruits in the Northeast, across the Midwest (Missouri and Kansas), and into the Mountain West (Colorado and Utah).

The biggest difference between the two services is in the geographic distribution of 5-star recruits, which we can see in the next two maps by the recruit's home state.

 

Only one state above the Mason-Dixon Line (New Jersey) has 6 or more 5* recruits according to ESPN, whereas there are 4 such states according to 247. By contrast, the ESPN rankings show 7 Southern states with at least 6 5* recruits each and, altogether, these states account for almost 2/3 (63%) of all ESPN 5-star recruits. While 247 has a similar concentration of 5-star recruits in the South, it isn't as dramatic: these same 7 states account for 53% of the 247 5-star recruits.

TOP-300 RECRUITS & THE NFL DRAFT: COMPARING 247 AND ESPN

Before looking at the state-level results, I thought it might be interesting to make a bird’s-eye view comparison of 247 and ESPN in terms of predicting NFL draft picks. I break these down into five-stars, and the top-100, 200, and 300 recruits from each service.

ESPN has a slight advantage in predicting draft success among 5-stars. But also note that ESPN gives out about half as many 5-stars per year (about 16) as 247 does (32). In this light, I would say that 247’s record is more impressive.

Where 247 really pulls away from ESPN in predicting draft success is among the top-100. Roughly 41% of 247's top-100 recruits heard their names called during the draft compared to around 37% for ESPN. As you might guess, accuracy drops off the further down the rankings we go: roughly 27% of the top-300 from both services are drafted. Is that good? Less than desirable? I’ll let you judge.

RESULTS!

So what do the results show? Is the intuition of the above posters correct that there is an inherent bias toward some, especially southern, states? How do highly-rated recruits from the recruiting hotbeds perform compared to the recruits from other states?

The first set of 4 graphs below shows the predicted probability (from probit model estimations) that a recruit from a particular state will be drafted. The vertical line in each graph shows the average (mean) probability of a 5-star, top-100, 200, or 300 recruit being drafted. In order to avoid overcrowding and generally messy graphics, the figures only include states that account for at least 1% of the recruits in each group (5-star, top-100, 200, or 300).

For each state, I show the point estimate of the predicted probability of being drafted (the hollow square), as well as the 99%, 95%, and 90% confidence intervals around the point estimates. This allows you to see if the probability of a recruit from that state being drafted is statistically significantly lower than, higher than, or indistinguishable from the “average” five-star (or top-100, 200, or 300) recruit. Feel free to use whichever level of confidence interval you want, or none at all. I won’t judge.

So, for example, in the first graph below, we see that the probability of a 247 5-star recruit from Arkansas being drafted is .25, but also that this is not statistically distinguishable from the probability (.58) of the “average” 5-star recruit being drafted (all levels of the confidence intervals overlap the vertical line). On the other hand, a five-star recruit from Texas has a statistically significantly lower probability of being drafted than the “average” five-star at the 95% confidence level and below.
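To make the reading of these intervals concrete, here is a hedged sketch in Python. It is not the probit code behind the figures. It leans on one fact: if a model contains nothing but state indicators, the fitted probability for each state is simply that state's observed draft rate, so a plain proportion with a normal-approximation interval gives the same flavor of result. The state counts below are hypothetical (picked only to mimic the Arkansas and Texas examples), and the interval formula is my assumption about how such intervals are typically built. The overall mean uses the 247 5-star totals (227 drafted out of 386) reported in the comments.

```python
from math import sqrt
from statistics import NormalDist

LEVELS = (0.90, 0.95, 0.99)


def draft_probability(drafted, total):
    """Observed draft rate plus normal-approximation confidence intervals."""
    p = drafted / total
    se = sqrt(p * (1 - p) / total)
    intervals = {}
    for level in LEVELS:
        z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
        intervals[level] = (max(0.0, p - z * se), min(1.0, p + z * se))
    return p, intervals


def differs_from(mean, intervals):
    """For each level, True if the interval does NOT cover the mean line."""
    return {level: not (lo <= mean <= hi) for level, (lo, hi) in intervals.items()}


overall_mean = 227 / 386  # all 247 five-stars: drafted / total

# Hypothetical (drafted, total) counts, for illustration only.
hypothetical_counts = {"Arkansas": (1, 4), "Texas": (24, 56)}
for state, (drafted, total) in hypothetical_counts.items():
    p, intervals = draft_probability(drafted, total)
    print(state, round(p, 2), differs_from(overall_mean, intervals))
```

With these made-up counts, the small-sample state has a low point estimate but intervals so wide they all cover the mean, while the large-sample state clears the mean at 90% and 95% but not at 99%. The sketch also treats the mean as a fixed number, which ignores its own sampling error.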

 

 

 

There are three major takeaways here:

  • 247 might want to reconsider giving top-200 rankings to recruits from Indiana. They consistently underperform the “average” 247 top-200 recruit. By a lot.
  • Texas and, to a lesser extent, California, pretty consistently underperform in the NFL draft compared to the “average” recruit in each category. This is pretty significant, given how many 5* and top-300 recruits these states produce.
  • The other Southern states with large numbers of recruits in the top-300 (Florida, Alabama, Georgia) perform just fine. This doesn’t appear to be a “Southern” bias (if it is indeed bias).

BETTER RESULTS (OR, RESULTS, RESULTS, RESULTS!)

The above graphs allow us to compare the probabilities of being drafted to some “average” recruit that I’ve inserted into the figure. Is this appropriate? Well, kind of? It allows us to establish a pattern and see some trends which is, in itself, useful. But it is kind of arbitrary and, most importantly, it doesn’t really allow us to compare the states directly to each other.

To directly compare recruits from one state to another, I show the marginal effect of a recruit’s home state on the probability of being drafted. The marginal effect simply shows the change in the predicted probability of being drafted as we move from a baseline (comparison) category (in this case, Texas) to a different state. Negative values mean that a recruit from that state has a lower probability of being drafted than a recruit from Texas, and positive values mean the opposite (i.e., a higher probability of being drafted than a recruit from Texas). Again, three levels of confidence intervals are provided (99%, 95%, and 90%). A marginal effect is statistically insignificant if the confidence intervals encompass the value of zero (no difference). That is, there is no detectable difference in the probability of being drafted compared to Texas.

So, for example, the first graph shows that 247 5*s from Florida, Georgia, Alabama, Arizona, and Maryland all have a statistically significantly higher probability of being drafted than a 5-star recruit from Texas (at the 95% level). Among top-100 recruits (the second figure), the list of states where recruits have a higher probability of being drafted than one from Texas is extended to Florida, Virginia, Louisiana, Michigan, South Carolina, Maryland, and Washington at the 95% confidence level, and Alabama and North Carolina at the 90% confidence level.
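As a rough illustration of what a marginal effect against a Texas baseline means, the sketch below computes the difference in draft rates between a state and the baseline, with a normal-approximation interval around that difference. This is a simplified stand-in, not the probit marginal effects behind the figures, and every count in it is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

LEVELS = (0.90, 0.95, 0.99)


def marginal_effect(state, baseline):
    """Difference in draft probability between a state and the baseline.

    Each argument is a (drafted, total) pair. Returns the difference and,
    for each confidence level, whether the interval excludes zero.
    """
    p1 = state[0] / state[1]
    p0 = baseline[0] / baseline[1]
    se = sqrt(p1 * (1 - p1) / state[1] + p0 * (1 - p0) / baseline[1])
    effect = p1 - p0
    significant = {}
    for level in LEVELS:
        z = NormalDist().inv_cdf(0.5 + level / 2)
        lo, hi = effect - z * se, effect + z * se
        significant[level] = not (lo <= 0.0 <= hi)
    return effect, significant


# Hypothetical (drafted, total) counts, for illustration only.
texas = (24, 56)
others = {"Florida": (45, 60), "California": (14, 30)}
for name, counts in others.items():
    effect, significant = marginal_effect(counts, texas)
    print(f"{name}: {effect:+.2f} vs. Texas, significant: {significant}")
```

A positive effect whose interval stays above zero corresponds to a state sitting clearly to the right of the zero line in the figures; an interval that straddles zero corresponds to a state that can't be told apart from Texas.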

 

 

 

These results are a bit more generous to Texas in that they show the record of Texas recruits being drafted is on par with that of a number of other states. But they also still show that top-300 Texas recruits underperform in the NFL draft relative to recruits from many other, even less hot-beddy (is that a word?) states. California, insofar as it is not statistically distinguishable from Texas, is in the same boat. Again, this points to a problem with the rankings given the sheer number of top-300 recruits these states produce.

The results also reconfirm that 247 may want to rethink how it evaluates recruits from Indiana. Just sayin.

FINAL THOUGHTS

So, um, yeah. We need to have a conversation with the recruiting services about Texas and, to a lesser extent, California. I suppose one could argue that the sheer number of top-300 recruits from these states will lead to a higher miss rate. But Florida has a similarly large number of top-300 recruits and doesn't really face this problem.

Then what could be the issue? I can think of two possible explanations. One pretty obvious intervening variable between high school and the draft is, well, college, which is something I haven’t taken into account. It’s quite possible that recruits from Texas are less likely to be drafted because they are all committing to Texas and Texas A&M and aren’t developed beyond the skills they came in with. On the other hand, the ones that are drafted are those that chose to go to Alabama, Georgia, or Michigan. For some reason this answer just doesn’t appeal to me. I mean, I like to poke fun at Texas as much as the next person but, as far as I know, UT and TAMU do OK in terms of getting their players drafted. 

A second explanation is that there is some inherent bias in the recruiting services, like the posters at the beginning of this diary were wondering about. I wouldn’t attribute this to any particular intention to make a particular state and/or recruit look better. Instead, I’d guess that, if shown two recruits with similar numbers against similar competition, the services would rank the one from Texas or California higher than the one from Michigan or Illinois because, well, California and Texas! They must be better, right? I don’t know enough about the recruiting services to say whether this is the case, though, so I’d be happy to hear from more knowledgeable folks.

Now, with all that said, I’m damn glad for the Texas recruits we have. Blake Frazier is a big get and I’m keeping my fingers crossed for Taylor Tatum, Bennett Warren, Max Anderson, Michael Uini, or any other recruit the staff wants out of Texas. If the coaches have evaluated them and want them, so do I.

Thanks for reading if you’ve made it this far.

Go Blue!

Comments

Blue@LSU

April 21st, 2023 at 10:09 PM ^

Dammit, the link to my joke in the opener didn't work. This is it

 

Edit: and now I also see that the data wasn't attached. I don't know if there's any way to get it posted.

Romeo50

April 22nd, 2023 at 9:01 AM ^

Came for the Monty Python skit stayed for the data. Excellent work. Weighted historical competition and talent development inferences seem pronounced. Undoubtedly more time spent outdoors active is a factor which many likely share. Likely these concentrations will become more noticeable. 

Logan88

April 22nd, 2023 at 7:01 AM ^

So...based on this data, I can sleep soundly knowing that two of OSU's highly rated recruits (5* WR and 4* OL) for the 2024 class are from Indiana, correct?

They are guaranteed to be busts, right? Right?!?

CFraser

April 22nd, 2023 at 9:43 AM ^

The bias mostly comes out with level of competition. This is huge when evaluating talent - particularly from film. Like that receiver from ID this year. Looks like he’s just running around pee wee players. So, because he’s playing inferior competition (euphemism for Idaho) he takes a hit (huge one) in his ranking. That’s the biggest part of the geographical bias I think. The camps are a little bit of an equalizer but the majority of their rating is going to be from their high school play and if it’s hard to tell if you’re that good or they’re that bad it makes for more difficult scouting. Thing is, the reputation for those biased states was brought about mostly because of the rankings to begin with so it’s a vicious cycle. 

Logan88

April 22nd, 2023 at 4:44 PM ^

Agreed, that was indeed an eye opener. I believe that prospects who are rated as 5 stars are projected to be first round draft picks, so the fact that 40% of them don't get drafted at all is really surprising.

It would be interesting to see how accurate rankings are by position groups (QB, WR, RB, OL, DE, DL, LB, CB, S). I have never compiled nor analyzed any data but I have a general sense that OL rankings are the least reliable while skill positions like QB, WR and RB are some of the more reliable. 

Blue@LSU

April 22nd, 2023 at 5:01 PM ^

Here's what the numbers look like for 247 5*s. Top row are raw numbers, percentages beneath. 

Sorry it's a bit sloppy. Copying and pasting the table didn't work out too well. But it looks like you are probably right.

           |        drafted
  position |         0          1 |     Total
-----------+----------------------+----------
       APB |         3          5 |         8
           |     37.50      62.50 |    100.00
-----------+----------------------+----------
       ATH |         5         11 |        16
           |     31.25      68.75 |    100.00
-----------+----------------------+----------
        CB |        11         24 |        35
           |     31.43      68.57 |    100.00
-----------+----------------------+----------
        DT |        15         24 |        39
           |     38.46      61.54 |    100.00
-----------+----------------------+----------
      DUAL |         2         10 |        12
           |     16.67      83.33 |    100.00
-----------+----------------------+----------
        LB |        10          8 |        18
           |     55.56      44.44 |    100.00
-----------+----------------------+----------
        OC |         1          0 |         1
           |    100.00       0.00 |    100.00
-----------+----------------------+----------
        OG |         4          4 |         8
           |     50.00      50.00 |    100.00
-----------+----------------------+----------
       OLB |        11         12 |        23
           |     47.83      52.17 |    100.00
-----------+----------------------+----------
        OT |        13         22 |        35
           |     37.14      62.86 |    100.00
-----------+----------------------+----------
       PRO |        11         10 |        21
           |     52.38      47.62 |    100.00
-----------+----------------------+----------
        RB |        22         22 |        44
           |     50.00      50.00 |    100.00
-----------+----------------------+----------
         S |         6         16 |        22
           |     27.27      72.73 |    100.00
-----------+----------------------+----------
       SDE |        12         18 |        30
           |     40.00      60.00 |    100.00
-----------+----------------------+----------
        TE |         3          3 |         6
           |     50.00      50.00 |    100.00
-----------+----------------------+----------
       WDE |         6         18 |        24
           |     25.00      75.00 |    100.00
-----------+----------------------+----------
        WR |        24         20 |        44
           |     54.55      45.45 |    100.00
-----------+----------------------+----------
     Total |       159        227 |       386
           |     41.19      58.81 |    100.00
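
If you'd rather sort or regroup these numbers than squint at the table, the short Python sketch below re-derives the percentages from the raw counts above and lists the positions from highest to lowest draft rate. The counts are copied from the table; the variable and function names are just illustrative.

```python
# (not drafted, drafted) counts for 247 five-stars, copied from the table above
counts = {
    "APB": (3, 5), "ATH": (5, 11), "CB": (11, 24), "DT": (15, 24),
    "DUAL": (2, 10), "LB": (10, 8), "OC": (1, 0), "OG": (4, 4),
    "OLB": (11, 12), "OT": (13, 22), "PRO": (11, 10), "RB": (22, 22),
    "S": (6, 16), "SDE": (12, 18), "TE": (3, 3), "WDE": (6, 18),
    "WR": (24, 20),
}


def draft_rate(not_drafted, drafted):
    """Percentage of recruits in a group who were drafted."""
    return 100 * drafted / (not_drafted + drafted)


rates = {pos: draft_rate(*c) for pos, c in counts.items()}
total_not = sum(c[0] for c in counts.values())
total_yes = sum(c[1] for c in counts.values())

# Positions from highest to lowest draft rate, with the group size alongside.
for pos, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    n = sum(counts[pos])
    print(f"{pos:<6}{n:>4}{rate:>8.2f}")
print(f"{'Total':<6}{total_not + total_yes:>4}{draft_rate(total_not, total_yes):>8.2f}")
```

Keep the group sizes in mind when reading the sorted list: several positions have fewer than 10 five-stars, so their rates can swing a lot on one or two players.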
 

CriticalFan

April 22nd, 2023 at 2:53 PM ^

My new suspicion is that the more recruits in an area, the more  reporters live nearby, then those kids get seen more and the reporters have more links to the schools that produce them. A bit of a reinforcing cycle but understandable due to the populations.

XM - Mt 1822

April 22nd, 2023 at 7:04 PM ^

the doctoral board has considered your thesis, blue, and after a thorough review and robust discussion has awarded you the 'doctor of statistical dudeness' degree.  congratulations, dr. blue!   

looks like my comment about texas was a good starting point.  can you cite me in a footnote to your thesis somewhere, make me feel important or smart or something.  you know, kind of like when they gave a brain to scarecrow


Blue@LSU

April 22nd, 2023 at 8:03 PM ^

Awesome. Now I'm off to update my cv and business cards. But do I have to start wearing tweed and learn to smoke a pipe?

You inspired the whole thing, my man. For that, you get not only a footnote, but an in-text citation, a dedication, and a prominent place in the acknowledgements. And when I make it big time, I'm going to fund an endowed chair in your honor, the XM-Mt Family Chair of Statistical Dudeness (to be filled by me of course). 😊

Blue@LSU

April 23rd, 2023 at 11:03 AM ^

IMG is a problem, which you can kind of see on the county map. And CFB Data (where I get the hometown location codes) mainly treats IMG players as if they are from Bradenton.

There are some exceptions, and JJ is one of them. His hometown is coded as La Grange Park, IL on cfbdata, so he is treated as an Illinois player (but his HS is listed as IMG). My guess is that it just depends on where they were located when they were first ranked by 247. Since JJ started and was ranked when he was at Nazareth, his code carries over despite transferring schools. But that's just my guess.

It would probably be worth it to go back and code an alternate hometown for IMG players if I can find it. 

Blue Vet

April 23rd, 2023 at 8:56 AM ^

Wow. This is impressive work. 

And now I know to be dubious about Texas and California rankings.

As for the states themselves, I'm fine with CA, having lived there a few years. And Texas, I only spent a couple months there, so the data set's incomplete.

 

victors2000

April 25th, 2023 at 8:41 PM ^

I read it! Perused it, or maybe scanned it; can I still get a CE for it?

Nice write-up! Unfortunately, I took a sleep-aid about 20 minutes ago, and didn't get much out of it. 

Good night...