Testing Hypotheses on Recruiting, Part I: The Data and Some Graphs

Submitted by Blue@LSU on March 30th, 2023 at 9:44 PM

“Winning will solve our recruiting problems.”

“Recruits should be lining up to play at [Position]U.”

“Why would any WR/QB recruit want to play in this offense?”

We’ve all seen, and maybe even made, these and many other claims about recruiting on this site and elsewhere. We may even consider them conventional wisdom without questioning their validity. But are they, in fact, supported by the data? What factors weigh most heavily on a recruit’s decision to commit to a particular school? Is it possible to identify the correlates of recruiting and, if so, what are they and how strong an effect do they have?

This is the first of what I plan to be a couple of diaries (spaced out over several months when I have time) that will subject these and other recruiting claims to empirical scrutiny. Today’s diary will go through some of the initial hypotheses that I plan to test, discuss the data, and then show some pretty graphs. I’m still collecting and merging some of the data for the independent variables, so the actual hypothesis testing will come in later posts.

 

INITIAL HYPOTHESES/RESEARCH QUESTIONS TO BE TESTED

Below are just some of the claims that I thought would be fun and feasible to explore. If you have any suggestions for additional tests, please let me know and I’ll look into the data availability, etc.

  • Just win, baby: recruits are more likely to commit to programs with a recent record of success on the field.
  • Get me to the Pros: recruits are more likely to commit to schools with a higher number of recent draft picks, first round draft picks, etc.
  • [Position]U: recruits are more likely to commit to a school with a record of success at their particular position (draft picks, Heisman contenders, position-level stats, etc.).
  • Distance matters: recruits are more likely to commit to geographically proximate schools, in-state schools, etc.
  • Distance matters, unless you are Alabama and Georgia (alternative title: “fun with interactions”)
  • Just how hard is it to pull a recruit out of SEC territory?
  • Show me the money: I’m working on a way to see the effect of NIL. It’s gonna take some creativity…
  • Any other suggestions?

Of course, these statements are not mutually exclusive. It’s possible that all, some, or none of them are supported by the data. And, I’ll admit, some of these relationships might seem so intuitive that they have become generally accepted or conventional wisdom. You may even wonder what the use is of testing, say, whether winning matters for recruiting. To that I would reply:

  1. ‘Conventional wisdom’ is often wrong, and even the most basic claims should be subject to empirical scrutiny (remember, many thought general war was unthinkable on the eve of WWI).
  2. Even if we ‘know’ that, say, winning has an effect on a recruit’s decision, it's still relevant to know how much of an effect it has.
  3. It’s the offseason. And I’m curious. What else am I going to do with my spare time?

 

THE DATA

The hypotheses are all specified at the recruit level, suggesting some factors that will influence a recruit’s decision to commit to one school over others. Since it’s typically a good idea to avoid drawing inferences about individual behavior from aggregate data, the data need to be collected at the individual level.

The base data come from the 247 composite list of top-200 players in each recruiting class for the past 5 years (2019-2023). The data are organized by the recruit and the offer school. That is, for each top-200 recruit, there is a separate observation (row) for each P5 school that made a scholarship offer as listed on each recruit’s profile page on 247sports.com. For example:

Each recruit’s unique ID number and the ID for each offer school from collegefootballdata.com are included for easy merging of other datasets.
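To make that layout concrete, here's a minimal Python sketch of the recruit-by-offer structure and how the two IDs make merging easy. The field names (`recruit_id`, `school_id`, `committed`) and every value in it are made-up placeholders for illustration, not rows or column names from the actual dataset.

```python
from dataclasses import dataclass

@dataclass
class OfferRow:
    recruit_id: int  # hypothetical stand-in for the recruit's unique ID
    school_id: int   # hypothetical stand-in for the offer school's ID
    year: int        # recruiting class
    committed: int   # 1 if the recruit committed to this offer school, else 0

# One row per (recruit, offering P5 school). All values are invented.
offers = [
    OfferRow(recruit_id=1, school_id=10, year=2023, committed=1),
    OfferRow(recruit_id=1, school_id=20, year=2023, committed=0),
    OfferRow(recruit_id=1, school_id=30, year=2023, committed=0),
    OfferRow(recruit_id=2, school_id=20, year=2023, committed=1),
    OfferRow(recruit_id=2, school_id=30, year=2023, committed=0),
]

# A school-level table keyed by (school_id, year), e.g. prior-season wins.
# Again, the numbers are invented.
school_wins = {(10, 2023): 11, (20, 2023): 8, (30, 2023): 5}

def merge_wins(rows, wins):
    """Attach school-level wins to each offer row via the shared keys."""
    return [(row, wins.get((row.school_id, row.year))) for row in rows]

merged = merge_wins(offers, school_wins)

# Sanity check: count committed rows per recruit (should be exactly one each).
commits_per_recruit = {}
for row in offers:
    commits_per_recruit[row.recruit_id] = (
        commits_per_recruit.get(row.recruit_id, 0) + row.committed
    )
```

The point of the long format is that any school-level or recruit-level variable can be attached later with a simple key lookup, without reshaping the base data.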

Altogether, the dataset includes 970 of the 1000 recruits from 2019 to 2023. Nine recruits are removed because they committed to non-P5 schools (I only include P5 offers and commitments). Missing data, either from 247 or collegefootballdata.com, account for the remaining missing observations. One notable missing observation is Quinn Ewers, ruining my ability to test the relationship between mullets and commitments to OSU.

The total number of observations (n), which includes every P5 offer for every recruit, is 20,926.

I’ll attempt to attach the base dataset for anyone that is interested. I’ve never attached a file here, so hopefully it works. If it does work and you notice any errors, please let me know.

 

ANTICIPATED FAQ

Why only the top-200 recruits?

Primarily because I have a real job and this is just a hobby.

But another reason is to help differentiate offers from “offers”. The lower the ranking, the greater the chance that an offer school, especially the top schools, ‘cooled’ on a recruit rather than the reverse. I realize this is not always the case and that schools will often heavily recruit lower-ranked players, especially based on their own evaluations. But in the absence of being able to distinguish offers from “offers”, I use the top 200 as a proxy for ‘committable’ offers.

Why only P5 offers and commitments?

Because the vast vast vast majority of top-200 recruits choose P5 schools. Only 9 of the almost 1000 recruits in the data from 2019-2023 chose to go to a non-P5 school (4 to Jackson State, 1 each to Cincinnati, Houston, SMU, and UCF).

There is also the issue of making valid comparisons in later statistical analyses. Does a perfect season and (cough, cough) “National Championship” at, say, UCF mean the same to a recruit as a perfect season and National Championship at LSU or Georgia? Is Western Kentucky a more attractive destination than Ohio State for a QB recruit because their QB threw for the most yards in the NCAA and 1,000 more yards than CJ Stroud? Ultimately I want to be able to make ‘all else equal’ comparisons but, unfortunately, ‘all else’ is not equal in comparisons between the P5 and non-P5.

Why only 5 years (2019-2023)?

Seriously. I have a damn job.

 

WHAT DO THE DATA LOOK LIKE: DESCRIPTIVE STATISTICS

I’ll mainly present these without commentary since they are pretty self-explanatory and this post has already gotten a bit long. But I hope you enjoy them and that you find them informative. I first give some descriptives of where the top-200 recruits come from, and then where they end up committing.

All of the graphs below are also available here.

Where do the top-200 recruits from 2019 to 2023 come from?

The top-200 recruits by home state:

The top-100 recruits by home state:

Maps are also a nice way to visualize these data. Pay particular attention to the category values in the legend. As you can see from above, the data are pretty heavily skewed, which makes the binning for the maps a bit difficult. If anyone has any suggestions for the future (I'm pretty new to creating maps), I'd be happy to hear them.
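One commonly suggested option for skewed counts like these is quantile binning, where the cut points are chosen so that each color class holds roughly the same number of states. Here's a minimal Python sketch of the idea. The counts are invented for illustration (they are not the numbers behind the maps), the helper names are hypothetical, and this is not how the maps in this post were binned.

```python
import statistics

def quantile_breaks(counts, n_bins=4):
    """Return the n_bins - 1 cut points that split counts into roughly equal groups."""
    return statistics.quantiles(counts, n=n_bins, method="inclusive")

def assign_bin(value, breaks):
    """Return the 0-based bin index for value, given ascending cut points."""
    return sum(value > b for b in breaks)

# Invented, heavily skewed recruit counts by state.
counts = [0, 0, 1, 1, 2, 3, 4, 6, 9, 15, 40, 120]

breaks = quantile_breaks(counts)
bins = [assign_bin(c, breaks) for c in counts]
```

With equal-width bins, the one or two biggest states would sit alone in the top class and nearly everyone else would share the bottom one. Quantile cut points spread the states across the legend, at the cost of classes that cover very different ranges, so the legend values still need a careful read.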

The top-200:

The top-100:

For people that like a little more granularity, I broke the top-200 down by the recruit's county (there would be much more bare space if I also showed the top-100):

There’s a pretty evident geographic orientation of the top 200, with the main hot spots in the south (especially Florida, Georgia, Louisiana, Alabama, and Texas) and SoCal. In the north, Detroit stands out as one of the biggest recruiting hubs.

Where do the top-200 recruits commit?

First by conference.

The top-200 recruits:

The top-100 recruits:

It really is insane to see the number of top-200 recruits that the SEC is able to sign. Over all years, the SEC pulls in 43% of the top-200 and 49% of the top-100 recruits in the nation.

The next few graphs further break the top-200 and top-100 commitments down by the offense/defense position of the recruit.

The top-200:

The top-100:

The SEC definitely emphasizes defensive recruiting. In 2016, they pulled in 65% of the top-100 defensive prospects in the nation. Alternatively, the ‘defense optional’ Big12 consistently pulls in the fewest top-200 and top-100 defensive recruits, with the Pac12 close behind. With the exception of 2023, the Big12 generally gets about half as many of the top defensive recruits as the Big10.

The next graphs break down the destination of the top-200 and top-100 recruits by commitment school.

As if you needed more evidence that the SEC is a recruiting juggernaut, especially at the top. Just note that Alabama and Georgia alone have pulled in the same number of top-200 recruits (169) as the entire Big10 (169) and more than the entire Pac12 (111) or Big12 (103). If they had three more top-200 recruits, they would have matched the total haul for the ACC as well (172).

A big factor in the SEC’s recruiting dominance is undoubtedly the geographic concentration of top-200 recruits in the south. But to get a fuller picture, future posts will delve into the effect of a number of factors (location, distance, wins, draft picks, position success) to understand why the top-200 recruits choose to commit to particular schools.

Any comments/suggestions/requests for future analyses are appreciated.

Thanks for reading. Go Blue!


Comments

Blue Texan

March 30th, 2023 at 10:43 PM ^

Give me all the data!!!  Thanks for posting.
 

My current assumption is that future posts will show that results win recruits - wins, top-10 wins, players drafted by the NFL, players drafted in the first round, NFL longevity, and position development.

I can’t wait to see what the data reveals. 

Blue@LSU

March 31st, 2023 at 10:11 AM ^

Thanks! And great to see a fellow traveler/data junkie on the board!

I'm guessing that most of these factors will be important too, but I'm really interested to see their substantive effects and which ones have a stronger impact on recruit choice.

I'm already anticipating some multicollinearity problems because many of these things--wins, draft picks, position development--are surely related. But that's a bridge I'll cross when I get there.
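A quick way to get a feel for that overlap is to look at the correlation between two predictors and the variance inflation factor (VIF) it implies. Here's a minimal Python sketch, assuming just two predictors (in which case the VIF reduces to 1 / (1 - r^2)) and using invented team-level numbers rather than anything from the dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """VIF for either predictor when the model has exactly these two."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Invented values for eight hypothetical teams.
wins = [12, 11, 10, 9, 8, 7, 6, 5]
draft_picks = [9, 10, 7, 6, 6, 4, 3, 2]

r = pearson_r(wins, draft_picks)
vif = vif_two_predictors(wins, draft_picks)
```

A VIF near 1 means the two predictors carry mostly separate information. Large values mean their individual effects will be estimated with a lot of uncertainty, even if the model as a whole fits well.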

Blue@LSU

April 15th, 2023 at 7:04 PM ^

Yep, I plan to run some probits (commit/not commit) and then probably just show some graphs of the effects of each predictor.
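For anyone unfamiliar, a probit models the probability of a commitment as Φ(b0 + b1·x), where Φ is the standard normal CDF. Here's a bare-bones Python sketch with one made-up predictor, fit by plain gradient ascent on the log-likelihood. The data, function names, and step settings are all assumptions for illustration. A real analysis would presumably lean on a stats package and include many predictors.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def commit_prob(b0, b1, x):
    """Probit probability of committing, clamped away from 0 and 1."""
    p = STD_NORMAL.cdf(b0 + b1 * x)
    return min(max(p, 1e-12), 1.0 - 1e-12)

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the commit/not-commit outcomes."""
    return sum(
        y * math.log(commit_prob(b0, b1, x))
        + (1 - y) * math.log(1.0 - commit_prob(b0, b1, x))
        for x, y in zip(xs, ys)
    )

def fit_probit(xs, ys, steps=5000, lr=0.01):
    """Fit intercept and slope by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = commit_prob(b0, b1, x)
            # Score contribution: pdf(z) * (y - p) / (p * (1 - p))
            w = STD_NORMAL.pdf(b0 + b1 * x) * (y - p) / (p * (1.0 - p))
            g0 += w
            g1 += w * x
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Invented data: x is a standardized predictor (say, recent wins),
# y is 1 if the recruit committed to that offer school, else 0.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -1.0, 0.0, 1.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0]

b0_hat, b1_hat = fit_probit(xs, ys)
```

The sign of the fitted slope says whether higher values of x go with more commitments. Evaluating `commit_prob` over a range of x values is one way to turn the coefficients into the kind of effect graph mentioned above.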

I'm right now working on checking if there is any statistically evident regional/state bias in the recruiting services like some people were wondering. I'm hoping to have something in the next couple of weeks.

Thanks for reading!

Blue Vet

March 31st, 2023 at 7:39 AM ^

Wow. This is great.

I'm glad for your sake that you've got a job. Otherwise, I can see that these fascinating ideas might swallow all your time.

Blue@LSU

March 31st, 2023 at 10:15 AM ^

Ha! I try my best to compartmentalize and I've been able to limit myself to just doing this stuff for a few hours a week. But I have fun doing it and, as you can tell, I'm a bit of a data exhibitionist 😊 

Luckily, I work with data analysis/hypothesis testing daily in my job so this is really just shifting the topic to college football.

Blue@LSU

March 31st, 2023 at 11:22 AM ^

That's a good question and I think it really varies by field and by the researcher. In my area, .05 is usually the standard for 'statistically significant' results. Of course, this is arbitrary. To be honest with you, I'm more interested in seeing and reporting measures of uncertainty, of which the p-value is only one part. 

HAILtotheVICTOR33

March 31st, 2023 at 7:56 AM ^

You had me at

Testing Hypotheses on Recruiting, Part I: The Data and Some Graphs

Your words ooze my love languages of Hypothesis Testing, Recruiting, and Data Analysis, I await your next installment with bated breath, you beautiful Blue@LSU, you.

In all seriousness, I really appreciate you putting this together even though you have a job. Among the various topics to explore, I wonder if this exercise might shed some light on a dichotomy between inherent biases within the recruit ranking industry and development programs such as Iowa, which ranks 8th in number of active NFL players but often ranks in the 30s or 40s in recruiting rankings. That is to say, I wonder where the medium lies between recruiting rankings underrating kids due to geography and some programs being especially good at developing kids for the next level.

Blue@LSU

March 31st, 2023 at 10:22 AM ^

Your words ooze my love languages of Hypothesis Testing, Recruiting, and Data Analysis,

Here's an Ode to Hypothesis Testing for you:

Roses are red
Linear regression is BLUE
Best Linear Unbiased Estimator
If the assumptions are true

That's an interesting question about the ranking industry and hopefully the data will have something to say about it. I did come across a post about a similar topic that was pretty well done. Here's the link in case you're interested.

Rockstars or Flop-stars? Examining Recruit Ratings and NFL Draft Success 

M-GO-Beek

March 31st, 2023 at 9:22 AM ^

I was pretty shocked to see that the ACC has pulled in roughly as many, if not more, of the top-200 kids over the last 5 years as the B1G. I would have thought the B1G would have been much farther ahead, given the general level of play of the conferences as a whole. I realize both conferences are weighted heavily by OSU and Clemson respectively, but not all of those players went to OSU and Clemson. You would have thought the ACC would have been a stronger conference over that time period given this. My guess is that the dramatic underperformance of FSU and Miami (the two schools most likely to pull in high-level recruits) in particular explains why the conference has been as weak as it has been.

Blue@LSU

March 31st, 2023 at 10:34 AM ^

Yeah, the southern ACC schools definitely have the advantage of being in or near recruiting hotbeds. It is a top-heavy conference (just like the B1G), but when I was collecting the data North Carolina was a school that stood out for some reason. I'm not sure why, but I was surprised at the number of recruits that they were able to keep in-state, especially when there are some strong programs in close vicinity. 

Miami and FSU's underperformance, given their inherent recruiting advantages, is just shocking. Just goes to show that coaching and development matter.

Blue@LSU

March 31st, 2023 at 7:59 PM ^

Update to the data: Add one more top-100 recruit to USC as Duce Robinson (TE) just committed. Couldn't he have committed before I posted this?  

XM - Mt 1822

March 31st, 2023 at 8:31 PM ^

blue, 5* effort and graphs.

two comments

1.  despite the ratings, i'd be curious about what percentage from a given state actually panned out to their rating, admittedly a tough and subjective data set.  and by that i mean take for instance, say, texas with its 3 million 5*'s.  are they panning out at the same percentage as other states' 5*'s or is there some over-rating bias in their rankings?

2.  at least in our house for our two oldest sons who are/were college football players, geography was a significant consideration to the point that some offers weren't really considered even though they were attractive, just because of the distance from home - east coast/west coast, as an example.  

Blue@LSU

March 31st, 2023 at 10:36 PM ^

Thanks, XM! Does that 5* effort get me any NIL deals? 

That would be an interesting analysis that I think is very feasible if we think about 'panning out' in terms of getting drafted. I'll already be working with the draft data, so it would only take a few commands to match each player up with their draft position (if they are drafted). It would be pretty damn valuable to know whether recruits from particular states are over-ranked. The only issue would be finding a way to account for the intervening variable: what team they played for in college. In other words, do Texas recruits not pan out at a larger rate because they are from Texas? Or is it because many of them committed to TAMU/Texas and got poor coaching? I'm gonna have to look into this. Thanks for bringing it up.

I think the analysis of geography is going to provide results that are consistent with the experience of your sons (I've already done some simple tests and this is basically what I found). But I'll be interested to see if this can be outweighed by other factors, like if a more distant school has a better record of developing players at your position, etc. I think it would also be interesting to see if recruits from the transfer portal act in the same way. Or if instead the pull to stay closer to home is weakened by simply having already lived away from home, even if only a short distance.

First And Shut…

April 1st, 2023 at 10:08 AM ^

It would be interesting to see a graphic for NFL picks, by home state of the player, to see if there is a geographical bias in 247’s ranking of recruits - in other words, are Texas, Florida, Georgia, etc. as dominant in producing NFL picks as they are in producing top 100/200 recruits? 

Lots of things happen in the years between college recruiting and NFL drafting - player selection and development, injuries, etc.

WestQuad

April 3rd, 2023 at 10:08 AM ^

I'd be interested in that.   Do the Southern states really have that big of a talent advantage or is there a bias in the rankings?  Seems like the number of guys going pro would be fairly evenly distributed across geography unless there really is an advantage to playing in states like Texas where they play football year round (or at least train for it.)    Not sure how you'd account for magnet schools like IMG that take the best talent from everywhere.

ThisGuyFawkes

April 2nd, 2023 at 11:08 AM ^

https://www.maxpreps.com/m/news/kSVmutjJi02CQsrCw9V3AQ/2022-nfl-draft-state-by-state-look-at-where-draftees-played-high-school-football.htm
 

See link for 2022 draft data, I’d imagine a quick google search would find you previous years. 
My initial takeaway is that Georgia is way overrepresented with 30 draft picks - a close #2 to Texas - when compared with the expectations based on OP data. Florida is similarly underrepresented, and my guess is that there is a heavy influence on this data based on where the kids went to school - but it would take some further digging to prove that out.

mikewein

April 3rd, 2023 at 1:08 PM ^

Thanks, I can't wait to see more of this.  One question I would have is how much of an effect the number of open roster spots would have. So if you recruit highly ranked players, but more leave early, then you have more spots open to fill with highly ranked players.  So you might have the same number of 4/5 stars on the team at any time, but have to replace them twice as fast as someone else.  Or since you lose about 1/4 of the team each year, is the variance in that not significant?

Blue@LSU

April 3rd, 2023 at 5:49 PM ^

That's a good point. Open roster spots will definitely influence the number of recruits you can take, but I think teams will always find room for 5*/high-4* recruits. Teams with higher turnover, either to the NFL or the portal, also need more recruits to restock that talent. Unfortunately, it's almost impossible to know how many roster spots teams have every year. 

RAH

April 3rd, 2023 at 7:28 PM ^

Although they weren't highlighted parts of your post, I found the maps interesting too. Particularly the states formerly known for producing high numbers of quality recruits (Pennsylvania and Illinois) that are now also-rans.

Blue@LSU

April 3rd, 2023 at 10:51 PM ^

I don't know if there are any reliable data going back far enough (when did the recruit-ranking industry really even begin?), but it would be interesting to see geographic shifts in recruiting hotspots. It would be really cool to map that along with population shifts out of the rust belt and into the south. I'm sure they'd match up pretty closely with a bit of a time lag.

King Tot

April 12th, 2023 at 10:46 AM ^

Star ratings are the best indicator of talent we, as fans, have for players who are being recruited which is the point of this diary. 

Also, NFL contracts are not a great indicator of how talented a recruit is. They are a great indicator of how ready NFL teams believe individuals are for the NFL after years of additional training and development at the college level, and they are also unreliable. How many top 10 picks bust? How many late round picks succeed? There are countless variables that must be considered at every level of football, and luck is often as much a factor as skilled evaluation.

PeteM

April 16th, 2023 at 10:20 PM ^

Fascinating. Based on a quick glance, I'm surprised that Miami, Texas and USC still recruit so well despite lack of recent success. While I agree that recruiting and on-field success are strongly correlated, it's always interesting to see the outliers.

By the way, I see LSU in your screen name. Do you live in Louisiana? I have family in Baton Rouge.

Blue@LSU

April 17th, 2023 at 8:27 PM ^

Thanks.

That's one of the benefits of being in fertile recruiting ground I guess. I'll be interested to see how much the in-state factor is diminished (if at all) by the "quality" of the in-state programs. 

Yep. I live in BR. Thinking about maybe going to check out the spring game this weekend.