Casting a Wide NET: Finding the basketball rankings

NCAA Basketball: N.C. State at Duke (Photo: Rob Kinnan-USA TODAY Sports)

With a little extra time off from football season (seasonal affective disorder), we can look forward to basketball! There’s a lot of uncertainty about the new players and how they’ll replace the talent that we lost. But one thing is certain: we’re going to have a ranking between 1 and 362 from the NCAA Evaluation Tool (“NET”). And after a few years of looking at it, I have a very good idea of how that ranking is made.

The NET has been pretty kind to NC State since its debut in the 2019 season. You may remember that as the year that we beat Auburn before their Final Four run, we took Virginia to overtime before their National Championship run, and we definitely scored more than 23 points in every single game. The NCAA told us that the NET had five components: a “Team Value Index” that uses win-loss results, an efficiency calculation for the advantage in points per possession, winning percentage calculated two different ways, and scoring margin capped at 10 points per game. That led to some interesting tactics where teams would desperately try to get to a 10-point win. And after several months of uncertainty about how the NET would be used, the committee decided that the #33 team in its fancy new ranking should miss its 68-team tournament. There were some valid concerns that the NET gave too much credit for beating up on bad teams, but I appreciated how the rankings captured our narrow losses to good teams too.

Starting with the 2021 season, the NET formula dropped the winning percentages and the scoring margin. It adjusted the Net Efficiency for opponent strength and it kept the Team Value Index. So the NET’s inputs are pretty simple. For each game, you just need to know the points per possession, the location, and the winner & loser. But how are those things used? The formula is a secret! Well, you can start by Googling it. Seriously, the NET was made with Google Cloud and they told us exactly how they like to calculate efficiency and adjust for strength of schedule.

The EFFICIENCY is the difference between the two teams’ points per 100 possessions. Each team’s possessions are estimated as its field goal attempts minus its offensive rebounds plus its turnovers plus 0.475 times its free throw attempts. Let’s pick a game at random from last year, say NC State over North Carolina 77-69. That’s an 8-point margin of victory. If you imagine that each team got 100 possessions, the margin would be 12.78 points. NC State had 71 field goal attempts, 10 offensive rebounds, 2 turnovers, and 10 free throw attempts, for 71 - 10 + 2 + 0.475 × 10 = 67.75 possessions. North Carolina had 62 field goal attempts, 18 offensive rebounds, 13 turnovers, and 24 free throw attempts, for 62 - 18 + 13 + 0.475 × 24 = 68.4 possessions. So the efficiency margin is 100 × (77 / 67.75 - 69 / 68.4) = 12.78.
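To make that arithmetic concrete, here is a small Python sketch of the possession estimate and the efficiency margin, using the box-score numbers quoted above. The function names are hypothetical; only the formula and the 0.475 free-throw weight come from the text.

```python
def possessions(fga, oreb, tov, fta):
    """Estimated possessions: FGA - OREB + TO + 0.475 * FTA (formula from the article)."""
    return fga - oreb + tov + 0.475 * fta


def efficiency_margin(pts, poss, opp_pts, opp_poss):
    """Difference in points per 100 possessions (positive means this team was better)."""
    return 100 * (pts / poss - opp_pts / opp_poss)


# NC State 77, North Carolina 69
ncsu_poss = possessions(fga=71, oreb=10, tov=2, fta=10)   # about 67.75
unc_poss = possessions(fga=62, oreb=18, tov=13, fta=24)   # about 68.4
margin = efficiency_margin(77, ncsu_poss, 69, unc_poss)
print(round(margin, 2))  # 12.78
```

Swapping the arguments gives North Carolina’s view of the same game, -12.78, which is the mirrored row used in the regression step.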

Now that we have that 12.78 calculated, make variables like “team_ncstate” and “opponent_northcarolina” and give both a value of 1. Make variables for the hundreds of other teams and opponents and give those variables a value of 0. Give a value of 1 for hca (home court advantage) because NC State was at home. That’s one row of information. Make another row from the other team’s perspective. (The efficiency is negative 12.78, team_northcarolina is 1, opponent_ncstate is 1, and hca is negative 1.) Do that for all games and perform ridge regression to estimate every team’s efficiency number. You may remember regression from a math class. For example, if you have information from a thousand house sales, you can estimate the difference in price for having a garage. Here we’re estimating the difference in points per 100 possessions for every team. We’re also estimating the home court advantage, which is treated as the same for every team. Note that this calculation sees only a small difference between winning by 1 and losing by 1, but it sees a big difference between winning by 10 and winning by 50.
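Here is a toy, pure-Python sketch of that regression setup. It assumes three made-up teams (A, B, C), made-up per-100 margins, a ridge penalty of 1.0 on every coefficient, no intercept, and hca = 0 for neutral-site games. The NCAA has not published its penalty or these details, so treat the numbers as illustrative only. Each game becomes two mirrored rows as described above, and the ridge normal equations are solved directly rather than with a library.

```python
def solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def adjusted_efficiency(games, lam=1.0):
    """games: list of (home, away, home_margin_per_100, neutral_site).

    Each game becomes two rows, one from each team's perspective:
    team_X = 1, opponent_Y = 1, hca = +1 / -1 (0 on a neutral floor).
    Returns ({team: (team_coef, opponent_coef)}, hca_coef).
    """
    teams = sorted({t for g in games for t in g[:2]})
    col = {}
    for t in teams:
        col["team_" + t] = len(col)
        col["opponent_" + t] = len(col)
    col["hca"] = len(col)
    n = len(col)

    rows = []
    for home, away, margin, neutral in games:
        h = 0.0 if neutral else 1.0
        rows.append(({col["team_" + home]: 1.0, col["opponent_" + away]: 1.0, col["hca"]: h}, margin))
        rows.append(({col["team_" + away]: 1.0, col["opponent_" + home]: 1.0, col["hca"]: -h}, -margin))

    # Ridge normal equations: (X^T X + lam * I) beta = X^T y
    xtx = [[0.0] * n for _ in range(n)]
    xty = [0.0] * n
    for feats, y in rows:
        for i, xi in feats.items():
            xty[i] += xi * y
            for j, xj in feats.items():
                xtx[i][j] += xi * xj
    for i in range(n):
        xtx[i][i] += lam
    beta = solve(xtx, xty)
    coefs = {t: (beta[col["team_" + t]], beta[col["opponent_" + t]]) for t in teams}
    return coefs, beta[col["hca"]]


games = [
    ("A", "B", 20.0, False),   # A beats B by 20 per 100 at home
    ("B", "C", 10.0, False),   # B beats C by 10 per 100 at home
    ("C", "A", -15.0, False),  # A wins by 15 per 100 on the road at C
]
coefs, hca = adjusted_efficiency(games)
# Net rating: team coefficient minus opponent coefficient (higher is better)
rating = {t: off - opp for t, (off, opp) in coefs.items()}
print(sorted(rating, key=rating.get, reverse=True), round(hca, 2))  # ['A', 'B', 'C'] 4.29
```

Because every game appears from both perspectives, each team’s team_ coefficient comes out as the negative of its opponent_ coefficient, so their difference works as a single net rating.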

The remaining component is the “Team Value Index,” which considers only wins, opponents, and location. I tried a lot of things to reproduce it: ridge regression, logistic regression, Markov chains, Massey’s Method, the Colley Matrix, every network centrality measure I could think of, and every computer ranking I could find. I eventually found one thing that worked well.

The VALUE can be estimated with a Bradley-Terry model. Bradley-Terry has been used in rankings for over 70 years. I believe it was in one of the BCS football rankings. It adjusts well for strength of schedule and it gives logical results even when most teams don’t play each other. You can imagine a grid with 362 rows and 362 columns, with a value for how many times every team beat every other team. The grid is mostly full of 0’s because most teams don’t play each other, but there are a few 1’s and 2’s and 3’s in there. Actually, there’s an adjustment for game location. Teams get a credit of 1 for a neutral win, 0.6 for a home win, and 1.4 for an away win. Each team is also assigned a win and a loss against a fictional team, so that undefeated teams don’t have infinite value and winless teams don’t have zero value. Then there’s some math to assign every team a strength rating which maximizes the likelihood of getting the wins and losses that actually happened.
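A minimal sketch of that Bradley-Terry idea in Python, assuming games are listed as (winner, loser, location from the winner’s point of view), the 1 / 0.6 / 1.4 credits from the text, and a fictional team whose strength is pinned at 1.0. The ratings are fit with the classic iterative (Zermelo, or minorization-maximization) update for the Bradley-Terry maximum-likelihood problem; that solver is a standard textbook choice, not something the NCAA has published, and the team names are made up.

```python
from collections import defaultdict

WIN_CREDIT = {"neutral": 1.0, "home": 0.6, "away": 1.4}  # from the winner's point of view
FICTIONAL = "__fictional__"  # hypothetical placeholder name for the fictional team


def bradley_terry(games, iters=10_000, tol=1e-12):
    """games: list of (winner, loser, location). Returns {team: strength}."""
    wins = defaultdict(float)  # wins[(i, j)] = location-weighted wins of i over j
    teams = set()
    for winner, loser, loc in games:
        wins[(winner, loser)] += WIN_CREDIT[loc]
        teams.update((winner, loser))
    for t in teams:  # one win and one loss against the fictional team
        wins[(t, FICTIONAL)] += 1.0
        wins[(FICTIONAL, t)] += 1.0

    # games_between[i][j] = weighted games between i and j; total_wins[i] = weighted wins of i
    games_between = defaultdict(lambda: defaultdict(float))
    total_wins = defaultdict(float)
    for (i, j), w in wins.items():
        games_between[i][j] += w
        games_between[j][i] += w
        total_wins[i] += w

    strength = {t: 1.0 for t in teams}
    strength[FICTIONAL] = 1.0  # anchor: never updated
    for _ in range(iters):
        # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
        new = {}
        for i in teams:
            denom = sum(n / (strength[i] + strength[j]) for j, n in games_between[i].items())
            new[i] = total_wins[i] / denom
        change = max(abs(new[t] - strength[t]) for t in teams)
        strength.update(new)
        if change < tol:
            break
    return {t: strength[t] for t in teams}


games = [("A", "B", "away"), ("B", "C", "home"), ("C", "D", "neutral"), ("A", "C", "home")]
ratings = bradley_terry(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 3))  # A should come out on top and D at the bottom
```

Pinning the fictional team at 1.0 fixes the scale: a team rated 2.0 is modeled as beating it two times out of three, since 2 / (2 + 1) = 2/3.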

Here’s a plot of the top-50 NET teams on Selection Sunday, with the highest efficiency at the top and the highest value on the right. You can explain almost every difference in ranking by one team having a better efficiency or value than the other.

I’m not sure exactly how the two components are combined to create the final ranking. However, it’s clear that the Efficiency matters more. My estimate is that the Efficiency ranking counts for about 80% and the Value ranking for about 20%. For example, a team that is 10th in Efficiency and 25th in Value would have a NET ranking of about 13, because 10 × 80% + 25 × 20% = 8 + 5 = 13.
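Under that rough 80/20 estimate (the author’s guess, not a published NCAA formula), the blend can be sketched like this. Re-ranking teams by the blended score, and breaking ties by efficiency rank, are additional assumptions for illustration, and the team names are hypothetical.

```python
def blended_score(eff_rank, value_rank, w_eff=0.8, w_value=0.2):
    """Weighted average of a team's two component ranks (lower is better)."""
    return w_eff * eff_rank + w_value * value_rank


def estimated_net_ranks(component_ranks):
    """component_ranks: {team: (efficiency_rank, value_rank)} -> {team: estimated rank}.

    Teams are re-ranked by blended score; ties go to the better efficiency rank
    (a hypothetical tie-breaker).
    """
    order = sorted(
        component_ranks,
        key=lambda t: (blended_score(*component_ranks[t]), component_ranks[t][0]),
    )
    return {team: i + 1 for i, team in enumerate(order)}


print(blended_score(10, 25))  # 13.0

ranks = {"Team A": (10, 25), "Team B": (12, 5), "Team C": (15, 14)}
print(estimated_net_ranks(ranks))  # {'Team B': 1, 'Team A': 2, 'Team C': 3}
```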

What should a team do to get a good NET ranking? Win. Win by a lot. Don’t lose by a lot. It really helps for your conference to do well too. If your conference opponents win their non-conference games by 30, you look good even when you lose conference games. On the other hand, if Florida State loses to Stetson by 9, every ACC team that doesn’t beat Florida State by at least that much looks worse than Stetson. Should teams run up the score? Yeah, probably. Well, let’s just say that it’s always a good time for a good possession. For the NET, it’s the same thing to win by 24 points in 60 possessions as it is to win by 32 points in 80 possessions. So you don’t have to go at a fast pace. But if you beat Green Bay by 10 and others are beating Green Bay by 40, you’re probably going to look bad to the NET.
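The pace point is just arithmetic on margin per 100 possessions. This sketch assumes both teams had the same number of possessions, a simplification of the per-team estimate used for the efficiency margin, and the 70-possession Green Bay games are hypothetical.

```python
def margin_per_100(margin, possessions):
    """Scoring margin scaled to 100 possessions (assumes equal possessions for both teams)."""
    return 100 * margin / possessions


slow_win = margin_per_100(24, 60)  # 40.0
fast_win = margin_per_100(32, 80)  # 40.0
print(slow_win == fast_win)  # True

# Hypothetical: two teams each beat Green Bay in a 70-possession game
print(round(margin_per_100(10, 70), 1), round(margin_per_100(40, 70), 1))  # 14.3 57.1
```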

Unlike the NCAA, which just talks about transparency, I’m giving you actual transparency. You can see my code and data on my GitHub page: