Monday, September 16, 2019

UHSAA Rankings Review

Today is the day that UHSAA published their RPI Rankings.  As I went through them, most of the numbers didn't match my model exactly, but they were all very close.  Once I saw how close they were, I knew my model was on the right path, and I was confident I could find the problems.  So here are the problems that I found, what I fixed, and what UHSAA/MaxPreps need to fix.

Typos/Personal Mistakes:
This list was longer than I wanted.  I found roughly five schools whose data I had entered incorrectly.  Usually, I gave teams losses instead of wins, so most of those teams actually had a better score than my model said.  The two exceptions were Layton Christian Academy and Juan Diego.  I gave LCA an extra win by accident, so their score dropped significantly once I corrected it.  Juan Diego was supposed to play Emery, but Emery forfeited that game.  I wasn't sure how that would be counted in the formula, or if it would be counted at all.  It turns out that a forfeit is counted as a regular win or loss -- it's not treated like an exhibition or a non-varsity opponent.  Fortunately, most of the mistakes were made a few weeks ago when I was trying to automate some of the data entry, and the rest were made while I was still refining my process for entering data.  This shouldn't be a problem going forward, as I now have a routine/system for entering the data.

My Logic Mistakes:
I misunderstood the table found under the "cross-classification" question on the FAQ page.  I thought the bottom two lines of the table were for figuring out the number of classifications in the opponent's state and taking the average.  In reality, that average applies to every out-of-state opponent, and the difference between the two lines relates to the sport.  Football is a 5-classification sport, so every out-of-state opponent gets the same score from the "5 sport" line.  I think it's a bit of a flaw that Bishop Gorman is worth the same as Virgin Valley, but UHSAA isn't in the business of ranking Nevada teams.

UHSAA/MaxPreps Logic Mistakes:
The only bug I have been able to find in the official data is when teams play down.  From the UHSAA site:
"There is a one-time exception for a team playing down. That means, when a 3A team plays a 2A or 1A opponent for the first time on their schedule, that 2A or 1A opponent will count as a 3A team. Subsequent games against teams from lower classifications will count as their true classification. This modifier only comes into play when a team wins. Under the modified RPI system, each game is assigned a value based on that team's classification. Again, there is a 15 percent difference between them. So, for example, a 5A team will always have a game value of 1.749, regardless of who they're playing. The value of the win changes according to their opponent (unless the exemption comes into play). The result gives us a modified winning percentage. This is the number that will be used throughout the formula, including for their opponents, and the opponents of their opponents. So a team that goes undefeated but plays multiple teams below their classification may end up with a winning percentage of less than 100%."
What is currently happening is that if a team plays down twice and loses the first of those games, the exemption is used up on the loss, and the second opponent is counted as their true classification even when the team wins.  Since the modifier is only supposed to come into play on a win, I believe the exemption should carry over to that second game.  This has dropped a handful of teams .03 points on their winning percentage.  It's not a lot, but it matters, as the next section on rounding shows.
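To make the difference concrete, here is a minimal Python sketch of the modified winning percentage under my reading of the rule.  Several things in it are assumptions and not official UHSAA numbers: the game values are extrapolated as 15 percent steps anchored at roughly 1.749 for 5A, a win over a same-or-higher classification is assumed to be worth the team's own game value, and the function name and the `exemption_lost_on_loss` flag are made up for illustration.

```python
# Assumed game values: 15% steps, which gives about 1.749 for 5A as quoted.
# The values for the other classifications are an extrapolation.
GAME_VALUE = {n: 1.15 ** (n - 1) for n in range(1, 7)}  # 1 = 1A ... 6 = 6A


def modified_wp(own_class, games, exemption_lost_on_loss=False):
    """Modified winning percentage for one team.

    games: list of (opponent_class, won) tuples in schedule order.
    exemption_lost_on_loss: if True, a loss in the first play-down game
    uses up the one-time exemption (the behavior I think I am seeing in
    the official data). If False, the exemption waits for the first
    play-down win (my reading of the rule).
    """
    if not games:
        return 0.0
    own_value = GAME_VALUE[own_class]
    exemption_available = True
    earned = 0.0
    for opp_class, won in games:
        playing_down = opp_class < own_class
        if won:
            if playing_down and exemption_available:
                # One-time exemption: lower opponent counts as own class.
                earned += own_value
                exemption_available = False
            else:
                # Assumption: a win is never worth more than own game value.
                earned += min(own_value, GAME_VALUE[opp_class])
        elif playing_down and exemption_lost_on_loss:
            exemption_available = False
    # Every game is worth the team's own game value in the denominator.
    return earned / (own_value * len(games))


# A 3A team that loses to a 2A team, then beats a 2A team and a 3A team.
schedule = [(2, False), (2, True), (3, True)]
expected = modified_wp(3, schedule)
observed = modified_wp(3, schedule, exemption_lost_on_loss=True)
print(round(expected, 6), round(observed, 6))
```

With these assumed values the two results differ in the second decimal place, which is the same order of magnitude as the gap I am seeing for the affected teams.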

Rounding:
I noticed that the WP data wasn't matching after 6 decimal places, but everything was really close.  The OWP data was off after 2 decimals, but it was close enough that I knew I could find the error.  It turns out the extra decimals in my WP data were feeding forward into OWP and compounding into bigger errors.  Once I changed my Excel model to round WP to 6 decimal places, it fixed most of the OWP errors.  The only remaining OWP differences relate to the previous point about the playing-down modifier.  Once that gets fixed, it should resolve the errors with both WP and OWP.
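Here is a small Python sketch of the two rounding orders.  It assumes OWP is a plain average of the opponents' WP values, and it uses half-up rounding to imitate Excel's ROUND; the function names and the sample WP values are hypothetical and are not taken from the official data.

```python
from decimal import Decimal, ROUND_HALF_UP


def round6(x):
    """Round to 6 decimal places, half up (like Excel's ROUND for positives)."""
    return float(Decimal(repr(x)).quantize(Decimal("0.000001"),
                                           rounding=ROUND_HALF_UP))


def owp_round_each_step(opponent_wps):
    """Round every opponent's WP first, then average and round again."""
    rounded = [round6(wp) for wp in opponent_wps]
    return round6(sum(rounded) / len(rounded))


def owp_round_at_end(opponent_wps):
    """Average the full-precision WP values and round only the result."""
    return round6(sum(opponent_wps) / len(opponent_wps))


# Hypothetical opponent WP values chosen so the two orders disagree.
wps = [0.1234566, 0.1234566, 0.1234560]
print(owp_round_each_step(wps), owp_round_at_end(wps))
```

The disagreement is only in the sixth decimal here, but OWP feeds into OOWP in the same way, so the small differences stack up at each stage.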

The data is still far enough off on OOWP that I won't be able to test it until MaxPreps fixes their data, but I suspect it will be resolved with correct data and rounding.  I also know I have one other forfeit error for an out-of-state opponent that I need to go back and fix.  That will require me to manually go through and check everything until I find where that happened.

The last issue I have with how the rounding is done on the official data is that rounding at each step can favor one team or hurt another.  I can't know for certain until the data is fixed, but I believe Millard and Beaver would swap places if the rounding weren't done until the very end.  I'm sure issues like that will resolve themselves as teams keep playing and there are more data points, but it will be interesting to see if there are close scores around the cutoffs for bye weeks and home-field advantage.

Final Thoughts:
Overall, everything looks good, and once I get the data matched up, I will continue publishing the rankings after UHSAA stops (October 11th, I believe).  I have tweeted at UHSAA to fix their bug, and hopefully they will soon.  I will try to find an email address in the meantime to make sure it does get fixed.
