Wednesday, September 18, 2019

UHSAA Rankings Review part 2

It took me longer than I would have liked, but I finished my review and got my model to (mostly) match the official rankings.  Here is the link to part one if you need a refresher.  Below are some of the changes made, and how things will be slightly different going forward.  

If you don't want to read the whole process or the details, the TL;DR is that I got it to mostly match (link to UHSAA and the link to my model output so you can compare the two). There are some inconsistencies and errors in the OOWP section that gave me a headache when I tried to figure out how to manually override them, and I won't be doing that again going forward.

UHSAA/MaxPreps Logic Mistakes My misunderstanding of the classification down scenario (WP):
After reading the section that I posted another 10 times, I came to the conclusion that the sentence "This modifier only comes into play when a team wins" refers strictly to losing a portion of the win, and not to having the first lower-classification opponent count as a regular opponent.  I thought the win portion meant that the higher-classified team would be guaranteed that their first win would count the same, but that is clearly not the case in the dataset.  So I changed the logic in my model to account for that.  After taking care of that error, the WP matched between my model and the official data.

OWP:
This feels odd to say, but I don't remember having to do much to fix OWP.  Once I solved the WP problem from the previous section and made sure to round the numbers to 6 digits, everything matched pretty quickly and easily.

OOWP:
This was the section that took most of my time and made me want to give up. I consulted this section on the FAQ several times to make sure I wasn't misunderstanding anything.



I was worried about the head-to-head matchup part in OOWP, and wondered if it would mean removing region data (or data from common opponents).  For example, I was thinking that when calculating Dixie's OOWP from Desert Hills, I would have to strip Dixie's games out of Desert Hills' other opponents (all of Region 9), and that I would have to get Pine View's record excluding the Desert Hills and Dixie games.  That would have been rather difficult and would have required comparing two lists and pulling out information.

I tried testing that out by hand and was getting nowhere.  So I went to my best friend who got me through my programming classes and asked for help.  His name is Google.com and he knows a lot of stuff.  I eventually found this site https://www.kaggle.com/c/march-machine-learning-mania-2014/discussion/6769 and about 2/5 of the way down it mentions OOWP.



I decided to try this approach, which is used in traditional RPI calculations, instead of worrying about removing head-to-head matchups like the UHSAA site says.  Once I fixed that formula, it solved about 90% of the errors I was seeing in OOWP.  Only a handful of problem teams were left at this point, and I could tell they were all teams with out-of-state opponents, though not all teams with out-of-state opponents had errors, which matters later.  Unfortunately, out-of-state teams are a very manual process.  I tried recalculating the winning percentages with the head-to-head removal put back in, but that made the errors bigger, and it created errors for all of the teams with out-of-state opponents instead of half of them like I had originally seen.
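For anyone who wants to see what the traditional RPI approach looks like in practice, here is a minimal Python sketch. It is not the UHSAA or MaxPreps code, and it leaves out the classification modifier entirely. The team names and results are made up, and the helper names (build_schedule, wp, owp, oowp) are just ones I picked for illustration. The idea, as I understand the standard RPI definition, is that OWP averages each opponent's winning percentage with the games against the team in question removed, and OOWP is simply the average of the opponents' OWP values, with no extra head-to-head stripping.

```python
from collections import defaultdict

# Hypothetical results as (winner, loser); the team names are placeholders.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_schedule(games):
    """Map each team to a list of (opponent, won) tuples, one per game."""
    schedule = defaultdict(list)
    for winner, loser in games:
        schedule[winner].append((loser, True))
        schedule[loser].append((winner, False))
    return schedule


def wp(schedule, team, exclude=None):
    """Winning percentage, optionally ignoring games against one opponent."""
    results = [won for opp, won in schedule[team] if opp != exclude]
    if not results:
        return None  # no games left to judge this team by
    return sum(results) / len(results)


def owp(schedule, team):
    """Average of opponents' WP, each computed without games against `team`."""
    values = [wp(schedule, opp, exclude=team) for opp, _ in schedule[team]]
    values = [v for v in values if v is not None]
    return round(sum(values) / len(values), 6) if values else None


def oowp(schedule, team):
    """Average of opponents' OWP (the traditional RPI definition)."""
    values = [owp(schedule, opp) for opp, _ in schedule[team]]
    values = [v for v in values if v is not None]
    return round(sum(values) / len(values), 6) if values else None


schedule = build_schedule(games)
owp_a = owp(schedule, "A")
oowp_a = oowp(schedule, "A")
print(owp_a, oowp_a)
```

An opponent played twice shows up twice in the average here, and an opponent with no other games is skipped. Both of those are choices I made for the sketch, not something the FAQ spells out.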

I reverted to the original calculation and had to tackle each team/error individually.  Fortunately, enough teams with out-of-state opponents were calculated correctly that they served as a template and let me know I was on the right track (I never thought I would say something nice about Snow Canyon, but thank you for being one of the correct schools with an out-of-state opponent; you saved my sanity).  Some of the issues were my own problems (calculating OOWP for Hawaii and California schools), where I had accidentally made things more complicated, and those were quick fixes.

In the end, I had to fudge the numbers for 6 out-of-state schools to make the model match. One or two of the situations made sense (they hadn't played 2 other teams besides the Utah team, so their data can be tricky to feed forward), but there were a couple more schools whose opponents, for some reason, were not being counted.  For example, American Fork was giving me an error and had one out-of-state opponent, Arbor View, that I needed to look at.  Arbor View has played 3 non-Utah teams: Basic, Hamilton, and Legacy.  I got their winning percentages (excluding their games against Arbor View, the percentages are 50%, 100%, and 33% respectively), and then averaged the data.  Having done this enough times, I looked at the size of the error and realized the issue was with Legacy.  Once I deleted Legacy's value from the equation, the numbers matched up between my model and the UHSAA information.  Legacy has played 2 other teams (besides Arbor View), which should be enough information to be included, but for some reason, it wasn't.
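To show how much one dropped opponent moves the number, here is a small Python sketch of the Arbor View averaging step. The three percentages come from the paragraph above; treating 33% as exactly 1/3 is my assumption, and the average_wp helper is just an illustrative name, not anything from UHSAA or MaxPreps.

```python
from statistics import mean

# Winning percentages of Arbor View's non-Utah opponents, excluding their
# games against Arbor View. Treating "33%" as exactly 1/3 is an assumption.
opponent_wp = {"Basic": 0.5, "Hamilton": 1.0, "Legacy": 1 / 3}


def average_wp(wps, skip=()):
    """Average the opponents' WP values, leaving out any names in `skip`."""
    return round(mean(v for name, v in wps.items() if name not in skip), 6)


all_three = average_wp(opponent_wp)                      # what I expected
without_legacy = average_wp(opponent_wp, skip={"Legacy"})  # what matched
print(all_three, without_legacy)
```

With all three opponents the average is about 0.611; without Legacy it is 0.75. A gap that size is easy to spot once it feeds into American Fork's numbers, which is how I could tell Legacy was the one being left out.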

Future Plans for the Model:
I was never able to figure out Canyon View's problem (I tried the same steps in the previous paragraph), so they are the only school that doesn't match between my model and the UHSAA data.  Going forward, I will not fudge the numbers in my model, and I will include all of the data that is on MaxPreps once I start updating my model this weekend.  Perhaps the UHSAA formula requires 3 opponents before the data is included, or maybe they require a certified score from a coach instead of a fan before the games are included, or maybe the data wasn't in MaxPreps when they ran their calculation.  Whatever the reason is, something isn't totally right, and I'm not sure why.  Maybe it's an issue of me misunderstanding the FAQ (WP), or of UHSAA not following their explanation (OOWP), but we will just have to wait and see what happens.  I will continue to publish my model because of the discrepancies between the two.
