“For every action, there is an equal and opposite reaction”
This applies especially to opponent adjustments in any power rating system or model.
Unfortunately, my home advantage calculation was wrong. I calculate home advantage by taking the 800 most evenly matched matchups of the year not played on a neutral floor and computing the home/away splits. The error: instead of dividing the home/away totals by the number of games in that 800-game sample, I was dividing by the total from every game played on the year. This pushed many away splits above 1 as well, so when stats were adjusted for home/away advantage, the reaction was not equal and opposite.
Any home/away advantage split frame must have a mean of 1. For example, if the average home offensive efficiency split is 1.02, the away offensive efficiency split must be 0.98. Otherwise it's going to throw off the adjustments, which is exactly what was happening in my system.
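To make the bug concrete, here is a minimal sketch of the split calculation. All names (`home_eff`, `rating_gap`, etc.) are illustrative, not taken from the actual model; the point is the normalization denominator.

```python
def home_away_splits(games, n_closest=800):
    """games: list of dicts with 'home_eff', 'away_eff', 'rating_gap', 'neutral'.

    Returns (home_split, away_split), which are guaranteed to average to 1.
    """
    # Use only non-neutral games, taking the most evenly matched ones.
    sample = sorted((g for g in games if not g["neutral"]),
                    key=lambda g: g["rating_gap"])[:n_closest]

    home_total = sum(g["home_eff"] for g in sample)
    away_total = sum(g["away_eff"] for g in sample)

    # BUG (old version): normalizing by the average over *all* games,
    # which lets both splits land above 1:
    # league_avg = sum(g["home_eff"] + g["away_eff"] for g in games) / (2 * len(games))

    # FIX: normalize by the mean of *this sample only*, so the two
    # splits average to exactly 1 by construction.
    sample_avg = (home_total + away_total) / (2 * len(sample))
    home_split = (home_total / len(sample)) / sample_avg
    away_split = (away_total / len(sample)) / sample_avg
    return home_split, away_split
```

With the sample-based denominator, `home_split + away_split == 2` holds by construction, so every boost applied to the home side is mirrored by an equal reduction on the away side.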
The impacts were far-reaching. For example, the average turnover percentage was 10 percent higher than the average defensive turnover percentage. Free throw attempts per possession (perhaps the most important metric for totals) had an average defensive FTA/Possession % nearly 13% higher than its offensive counterpart. Obviously this is going to throw many things off, especially Monte Carlo simulations, which rely on these percentages for probabilities, but also the ATM/Blender, which relies on a historical database.
When adjusting stats, the final adjusted outputs do not need to match exactly. For example, after making this fix, the average turnover % is 19.2% and the average defensive turnover % is 19.12%. They won't completely sync up; that's just the nature of doing multiple rounds of adjustments. But they need to be pretty damn close, otherwise you have a problem.
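That "pretty damn close" criterion is easy to turn into an automated sanity check run after the adjustment rounds. This is a hypothetical sketch, with an assumed tolerance, not the model's actual validation code:

```python
def check_mirror(off_rates, def_rates, tol=0.005):
    """off_rates / def_rates: per-team adjusted rates for one stat
    (e.g. offensive TOV% and defensive TOV%), as fractions.

    Returns (ok, off_avg, def_avg). Multiple adjustment rounds leave a
    tiny gap between the averages; a large gap signals the kind of
    normalization bug described above.
    """
    off_avg = sum(off_rates) / len(off_rates)
    def_avg = sum(def_rates) / len(def_rates)
    return abs(off_avg - def_avg) <= tol, off_avg, def_avg
```

Running a check like this for every mirrored stat pair after each model build would have flagged the 10% turnover gap immediately instead of surfacing later as weird k values downstream.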
I discovered this error today when I noticed that the k values of my ATM model (a cluster-analysis-based model) were all extremely low and producing some weird predictions. After some digging, I found that the mirrored metrics were way off, and that the home advantage calculation was to blame.
So, what does this mean going forward? Well, I made the fixes, and at least the power ratings portion of my website now reflects them. Unfortunately, the picks portion of the website is still going to take some work. All machine learning methods and databases I have built up over the last two weeks are now useless due to the changes. I will have to rebuild those databases, which I can rehab for pre-game betting by going back a week or two, one day at a time, running the model up to a given date and generating projections. What disappoints me is that I won't be able to do this for the live log/2H betting database I had built up.
The pick rehabilitation is going to be an all-day affair, so I am going to go to the gym to get that out of the way and focus on this for the rest of today. I will give an update when the picks portion of the website is rehabilitated. Previous days' picks won't be updated; this is more a means of rebuilding a database I can use for machine learning. I already locked in today's picks, unfortunately, so I am just going to let those ride. At least one benefit of machine learning is that it can be trained on bad data and still produce something, but making my overall model more structurally sound is the better long-term decision for the remainder of this season, and should hopefully also improve the machine learning outputs.