Daily Baseball Accuracy Testing - Razzball Ombotsman

It has been almost 2 years since we launched our first daily fantasy baseball tool (Streamonator in 2012). Since then, we have launched several other tools such as Rest of Season Player Rater + Hittertron in 2013 and DFSBot in 2014.

Razzball Nation has been a huge part of these tools from the start – both in encouraging us to create them and providing ongoing feedback to make them better (e.g., we now report ‘next week’ data on Fridays to assist those in weekly roster leagues, added game time, etc).

But one valid ‘ask’ that we have not been able to deliver until now is: “How accurate are Razzball daily projections?”

We are no strangers to accuracy testing. We have run accuracy tests of Fantasy Baseball Preseason Rankings for three straight years and have run accuracy tests on baseball projections as well. But it has still been a challenge to come up with a format that 1) could be updated on a daily basis and 2) could make some sense to someone without an advanced statistics degree.

Our first working version is now available. Meet the Ombotsman. We have appointed the Ombotsman to provide transparency on the accuracy of our various daily fantasy baseball ‘bots (Streamonator, Hittertron, DFSBot).

The testing is done on every day’s worth of data for Streamonator, Hittertron, and DFSBot. The actual player stats/DFS points/actual $ values for every starting player are compared to our projections via correlation testing. Two data sets can have a correlation anywhere between -100% and 100%. 100% would mean that they are perfectly correlated. This could mean they are exactly equal or that a single linear formula can take the first as an input to calculate the second (easy example: Celsius and Fahrenheit temperatures are 100% correlated but obviously are not equal). 0% means there is no linear relationship at all, i.e., the projections are no better than random guesses. Negative correlations would indicate an inverse relationship – e.g., strikeouts and batting average (needless to say, if projections for a stat are negatively correlated with the actual results, the projections are worthless).
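For readers who want to see the mechanics, here is a minimal sketch of a correlation test, assuming the standard Pearson correlation (the post does not say exactly which formula the Ombotsman uses). The function name `pearson` and the sample temperatures are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (2+ values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Illustrative temperatures: never equal, but linked by one linear formula.
celsius = [-10.0, 0.0, 15.0, 25.0, 40.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

perfect = pearson(celsius, fahrenheit)            # perfectly correlated
inverse = pearson(celsius, [-c for c in celsius])  # perfectly inverse

print(f"Celsius vs Fahrenheit: {perfect:.0%}")
print(f"Celsius vs negated Celsius: {inverse:.0%}")
```

The same function works on any pair of lists, such as projected strikeouts and actual strikeouts for every starter on a given day.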

The tests are summarized by month but also broken out by day. When viewed by the day, you will see a lot of volatility because of the smaller sample size. All it takes is for Strasburg to give up 8 ER to the Astros or a 1-0 game at Coors to put a significant dent into a day’s projection accuracy. The monthly averages provide a better, less volatile gauge on projection accuracy. The accuracy tests include:

DFSBot Hitter and Pitcher Testing – This is the clearest test IMO. I correlate my projected salaries/DFS points against actual DFS points (for hitters, it just includes those who were in starting lineups) and then run the same test using each DFS service’s salaries for that day.
- Key takeaway #1: Our projected salaries/DFS points consistently correlate better with actual results for both hitters and pitchers than the salaries of all three DFS services covered in DFSBot. These are companies valued in the millions of $ and are making efforts to adjust player values based on opponent, park, recent performance, etc. This is no small feat in my opinion.
- Key takeaway #2: It appears that FanDuel’s point structure makes it a little harder to project than DraftKings and DraftStreet. Both my results and FanDuel’s results are slightly lower than for the other two services. This means bupkis to me but might mean something to a hard-core DFS player.
- Key takeaway #3: Pitcher results are more predictable than hitter results. (Now, whether this means one should gamble that much more on non-aces who project as nice values vs reliably awesome aces is a different topic)
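Here is a rough sketch of how a head-to-head test like this could be run for one DFS service: correlate both the projections and the site salaries against actual points for each day, then average the daily results for the month. Every number below is hypothetical, and averaging the daily correlations (rather than pooling all player-days) is an assumption about how the monthly figure is built.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical rows: (projected DFS points, site salary, actual DFS points)
slates = {
    "day 1": [(9.5, 4800, 12.0), (7.0, 4200, 3.0), (11.0, 5200, 8.0), (5.5, 3500, 6.0)],
    "day 2": [(8.0, 4500, 9.0), (10.5, 5000, 14.0), (6.0, 3900, 2.0), (7.5, 4600, 7.0)],
}

def daily_correlations(slates):
    """Return {day: (projection vs actual, salary vs actual)}."""
    results = {}
    for day, rows in slates.items():
        projected = [row[0] for row in rows]
        salary = [row[1] for row in rows]
        actual = [row[2] for row in rows]
        results[day] = (pearson(projected, actual), pearson(salary, actual))
    return results

daily = daily_correlations(slates)

# Assumed monthly summary: a simple average of the daily correlations.
monthly_projection = mean(proj for proj, _ in daily.values())
monthly_salary = mean(sal for _, sal in daily.values())

for day, (proj, sal) in daily.items():
    print(f"{day}: projections {proj:.0%}, salaries {sal:.0%}")
print(f"month: projections {monthly_projection:.0%}, salaries {monthly_salary:.0%}")
```

With only four players per day the daily numbers swing wildly, which is the same small-sample volatility you see in the real daily breakouts.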

Streamonator Testing – There are two types of tests done for Streamonator. The first is a correlation test against various statistics (W, L, IP, H, ER, K, BB, ERA, WHIP) as well as the estimated $ value of the start. The second is a distribution that shows the average results of starts based on various projected $ ranges (e.g., $0 – $3.50) as well as how often starts fall into the various ranges.

- Key takeaway #1: There are large differences in the correlations by pitcher stat. For June, K projections correlated at 43% while Wins/Losses correlated only at 9.2% and 6.9% respectively. ERA/WHIP fell in between at 20-25%. While this produced an initial idea of ‘weighting’ each category for Streamonator $ calculations, the reality is that this already happens to an extent. The low correlation on wins/losses is tied to the fact that my W/L projection model is very conservative, so most pitchers’ gameday Win/Loss estimates are bunched together, whereas the K projections show more differentiation. (The model uses projected ERA for the starter and the opponent. You’d think that would be a strong predictor of winning % but, as this testing shows, it is weak. And I think any other variable – e.g., bullpen strength, IP per SP, etc. – is even weaker.)
- Key takeaway #2: Now we can see proof that Streamonator dollar values provide reliable results in the long run. How? If you look at the Streamonator Distribution table, you will see that, on average, the Streamonator projections by range match up very well with the average return per $ range. For example, for the 30 days ending with July 6th, there were 95 starts estimated as worth $10.5 to $14. The average estimated value of these starts was $12.1. Their average actual value came out to $12.0. Below is the full snapshot for those 30 days. The SON $ Average and the Actual Average $ columns correlate at 96% (r^2 of 91%).

Stream-o-Nator Actual $ Averages by Projected $ Range (June 7-July 6)
$ Range	Count	Stream-o-nator AVG $	Actual Average $
<-$7	59	-11.9	-12.6
-$7 to $0	139	-3	-0.8
$0 to $3.5	111	1.7	3.1
$3.5 to $7	131	5.1	3.5
$7 to $10.5	124	8.7	10.6
$10.5 to $14	98	12.1	13.5
$14 to $17.5	65	15.3	17.8
$17.5 to $21	41	18.8	21.5
$21 to $28	30	24	27.8
$28+	11	31.2	54.9
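If you want to check the correlation between the two average columns yourself, here is a small sketch that feeds the table above into a standard Pearson calculation. The values are copied straight from the table. Using Pearson is my assumption about the method, and the result should land close to the 96% quoted above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# One entry per projected $ range, in table order (<-$7 first, $28+ last).
son_avg = [-11.9, -3, 1.7, 5.1, 8.7, 12.1, 15.3, 18.8, 24, 31.2]
actual_avg = [-12.6, -0.8, 3.1, 3.5, 10.6, 13.5, 17.8, 21.5, 27.8, 54.9]

r = pearson(son_avg, actual_avg)
print(f"r = {r:.1%}, r^2 = {r * r:.1%}")
```

Keep in mind this correlates ten range averages, not individual starts, so it says the projections are well calibrated on average rather than accurate for any single start.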

- Key takeaway #3: All pitchers have great and bad days but the better the projected $, the greater the chance they have a great day ($28+) and the smaller the chance they have a bad day (below -$7).
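A distribution test like this can be sketched in a few lines: drop each start into a projected $ range, then report the average actual value plus how often the start turned out great or bad. The range edges come from the table, the great/bad cutoffs from the takeaway above, and everything else (function names, the sample starts, treating each range as including its lower edge) is an assumption for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Range edges and labels taken from the Streamonator table.
EDGES = [-7, 0, 3.5, 7, 10.5, 14, 17.5, 21, 28]
LABELS = ["<-$7", "-$7 to $0", "$0 to $3.5", "$3.5 to $7", "$7 to $10.5",
          "$10.5 to $14", "$14 to $17.5", "$17.5 to $21", "$21 to $28", "$28+"]

def bucket(value):
    """Label of the $ range containing value (lower edge inclusive, assumed)."""
    return LABELS[bisect_right(EDGES, value)]

def summarize(starts):
    """starts: iterable of (projected $, actual $) pairs, one per start."""
    groups = defaultdict(list)
    for projected, actual in starts:
        groups[bucket(projected)].append(actual)
    summary = {}
    for label, actuals in groups.items():
        n = len(actuals)
        summary[label] = {
            "count": n,
            "avg_actual": sum(actuals) / n,
            "great_share": sum(a >= 28 for a in actuals) / n,  # $28+ days
            "bad_share": sum(a < -7 for a in actuals) / n,     # below -$7
        }
    return summary

# Hypothetical starts, not real Streamonator data.
starts = [(12.0, 30.5), (11.0, -9.0), (13.5, 14.5),
          (-8.0, -15.0), (-9.5, 2.0), (30.0, 41.0)]
summary = summarize(starts)

for label in LABELS:
    if label in summary:
        row = summary[label]
        print(f"{label}: n={row['count']}, avg={row['avg_actual']:.1f}, "
              f"great={row['great_share']:.0%}, bad={row['bad_share']:.0%}")
```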

Hittertron Testing – There are two types of tests done for Hittertron. The first is a correlation test done against the following statistics: PA, AB, H, R, HR, RBI, SB, BB, SO, AVG, OBP and SLG. The second is a distribution that shows the average results for hitters based on various projected $ ranges (e.g., $0 – $3.50) as well as how often hitter days fall into various point ranges.
- Key takeaway #1: Strikeouts are the easiest stat to project, with most other stats having similar correlations to each other. (Stolen bases are misleading as so many players have 0. You can project zero SB every day for Miggy Cabrera and be right 99% of the time.)
- Key takeaway #2: There is proof that Hittertron provides accurate results in the long run. The Hittertron distribution grid shows that the average results (measured in points) match up very well with each projected $ range. Aside from two slight differences, the actual average hitter points increase for every range. The correlation between the averages per range for Hittertron $ and the Actual Points comes in at 98% (r^2 of 96%).

Hittertron Actual Point* Averages by Projected $ Range (June 9-July 8)
$Range	Count	HON_AVG$	ACT_AVG_PTS
<-$7	683	-11.4	1.56
-$7 to $0	1099	-3.4	2.63
$0 to $3.5	731	1.8	4.14
$3.5 to $7	729	5.2	4.28
$7 to $10.5	646	8.6	4.46
$10.5 to $14	594	12.1	5.06
$14 to $17.5	480	15.6	4.93
$17.5 to $21	351	19	6.17
$21 to $28	450	24	6.94
$28 to $35	227	31.1	8.83
$35+	194	44.4	7.98
* Points calculated as 10 * (HR + SB + (R + RBI)/3 + H - 0.265 * AB)
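To make the points formula in the footnote concrete, here is a short sketch that applies it to a single hitter day. The function name `hitter_points` and the two sample stat lines are made up for illustration.

```python
def hitter_points(hr, sb, r, rbi, h, ab):
    """Hitter points per the table footnote:
    10 * (HR + SB + (R + RBI)/3 + H - 0.265 * AB)."""
    return 10 * (hr + sb + (r + rbi) / 3 + h - 0.265 * ab)

# An 0-for-4 with nothing else: every at-bat costs 2.65 points.
goose_egg = hitter_points(hr=0, sb=0, r=0, rbi=0, h=0, ab=4)

# A 2-for-4 with a HR, 1 R and 2 RBI.
big_day = hitter_points(hr=1, sb=0, r=1, rbi=2, h=2, ab=4)

print(f"0-for-4: {goose_egg:.1f} points")
print(f"2-for-4 with a HR: {big_day:.1f} points")
```

The 0.265 * AB term works like a league-average batting average baseline, so a hitter only gets credit for hits above what an average bat would collect in the same at-bats.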

- Key takeaway #3: As one would expect, hitters are not as reliable on a day-by-day basis as starting pitchers. Even for the highest projection range ($35+), only 20% fall into the top 3 hitter points buckets while 19% fall under the worst bucket (think 0-for-4). When streaming hitters, it is imperative to take the ‘long view’ vs. the ‘short view’ as even the best hitter matchups are going to deliver goose eggs more often than those multi-Hit/R/RBI days with a HR or SB.

While we would love to directly compare our accuracy results vs other sites, that is impossible for several reasons, ranging from “Almost every other site that projects daily data charges a subscription” to “This type of automated testing requires a data feed vs manual data pulls”. So we cannot state that we are the most accurate. But we do feel comfortable in stating that, with the release of the Ombotsman, we are the most transparent of all daily fantasy baseball projection services.

Please feel free to suggest additional tests – though it may take me a while to implement the good ideas (I’ll try to gently swat away your idea if it’s bad/redundant).