
This is part of a two-part series designed to help Fantasy Baseball fans determine which fantasy rankings and projections to rely on.  The first part covered Rankings.  The second part will cover Projections.  The methodology for the test relies on comparing Razzball Commenter League team drafts (576 teams in 2012 across 48 12-Team MLB leagues using ESPN’s default 5×5 format) against their end-of-season point totals.  Background on the methodology can be found here.

I realize many of you who read this site are not stat geeks or Excel wonks (Geeks and a bit wonky, yes).  That’s fine because, trust me, fantasy baseball does not get more fun after investing countless hours trying to gain a 5-10% edge on your competition.  But if you use pre-season projections for any part of your pre-draft work, this post will help you choose the right source(s).  Or, you could read my previous post that reviewed the best sources for fantasy baseball rankings and take solace in knowing all this analysis goes into my pre-season Point Share rankings/$ estimates.

Of course, for those of you who are stat geeks or Excel wonks, compiling and analyzing pre-season projections is a compulsion.  There are plenty of projections systems to fuel this compulsion – free systems like Steamer, ZiPS, FanGraphs Fan Projections, Marcel and CAIRO as well as paid services like Baseball Prospectus (PECOTA), Fantistics, Oliver (as part of HardballTimes.com), and Rotochamp that offer add-ons like customized dollar values for your league format.  This analysis should make your 2013 decisions a little easier.

Introduction

Analyzing pre-season baseball projections is relatively easy if all you care about is actual baseball.  You can convert all the various hitting stats into a single stat like Offensive WAR or wOBA (weighted on-base average) and the pitching stats into an ERA variation (SIERA, FIP, xFIP, etc) and then use various statistical metrics (depending on your success criteria) to compare actual vs. projected stats to determine which system is best.

Analyzing projections through the prism of fantasy baseball adds some additional wrinkles.  It is possible to convert hitter/pitcher stat lines into a single metric (e.g., auction dollars, point shares) but it is more difficult to calculate.  In addition, the impact of a correct/incorrect projection can vary wildly in a way that is tough to quantify – e.g., was it a top pick vs. a lower-round pick?  How quickly was one able to drop an underperformer, and what could one get as a free agent replacement?  Do starting pitching injuries hurt less because one can stream?  And so on.

The test I have constructed looks to address all these issues/challenges.  It works as such:

  1. Take the draft results by team from the 48 Razzball Commenter Leagues (RCLs) in 2012.  These were hosted on ESPN using ESPN’s default league formats for 12 team 5×5 MLB leagues.  This amounts to 576 teams’ worth of draft data.
  2. Convert every drafted player’s projection (including the three bench players) into $ values using Razzball’s Point Share methodology (note: if a player was not drafted, they are not included in this study.)
  3. Total these player $ values for each of the 576 RCL teams.
  4. See how these team totals correlate with each team’s final Total Standings Points
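The four steps above boil down to a single correlation calculation.  Here is a minimal sketch with made-up numbers (the real test used 576 teams and Razzball’s Point Share $ values, which are far more involved to compute):

```python
import numpy as np

# Hypothetical example data: 4 teams (the real test used 576).
# Each inner list holds one team's drafted players' projected $ values;
# final_points holds each team's final Total Standings Points.
team_player_values = [
    [22, 18, 9, 4, -5],   # team 1's drafted players, $ per Point Shares
    [30, 12, 7, 2, -1],
    [15, 15, 11, 6, 0],
    [25, 10, 5, 3, -5],
]
final_points = np.array([68.5, 74.0, 61.5, 70.0])

# Steps 3 and 4: total each team's projected $ values, then
# correlate those totals with final standings points.
team_totals = np.array([sum(vals) for vals in team_player_values])
r = np.corrcoef(team_totals, final_points)[0, 1]
print(f"correlation: {r:.1%}")
```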

The success ‘ceiling’ is based on the correlation of teams’ total points vs. the estimated $ value of their team based on actual stats.  This came out to a 57.7% correlation for hitting, 49.4% for pitching, and 51.9% for pitching if you exclude Save points.  In other words, about 58% of a team’s combined R/HR/RBI/SB/AVG standings points can be explained based on the end of season stats of its drafted hitters.  The other 42% would be based on all other factors including FA/Waiver pickups, trades, league strength, etc.

Projected Sources Tested (in alphabetical order)

  • Baseball Prospectus (PECOTA) – early/mid March (has Madson as Reds closer; his injury was announced March 28th)
  • CAIRO – last updated on April 3rd
  • ESPN – assumed late March/early April since it has Ryan Madson at 0 IP
  • FanGraphs Fans – updated through late March (fan projections likely throughout Feb/March)
  • FantasyPros.com – late March/early April (they aggregate sources so probably a mix of Feb-early April)
  • Fantistics – late March/early April (had Marshall as closer vs. Madson)
  • Grey (Razzball) – February (note: only in Test #1)
  • Marcel – unsure of the reporting date.  The methodology is based solely on previous-season stats, so it does not really matter.
  • Oliver/The Hardball Times Forecasts – late March
  • Rotochamp – early/mid-March (has Madson as closer)
  • Steamer – last updated on March 21st
  • ZiPS – last updated on April 3rd

Notes

  • The $ floor per player was set at -$5 as this led to higher correlations in the test vs. using $0.
  • The following sources did not have projections for Yoenis Cespedes and Yu Darvish:  ZiPS, Marcel, Rotochamp, FanGraphs Fans.  These sources received a $ value derived from an average of various sources.  Any other players without projections received the floor value of -$5.
  • The following sources did not project Saves (or I did not obtain those projections):  ZiPS, Marcel, Oliver.  For these sources, I credited the Save projections from FantasyPros.com (which aggregates a number of different sources).
  • Joakim Soria and Ryan Madson were given a value of $0 in all sources as some reflected their injury, others did not.
  • Sean Marshall, Aroldis Chapman, Greg Holland, and Jonathan Broxton were given 0 saves in all sources as some reflected the Madson/Soria injuries while others did not.
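The -$5 floor from the first note is just a clamp applied to each player’s projected $ value before the team totals are summed (a sketch with hypothetical raw values):

```python
# Clamp each player's projected $ value at the -$5 floor before
# summing team totals (hypothetical raw values for illustration).
FLOOR = -5.0
raw_dollar_values = [24.3, 3.1, -8.7, -12.0, 0.5]
floored = [max(v, FLOOR) for v in raw_dollar_values]
print(floored)  # [24.3, 3.1, -5.0, -5.0, 0.5]
```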

Test Overview

Below are the three tests that were performed.  I consider projecting performance (e.g., a player is projected at 1 HR per 20 plate appearances, a pitcher is projected at 8 Ks per 9 innings, etc.) and playing time (e.g., Plate Appearances for hitters, IP for pitchers) to be two distinct skills.

  1. Full Test (Source’s PA/IP and Performance Rates) – tests Performance + Playing Time projections
  2. Performance Test (Actual PA/IP With Source’s Performance Rates) – tests Performance projections
  3. Playing Time Test (Source’s PA/IP With Actual Performance Rates) – tests Playing Time projections

The first test just uses the source’s projections – in effect, testing both performance and playing time.

The 2nd and 3rd tests isolate performance and playing time respectively by replacing the other variable with the actual results.  Here is an example with Mike Trout:

  • Steamer projected Mike Trout to hit a HR roughly once every 40 plate appearances and to have 390 Plate Appearances.
  • Mike Trout actually hit a HR roughly once every 20 plate appearances and had 632 plate appearances.
  • For the performance test, Trout’s actual PA (632) are multiplied by Steamer’s HR rate of ~1 every 40 PAs and nets 15 HRs.
  • For the playing time test, Steamer’s projected 390 PAs are multiplied by his actual HR rate of ~1 every 20 PAs and nets 19 HRs.
  • This is done for his other hitting stats and then a $ figure is estimated for him and credited to every RCL team that drafted Trout.  (In this specific instance, since Trout hit 30 HRs, you could say that Steamer’s playing time projection was better than its performance projection).
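The Trout decomposition above can be sketched in a few lines (the rates are the rough approximations from the bullets, not Steamer’s exact figures):

```python
# Mike Trout example: isolating performance vs. playing time.
# Rough figures from the bullets above.
proj_pa, proj_hr_rate = 390, 1 / 40      # Steamer: 390 PA, ~1 HR per 40 PA
actual_pa, actual_hr_rate = 632, 1 / 20  # Actual: 632 PA, ~1 HR per 20 PA

# Test #2 (performance): actual PA x projected rate
performance_hr = actual_pa * proj_hr_rate    # 632/40 = 15.8, the ~15 HR above

# Test #3 (playing time): projected PA x actual rate
playing_time_hr = proj_pa * actual_hr_rate   # 390/20 = 19.5, the ~19 HR above
```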

Since Tests #2 and #3 replace one of the variables with Actual stats, the correlations for the 2nd and 3rd tests are naturally higher than those of the 1st test.

While I recommend using the ‘best’ source across the four ‘skills’ – hitting performance, hitting playing time, pitcher performance, pitcher playing time – test #1 is sufficient for all of you who just want a single projection source.

Final note – since Saves are such a fluky category, I show the Pitching tests both with and without Saves points.  I have it ordered based on the ‘without Saves points’ since I think projecting Saves is more art than science.

Test #1 – Full Test (Source’s Plate Appearances/Innings Pitched and Performance Rates) 

Correlation of Projected RCL Team Value To Final Season Hitting Points
Source Correlation To Total Team Hit Pts
FantasyPros.com 23.2%
Baseball Prospectus (PECOTA) 21.5%
FanGraphs Fans 19.8%
ESPN 19.4%
Steamer 18.1%
Oliver 16.3%
CAIRO 16.1%
Fantistics 14.5%
Rotochamp 14.2%
Grey (Razzball) 13.8%
Marcel 12.8%
ZiPS 11.5%


Correlation of Projected RCL Team Value To Final Season Pitching Points
Source Total Team Pitch Pts Total Team Pitch Points (w/o Saves)
Steamer 15.3% 16.6%
Fantistics 8.9% 12.7%
FantasyPros.com 4.9% 8.7%
Grey (Razzball) 4.6% 8.6%
Baseball Prospectus (PECOTA) 5.5% 8.0%
Rotochamp 1.9% 6.6%
ESPN 3.8% 6.5%
CAIRO 3.1% 6.2%
Oliver 2.3% 5.1%
FanGraphs Fans 1.0% 5.0%
Marcel 1.9% 4.8%
ZiPS 1.2% 3.0%

Notes/Findings:

  • All sources’ projections correlated better with Team Hitting points vs. Pitching points.  As Test #2 will illustrate, hitter performance projections are more accurate than pitcher performance projections.  (Test #3 illustrates that playing time differences between hitters and pitchers – caused by injuries or poor performance – are negligible.)
  • While FantasyPros.com finished atop the Hitting leaderboard, it is the equivalent of a slow hitter leading the league in AVG with a .400 BABIP as Tests #2 and #3 show that FantasyPros’ performance projections are near the bottom and its playing time projections are middle of the pack.
  • Based on Tests #2 and #3, I would say that Baseball Prospectus, Steamer, and FanGraphs Fan Projections are the top single-source choices as they were the only three to finish in the top half of both tests.  One caveat on FanGraphs Fan Projections, however, is that the player pool is shallow and is barely sufficient by the end of March.  If you are drafting in the first half of March, you will definitely need a second source to fill in the gaps (or just use Steamer or BP).
  • For pitching, Steamer and Fantistics are the clear choices if you are looking for a single source of projections.  They are in the top 3 of all 3 tests – regardless of whether Saves points are factored in or not.
  • Grey’s projections did pretty well – particularly in pitching – given he does his projections in February and does not rely on any formulas/calculations.
  • The performance of Marcel, ZiPS, and CAIRO is misleading as none of the three aim to accurately project playing time.  So if you have no plans to adjust their projections based on a solid playing time source, they will be of little value.

Test #2 – Performance Test (Actual Plate Appearances/Innings Pitched With Source’s Performance Rates) 

Correlation of Projected RCL Team Value To Final Season Hitting Points – Hitting Rates Only (Using Actual Plate Appearances/Innings Pitched)
Source Correlation To Total Team Hit Pts
Actual Stats 57.7%
Baseball Prospectus (PECOTA) 42.2%
Oliver 42.0%
CAIRO 42.0%
Steamer 41.7%
FanGraphs Fans 41.7%
ZiPS 41.5%
Marcel 41.4%
ESPN 41.2%
FantasyPros.com 40.4%
Rotochamp 39.7%
Fantistics 39.6%


Correlation of Projected RCL Team Value To Final Season Pitching Points – Pitching Rates Only (Using Actual Plate Appearances/Innings Pitched)
Source Total Team Pitch Pts Total Team Pitch Points (w/o Saves)
Actual Stats 49.4% 51.9%
Fantistics 22.3% 24.3%
Steamer 24.0% 23.9%
CAIRO 21.9% 23.4%
FantasyPros.com 20.9% 23.0%
Rotochamp 19.0% 22.1%
ESPN 20.1% 20.8%
Marcel 17.5% 20.0%
Oliver 18.3% 20.0%
Baseball Prospectus (PECOTA) 19.0% 20.0%
ZiPS 18.7% 19.7%
FanGraphs Fans 18.7% 19.5%

Notes/Findings

  • The hitting correlations are much closer than those found in Test #1.  I think this suggests a parity of sorts in the projection of hitter performance rates.  The sources are so close to each other that it is hard to declare a single victor.
  • For pitching, there is a little more separation between the sources.  Fantistics and Steamer are #1/#2 depending on whether you count Saves or not.  CAIRO does much better here vs. Test #1 as its main focus is on projecting performance vs. playing time.
  • Steamer and CAIRO are the only two sources to be in the top 4 of both hitting and pitching.  Two of the top 4 in hitting are in the bottom 4 for pitching and two of the top 4 in pitching are in the bottom 4 for hitting.  There is actually a negative correlation (-50%) between the hitting and pitching correlation %s of each service!
  • Grey is not part of this test (or Test #3) as he doesn’t provide Plate Appearances or Innings Pitched and, thus, I could not convert his projections into rate stats (e.g., HR/PA).

Test #3 – Playing Time Test (Source’s Plate Appearances/Innings Pitched With Actual Performance Rates)

Correlation of Projected RCL Team Value To Final Season Hitting Points – Plate Appearance Projections Only (Using Actual Hitting Rates)
Source Correlation To Total Team Hit Pts
Actual 57.7%
Steamer 52.8%
Baseball Prospectus (PECOTA) 52.6%
FanGraphs Fans 52.3%
Fantistics 51.6%
FantasyPros.com 50.4%
ESPN 49.6%
Rotochamp 48.8%
Oliver 47.9%
ZiPS 45.9%
Marcel 42.2%
CAIRO 38.3%


Correlation of Projected RCL Team Value To Final Season Pitching Points – IP Projections Only (Using Actual Pitching Rates + Save Projections)
Source Total Team Pitch Pts Total Team Pitch Points (w/o Saves)
Actual 49.4% 51.9%
Fantistics 48.4% 51.6%
Baseball Prospectus (PECOTA) 47.9% 51.1%
Steamer 46.5% 49.9%
ESPN 45.2% 48.8%
FanGraphs Fans 45.2% 48.8%
FantasyPros.com 43.9% 47.5%
Rotochamp 43.7% 47.5%
ZiPS 43.3% 46.5%
Marcel 43.0% 45.8%
Oliver 42.2% 45.8%
CAIRO 38.1% 42.4%

Notes/Findings

  • Baseball Prospectus, Steamer, and Fantistics finished in the top 4 for both hitting and pitching playing time.  While this is not surprising for BP and Fantistics given they are paid services, I found it very surprising (and impressive) that Steamer performed so well.  It looks like Steamer’s efforts to analyze and concentrate on playing time are paying off.
  • The higher correlations seen in this test vs. Test #2 indicate that differences in projected vs. actual performance are larger than the differences seen in playing time projections (including injuries).  If some genie ever offers you pre-season access to actual playing time or performance rates for 12 team 5×5, definitely take the performance rates.  This might not be the case in AL/NL-only leagues where there is less depth in free agency.

Final Note For Obsessives

The above tests suggest using different sources for hitting performance, hitting playing time, pitching performance, and pitching playing time.  For every 9 of you who think that’s insane, there is one of you thinking something like, “Only one source for hitting performance?  That’s crazy.  I average together 3 hitting sources and 3 pitching sources!”  I hear you.  I have historically averaged together 2-3 sources for performance rates and began averaging 2 playing time sources this past year.

Theoretically, I can determine the best combination of sources for 2012 through a regression test.  For instance, the ideal formula for combining 2012 Steamer + ZiPS hitter projections (based on comparing actual vs. projected RCL team values) is:  8.284098 + 0.377763*Steamer + 0.642084*ZiPS.  That results in a 42.6% correlation whereas using Steamer alone was at 41.7% and ZiPS was at 41.5% (see Test #2).  You could go even further and determine the best formula for every stat category and…
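That “ideal formula” is just an ordinary least squares fit with the two sources as predictors.  A sketch of how such weights could be derived (the team values below are invented, not the actual RCL data):

```python
import numpy as np

# Invented example: projected team $ values from two sources and the
# teams' actual end-of-season values; the real test used 576 RCL teams.
steamer = np.array([210.0, 195.0, 230.0, 205.0, 188.0])
zips    = np.array([205.0, 200.0, 225.0, 198.0, 192.0])
actual  = np.array([215.0, 190.0, 235.0, 200.0, 185.0])

# Fit actual ~ b0 + b1*Steamer + b2*ZiPS via least squares.
X = np.column_stack([np.ones_like(steamer), steamer, zips])
coeffs, *_ = np.linalg.lstsq(X, actual, rcond=None)
b0, b1, b2 = coeffs

# The blended projection from the fitted weights.
blended = b0 + b1 * steamer + b2 * zips
```

Since the two-source fit contains “Steamer alone” as a special case, its in-sample correlation can only match or beat either single source – which is exactly why an in-sample formula like this looks better than it will perform out of sample in 2013.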

Yeah, that’s a crazy person path that I am not going down.  Not only is it a case of diminishing gains but it is highly doubtful that the regression formula for 2012 will magically work in 2013.

So here are the results from a simpler test that just averages two top sources for Tests #2 and #3 to see if there are any benefits to averaging two sources:

  • Hitting Performance:  CAIRO – 42%, Steamer – 41.7%, CAIRO/Steamer – 42.5%
  • Hitting Playing Time:  Baseball Prospectus – 52.6%, Fantistics 51.6%, BP/Fantistics 53.4%
  • Pitching Performance (ignoring Save pts) – CAIRO – 23.4%, Steamer 23.9%, CAIRO/Steamer 24.6%
  • Pitching Playing Time (ignoring Save pts) – Baseball Prospectus – 51.1%, Fantistics – 51.1%, BP/Fantistics 51.6%
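The averaging itself is nothing fancier than an element-wise mean of the two sources’ $ values (hypothetical numbers here):

```python
# Hypothetical per-team projected $ values from two sources;
# the blended estimate is their simple average.
cairo   = [212.0, 198.0, 226.0]
steamer = [208.0, 204.0, 230.0]
blend = [(c + s) / 2 for c, s in zip(cairo, steamer)]
print(blend)  # [210.0, 201.0, 228.0]
```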

In all four cases, the correlation of the averaged estimate increased by about a percentage point.  So there does appear to be some benefit to averaging two sources – as long as both sources performed well.  For instance, in 2012, I used 67% Steamer / 33% ZiPS for my Point Shares hitting playing time and ended up with a 51.8% correlation whereas just using Steamer would’ve netted 52.8% (ZiPS was at 45.9%).

I did not do any tests on averaging 3+ systems, but I would guess it would be just as likely to hurt accuracy as to help it.