
This is the second post in a two-part series designed to help Fantasy Baseball fans determine which fantasy rankings and projections to rely on.  The first part covered Rankings.  This part covers Projections.  The methodology for the test relies on comparing Razzball Commenter League team drafts (576 teams in 2012 across 48 12-Team MLB leagues using ESPN’s default 5×5 format) with their end of season point totals.  Background on the methodology can be found here.

I realize many of you who read this site are not stat geeks or Excel wonks (Geeks and a bit wonky, yes).  That’s fine because, trust me, fantasy baseball does not get more fun after investing countless hours trying to gain a 5-10% edge on your competition.  But if you use pre-season projections for any part of your pre-draft work, this post will help you choose the right source(s).  Or, you could read my previous post that reviewed the best sources for fantasy baseball rankings and take solace in knowing all this analysis goes into my pre-season Point Share rankings/$ estimates.

Of course, for those of you who are stat geeks or Excel wonks, compiling and analyzing pre-season projections is a compulsion.  There are plenty of projection systems to fuel this compulsion – free systems like Steamer, ZiPS, FanGraphs Fan Projections, Marcel and CAIRO as well as paid services like Baseball Prospectus (PECOTA), Fantistics, Oliver (as part of HardballTimes.com), and Rotochamp that offer add-ons like customized dollar values for your league format.  This analysis should make your 2013 decisions a little easier.

Introduction

Analyzing pre-season baseball projections is relatively easy if all you care about is actual baseball.  You can convert all the various hitting stats into a single stat like Offensive WAR or wOBA (weighted on-base average) and the pitching stats into an ERA variation (SIERA, FIP, xFIP, etc) and then use various statistical metrics (depending on your success criteria) to compare actual vs. projected stats to determine which system is best.

Analyzing projections through the prism of fantasy baseball adds some wrinkles.  It is possible to convert hitter/pitcher stat lines into a single metric (e.g., auction dollars, point shares) but it is more difficult to calculate.  In addition, the impact of a correct/incorrect projection can vary wildly in a way that is tough to quantify.  Was it a top pick or a lower round pick?  How quickly was I able to drop an underperformer and what was I able to get as a free agent replacement?  Do starting pitching injuries hurt less because I can stream?

The test I have constructed looks to address all these issues/challenges.  It works as such:

  1. Take the draft results by team from the 48 Razzball Commenter Leagues (RCLs) in 2012.  These were hosted on ESPN using ESPN’s default league formats for 12 team 5×5 MLB leagues.  This amounts to 576 teams’ worth of draft data.
  2. Convert every drafted player’s projection (including the three bench players) into $ values using Razzball’s Point Share methodology (note: if a player was not drafted, they are not included in this study.)
  3. Total these player $ values for each of the 576 RCL teams.
  4. See how these team totals correlate with each team’s final Total Standings Points

The success ‘ceiling’ is based on the correlation of teams’ total points vs. the estimated $ value of their team based on actual stats.  This came out to a 57.7% correlation for hitting, 49.4% for pitching, and 51.9% for pitching if you exclude Save points.  In other words, about 58% of a team’s combined R/HR/RBI/SB/AVG standings points can be explained based on the end of season stats of its drafted hitters.  The other 42% would be based on all other factors including FA/Waiver pickups, trades, league strength, etc.
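Steps 3 and 4 of the test can be sketched roughly as follows. This is a minimal illustration and not the actual Razzball workflow: the team names, dollar values, and standings points are made up, and `pearson` is a plain helper written for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: $ values of each team's drafted hitters and each
# team's final hitting standings points.
team_player_values = {
    "Team A": [32.0, 18.5, 7.0, -2.0],
    "Team B": [25.0, 21.0, 11.5, 3.0],
    "Team C": [28.0, 9.0, 4.5, -5.0],
    "Team D": [15.0, 12.0, 10.0, 6.0],
}
team_hit_points = {"Team A": 44.0, "Team B": 51.5, "Team C": 30.0, "Team D": 38.5}

# Step 3: total the player $ values for each team.
team_totals = {team: sum(vals) for team, vals in team_player_values.items()}

# Step 4: correlate the team totals with the final standings points.
teams = sorted(team_totals)
r = pearson([team_totals[t] for t in teams], [team_hit_points[t] for t in teams])
print(f"correlation: {r:.1%}")
```

Feeding in $ values built from actual end-of-season stats would give the 'ceiling' figure; feeding in $ values built from a source's projections gives that source's score.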

Projected Sources Tested (in alphabetical order)

  • Baseball Prospectus (PECOTA) – early/mid March (has Madson as Reds closer and injury was announced March 28th)
  • CAIRO – last updated on April 3rd
  • ESPN – assumed late March/early April since it has Ryan Madson at 0 IP
  • FanGraphs Fans – updated through late March (fan projections likely throughout Feb/March)
  • FantasyPros.com – late March/early April (they aggregate sources so probably a mix of Feb-early April)
  • Fantistics – late March/early April (had Marshall as closer vs. Madson)
  • Grey (Razzball) - February (note:  only in Test #1)
  • Marcel – not sure on reporting date.  Methodology is based solely on previous season stats so does not really matter.
  • Oliver/The Hardball Times Forecasts – late March
  • Rotochamp – early/mid-March (has Madson as closer)
  • Steamer – last updated on March 21st
  • ZiPS – last updated on April 3rd

Notes

  • The $ floor per player was set at -$5 as this led to higher correlations in the test vs. using  $0.
  • The following sources did not have projections for Yoenis Cespedes and Yu Darvish:  ZiPS, Marcel, Rotochamp, FanGraphs Fans.  For these two players, those sources received a $ value derived from an average of various sources.  Any other players without projections received the floor value of -$5.
  • The following sources did not project Saves (or I did not get those projections):  ZiPS, Marcel, Oliver.  For these sources, I credited the Save projections from FantasyPros.com (which aggregates a number of different sources).
  • Joakim Soria and Ryan Madson were given a value of $0 in all sources as some reflected their injury, others did not.
  • Sean Marshall, Aroldis Chapman, Greg Holland, and Jonathan Broxton were given 0 saves in all sources as some reflected the Madson/Soria injuries while others did not.
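
The floor rule in these notes might look like the sketch below. The player names and raw dollar values are hypothetical, `player_value` is a helper invented for this example, and the Cespedes/Darvish averaging exception is not modeled.

```python
FLOOR = -5.0  # $ floor per player, per the notes above

def player_value(raw_values, player, floor=FLOOR):
    """Return a player's $ value, never below the floor.

    Players the source did not project receive the floor value.
    """
    if player not in raw_values:
        return floor
    return max(raw_values[player], floor)

# Hypothetical raw $ values from one projection source.
raw_values = {"Hitter A": 27.4, "Hitter B": -11.2, "Pitcher C": 3.1}
roster = ["Hitter A", "Hitter B", "Pitcher C", "Unprojected D"]

team_total = sum(player_value(raw_values, p) for p in roster)
print(round(team_total, 1))
```

Here "Hitter B" is lifted from -$11.2 to -$5 and "Unprojected D" is charged -$5, so one bad or missing projection cannot sink a team's total by more than the floor.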

Test Overview

Below are the three tests that were performed.  I consider projecting performance (e.g., a player is projected at 1 HR per 20 plate appearances, a pitcher is projected at 8 Ks per 9 innings, etc.) and playing time (e.g., Plate Appearances for hitters, IP for pitchers) as two distinct skills.

Test # | Title | What is it testing?
1 | Full Test (Source PA/IP and Performance Rates) | Performance + Playing Time Projections
2 | Performance Test (Actual PA/IP With Source’s Performance Rates) | Performance Projections
3 | Playing Time Test (Source’s PA/IP With Actual Performance Rates) | Playing Time Projections

The first test just uses the source’s projections – in effect, testing both performance and playing time.

The 2nd and 3rd tests isolate performance and playing time respectively by replacing the other variable with the actual results.  Here is an example with Mike Trout:

  • Steamer projected Mike Trout to hit a HR roughly one every 40 plate appearances and to have 390 Plate Appearances.
  • Mike Trout actually hit a HR roughly once every 20 plate appearances and had 632 plate appearances.
  • For the performance test, Trout’s actual PA (632) are multiplied by Steamer’s HR rate of ~1 every 40 PAs and nets 15 HRs.
  • For the playing time test, Steamer’s projected 390 PAs are multiplied by his actual HR rate of ~1 every 20 PAs and nets 19 HRs.
  • This is done for his other hitting stats and then a $ figure is estimated for him and credited to every RCL team that drafted Trout.  (In this specific instance, since Trout hit 30 HRs, you could say that Steamer’s playing time projection was better than its performance projection).
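
The Trout arithmetic can be written out as a short sketch. It uses the rounded rates quoted in the bullets (1 HR per 40 PA and 1 HR per 20 PA), so the results land near, not exactly on, the 15 and 19 HRs cited, which presumably came from unrounded rates. The function and variable names are made up for illustration.

```python
# Rounded figures quoted in the Mike Trout example above.
steamer_hr_rate = 1 / 40   # Steamer's projected HR per PA (approximate)
steamer_pa = 390           # Steamer's projected plate appearances
actual_hr_rate = 1 / 20    # Trout's actual HR per PA (approximate)
actual_pa = 632            # actual plate appearances cited in the post

def projected_total(rate_per_pa, plate_appearances):
    """A counting stat is a per-PA rate times plate appearances."""
    return rate_per_pa * plate_appearances

# Test #1: source rate with source playing time.
full_test_hr = projected_total(steamer_hr_rate, steamer_pa)
# Test #2: source rate with actual playing time.
performance_test_hr = projected_total(steamer_hr_rate, actual_pa)
# Test #3: actual rate with source playing time.
playing_time_test_hr = projected_total(actual_hr_rate, steamer_pa)

print(full_test_hr, performance_test_hr, playing_time_test_hr)
```

The same swap is applied to every other category before the $ value is computed, and pitchers work the same way with IP in place of PA.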

Since Tests #2 and #3 replace one of the variables with Actual stats, the correlations for the 2nd and 3rd tests are naturally higher than that of the 1st test.

While I recommend using the ‘best’ source across the four ‘skills’ – hitting performance, hitting playing time, pitcher performance, pitcher playing time – test #1 is sufficient for all of you who just want a single projection source.

Final note – since Saves are such a fluky category, I show the Pitching tests both with and without Saves points.  I have it ordered based on the ‘without Saves points’ since I think projecting Saves is more art than science.

Test #1 – Full Test (Source’s Plate Appearances/Innings Pitched and Performance Rates) 

Correlation of Projected RCL Team Value To Final Season Hitting Points
Source | Correlation To Total Team Hit Pts
FantasyPros.com | 23.2%
Baseball Prospectus (PECOTA) | 21.5%
FanGraphs Fans | 19.8%
ESPN | 19.4%
Steamer | 18.1%
Oliver | 16.3%
CAIRO | 16.1%
Fantistics | 14.5%
Rotochamp | 14.2%
Grey (Razzball) | 13.8%
Marcel | 12.8%
ZiPS | 11.5%

 

Correlation of Projected RCL Team Value To Final Season Pitching Points
Source | Total Team Pitch Pts | Total Team Pitch Pts (w/o Saves)
Steamer | 15.3% | 16.6%
Fantistics | 8.9% | 12.7%
FantasyPros.com | 4.9% | 8.7%
Grey (Razzball) | 4.6% | 8.6%
Baseball Prospectus (PECOTA) | 5.5% | 8.0%
Rotochamp | 1.9% | 6.6%
ESPN | 3.8% | 6.5%
CAIRO | 3.1% | 6.2%
Oliver | 2.3% | 5.1%
FanGraphs Fans | 1.0% | 5.0%
Marcel | 1.9% | 4.8%
ZiPS | 1.2% | 3.0%

Notes/Findings:

  • All sources’ projections correlated better with Team Hitting points vs. Pitching points.  As Test #2 will illustrate, hitter performance projections are more accurate than pitcher performance projections.  (Test #3 illustrates that playing time differences between hitters and pitchers – caused by injuries or poor performance – are negligible.)
  • While FantasyPros.com finished atop the Hitting leaderboard, that is the equivalent of a slow hitter leading the league in AVG with a .400 BABIP:  Tests #2 and #3 show that FantasyPros’ performance projections are near the bottom and its playing time projections are middle of the pack.
  • Based on Tests #2 and #3, I would say that Baseball Prospectus, Steamer, and FanGraphs Fan Projections are the top single source choices for hitting as they were the only three to finish in the top half of both tests.  One caveat on FanGraphs Fan Projections, however, is that the player pool is shallow and is barely sufficient by the end of March.  If you are drafting in the first half of March, you will definitely need a second source to fill in the gaps (or just use Steamer or BP).
  • For pitching, Steamer and Fantistics are the clear choices if you are looking for a single source of projections.  They are in the top 3 of all 3 tests – regardless of whether Saves points are factored in or not.
  • Grey’s projections did pretty well – particularly in pitching – given he does his projections in February and does not rely on any formulas/calculations.
  • The performance of Marcel, ZiPS, and CAIRO is misleading as none of the three aim to accurately project playing time.  That said, if you have no plans to adjust their projections based on a solid playing time source, they will be of little value.

Test #2 – Performance Test (Actual Plate Appearances/Innings Pitched With Source’s Performance Rates) 

Correlation of Projected RCL Team Value To Final Season Hitting Points – Hitting Rates Only (Using Actual Plate Appearances/Innings Pitched)
Source | Correlation To Total Team Hit Pts
Actual Stats | 57.7%
Baseball Prospectus (PECOTA) | 42.2%
Oliver | 42.0%
CAIRO | 42.0%
Steamer | 41.7%
FanGraphs Fans | 41.7%
ZiPS | 41.5%
Marcel | 41.4%
ESPN | 41.2%
FantasyPros.com | 40.4%
Rotochamp | 39.7%
Fantistics | 39.6%

 

Correlation of Projected RCL Team Value To Final Season Pitching Points – Pitching Rates Only (Using Actual Plate Appearances/Innings Pitched)
Source | Total Team Pitch Pts | Total Team Pitch Pts (w/o Saves)
Actual Stats | 49.4% | 51.9%
Fantistics | 22.3% | 24.3%
Steamer | 24.0% | 23.9%
CAIRO | 21.9% | 23.4%
FantasyPros.com | 20.9% | 23.0%
Rotochamp | 19.0% | 22.1%
ESPN | 20.1% | 20.8%
Marcel | 17.5% | 20.0%
Oliver | 18.3% | 20.0%
Baseball Prospectus (PECOTA) | 19.0% | 20.0%
ZiPS | 18.7% | 19.7%
FanGraphs Fans | 18.7% | 19.5%

Notes/Findings

  • The hitting correlations are much closer than found in Test #1.  I think this suggests a parity of sorts in the projection of hitter performance rates.  The sources are so close to each other that it is hard to say there is a single victor.
  • For pitching, there is a little more separation between the sources.  Fantistics and Steamer are #1/#2 depending on whether you count Saves or not.  CAIRO does much better here vs. Test #1 as its main focus is on projecting performance vs. playing time.
  • Steamer and CAIRO are the only two sources to be in the top 4 of both hitting and pitching.  Two of the top 4 in hitting are in the bottom 4 for pitching and two of the top 4 in pitching are in the bottom 4 for hitting.  There is actually a negative correlation (-50%) between the hitting and pitching correlation %s of each service!
  • Grey is not part of this test (or Test #3) as he doesn’t provide Plate Appearances or Innings Pitched and, thus, I could not convert his projections into rate stats (e.g., HR/PA).
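
The negative relationship between each source's hitting and pitching results, noted in the findings above, can be checked from the two Test #2 tables. The sketch below assumes the 'w/o Saves' pitching column was the one used, which the post does not state; `pearson` is a helper written for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Test #2 results per source: (hitting %, pitching % without Saves).
test2 = {
    "Baseball Prospectus (PECOTA)": (42.2, 20.0),
    "Oliver": (42.0, 20.0),
    "CAIRO": (42.0, 23.4),
    "Steamer": (41.7, 23.9),
    "FanGraphs Fans": (41.7, 19.5),
    "ZiPS": (41.5, 19.7),
    "Marcel": (41.4, 20.0),
    "ESPN": (41.2, 20.8),
    "FantasyPros.com": (40.4, 23.0),
    "Rotochamp": (39.7, 22.1),
    "Fantistics": (39.6, 24.3),
}

hitting = [h for h, _ in test2.values()]
pitching = [p for _, p in test2.values()]
r = pearson(hitting, pitching)
print(f"hitting vs. pitching across sources: {r:.0%}")
```

With these eleven data points the result comes out close to the -50% quoted, though a sample this small should be read cautiously.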

Test #3 – Playing Time Test (Source’s Plate Appearances/Innings Pitched With Actual Performance Rates ) 

Correlation of Projected RCL Team Value To Final Season Hitting Points – Plate Appearance Projections Only (Using Actual Hitting Rates)
Source | Correlation To Total Team Hit Pts
Actual | 57.7%
Steamer | 52.8%
Baseball Prospectus (PECOTA) | 52.6%
FanGraphs Fans | 52.3%
Fantistics | 51.6%
FantasyPros.com | 50.4%
ESPN | 49.6%
Rotochamp | 48.8%
Oliver | 47.9%
ZiPS | 45.9%
Marcel | 42.2%
CAIRO | 38.3%

 

Correlation of Projected RCL Team Value To Final Season Pitching Points – IP Projections Only (Using Actual Pitching Rates + Save Projections)
Source | Total Team Pitch Pts | Total Team Pitch Pts (w/o Saves)
Actual | 49.4% | 51.9%
Fantistics | 48.4% | 51.6%
Baseball Prospectus (PECOTA) | 47.9% | 51.1%
Steamer | 46.5% | 49.9%
ESPN | 45.2% | 48.8%
FanGraphs Fans | 45.2% | 48.8%
FantasyPros.com | 43.9% | 47.5%
Rotochamp | 43.7% | 47.5%
ZiPS | 43.3% | 46.5%
Marcel | 43.0% | 45.8%
Oliver | 42.2% | 45.8%
CAIRO | 38.1% | 42.4%

Notes/Findings

  • Baseball Prospectus, Steamer, and Fantistics finished in the top 4 for both hitting and pitching playing time.  While this is not surprising for BP and Fantistics given they are paid services, I found it very surprising (and impressive) that Steamer performed so well.  It looks like Steamer’s efforts to analyze and concentrate on playing time are paying off.
  • The higher correlations seen in this test vs. Test #2 indicate that differences in projected vs. actual performance are larger than the differences seen in playing time projections (including injuries).  If some genie ever offers you pre-season access to actual playing time or performance rates for 12 team 5×5, definitely take the performance rates.  This might not be the case in AL/NL-only leagues where there is less depth in free agency.

Final Note For Obsessives

The above tests suggest using different sources for hitting performance, hitting playing time, pitching performance, and pitching playing time.  For every 9 of you who think that’s insane, there is one of you who is thinking something like, “Only one source for hitting performance?  That’s crazy.  I average together 3 hitting sources and 3 pitching sources!”  I hear you.  I historically average together 2-3 sources for performance rates and started averaging 2 playing time sources starting this past year.

Theoretically, I can determine the best combination of sources for 2012 through a regression test.  For instance, the ideal formula for combining 2012 Steamer + ZiPS hitter projections (based on comparing actual vs. projected RCL team values) is:  8.284098 + 0.377763*Steamer + 0.642084*ZiPS.  That results in a 42.6% correlation whereas using Steamer alone was at 41.7% and ZiPS was at 41.5% (see Test #2).  You could go even further and determine the best formula for every stat category and…
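
As a small illustration, the regression formula quoted above can be applied like this. The coefficients are the ones from the post; the function name and the input team values are hypothetical.

```python
def blended_team_value(steamer_value, zips_value):
    """Apply the 2012 Steamer + ZiPS regression formula quoted above."""
    return 8.284098 + 0.377763 * steamer_value + 0.642084 * zips_value

# Hypothetical projected team hitting $ totals from each source.
blended = blended_team_value(steamer_value=100.0, zips_value=90.0)
print(round(blended, 2))
```

The coefficients were fit to 2012 results, so applying them to another season is exactly the kind of carry-over that may not hold.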

Yeah, that’s a crazy person path that I am not going down.  Not only is it a case of diminishing gains but it is highly doubtful that the regression formula for 2012 will magically work in 2013.

So here are the results from a simpler test that just averages two top sources for Tests #2 and #3 to see if there are any benefits to averaging two sources:

  • Hitting Performance:  CAIRO – 42%, Steamer – 41.7%, CAIRO/Steamer – 42.5%
  • Hitting Playing Time:  Baseball Prospectus – 52.6%, Fantistics 51.6%, BP/Fantistics 53.4%
  • Pitching Performance (ignoring Save pts) – CAIRO – 23.4%, Steamer 23.9%, CAIRO/Steamer 24.6%
  • Pitching Playing Time (ignoring Save pts) – Baseball Prospectus – 51.1%, Fantistics – 51.1%, BP/Fantistics 51.6%

In all four cases, the averaged estimate increased about a percentage point.  So there does appear to be some benefit to averaging two sources – as long as those sources both performed well.  For instance, in 2012, I used 67% Steamer, 33% ZiPS for my Point Shares hitting playing time and ended up with a 51.8% correlation whereas just using Steamer would’ve netted 52.8% (ZiPS was at 45.9%).
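
Averaging two sources, evenly or with weights like the 67/33 split mentioned above, can be sketched as follows. The player names and HR/PA rates are invented, and `blend` is a hypothetical helper that only covers players both sources project.

```python
def blend(source_a, source_b, weight_a=0.5):
    """Weighted average of two sources for the players both of them project."""
    shared = source_a.keys() & source_b.keys()
    return {
        player: weight_a * source_a[player] + (1 - weight_a) * source_b[player]
        for player in shared
    }

# Hypothetical per-player HR/PA rates from two sources.
cairo = {"Hitter A": 0.050, "Hitter B": 0.030}
steamer = {"Hitter A": 0.040, "Hitter B": 0.034, "Hitter C": 0.020}

even_split = blend(cairo, steamer)                 # straight 50/50 average
weighted = blend(steamer, cairo, weight_a=0.67)    # 67% first source, 33% second
print(even_split["Hitter A"], weighted["Hitter A"])
```

"Hitter C" drops out because only one source projects him; how to handle such players (use the single source, or fall back to a floor value) is a separate choice this sketch leaves open.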

I did not do any tests on averaging 3+ systems but I would guess that it would just as likely hurt accuracy as it would help accuracy.


  1. Vacation says:

    Jeebus, well done Rudy!

  2. For the Obsessives says:

    I’ve crowdsourced my projections for several years, giving equal weight to 14 sources including Steamer, FP911, ZiPS, PECOTA, FanGraphs, Mastersball, et al. It’s a massive exercise each year. Are you saying it would be as accurate to use 3-4 or that perhaps I should give greater weight to one source or the other (depending on H or P)?

    • Wow, that’s like a high school girl ensuring she’s going to have an athletic baby by sleeping with 2/3 of the starting varsity football team.

      First question, are you just averaging the totals (e.g., 20 HRs + 25 HRs + etc. / # of sources) or are you converting them into rates and applying vs a playing time projection (e.g., 0.05 HR/PA * 500 PAs)?

      Assuming you are just averaging the projections, I’d suggest re-channeling your efforts towards finding 2 performance sources and 2 playing time sources that you like/trust. Given the minor lift in correlation from averaging 2 sources, I think any more than that is overkill (and more likely to hurt than help). I think a straight average vs. weighting each source is the easiest way to go – I’d only use weights if it was early in the pre-season and you’re forced to use a less accurate projection source.

      I think the best proxy for your crowdsourced method is the FantasyPros projections which performed quite well in Test #1. While I think a composite method (2 performance/2 playing time) should beat it (it did in pitching in 2012 but FantasyPros’ hitting projections were the best), I’d probably just use FantasyPros if the alternative was averaging 14 separate sources.

      One other note – I’d also differentiate the efforts behind hitting and pitching. I think crowdsourcing fares better for hitting (where the performance between systems was close) vs. pitching (where Steamer and, to a lesser extent, Fantistics, had a clear advantage).

      Hope that helps…

  3. For the Obsessives says:

    Got it … no more sleeping around to ensure results. Check!

    My formulas are a combination. I avg to get rates for pitchers (H/9, BB/9, K/9), but most stats are straight averages (W,Sv, IP/ERA and AB, R, HR, RBI, SB, Avg, OBP). Should I be utilizing that method on more of those stats?

    As for the composite method (2 performance/2 playing time), I’ve always given equal weighting. Recommend a couple of each?

    • Based on these results and looking at some 2011 tests (http://www.fangraphs.com/blogs/index.php/testing-projections-for-2011/), I’m leaning towards Steamer/CAIRO for both hitting and pitching rates next year. I like that Steamer and CAIRO release in February and then make some adjustments later in the pre-season. I’m still deciding on playing time – though I’ve been satisfied over the past 2 years using Fantistics (debating whether it pays to pony up for BP).

      AVG/OBP are already rates so no reason to touch those. The easiest way to treat the other stats like rates might be to do the averaging of totals and then multiply those #s based on the PLAYING TIME SOURCED AB/IP / AVERAGED AB/IP. So if your average ends up with 30 HRs in 600 ABs and the better playing time sources say 500 ABs, you multiply 30 * 500/600 and it’s converted to 25 HRs. The only stat I wouldn’t treat as a rate stat is Saves.
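
The rescaling described in this reply works out as below. It is a minimal sketch; the function name is made up for illustration.

```python
def rescale_to_playing_time(averaged_total, averaged_playing_time, trusted_playing_time):
    """Scale an averaged counting stat to a preferred AB (or IP) projection."""
    return averaged_total * trusted_playing_time / averaged_playing_time

# The example from the reply: 30 HRs in 600 ABs, rescaled to 500 ABs.
hr = rescale_to_playing_time(30, 600, 500)
print(hr)  # 25.0
```

The same function would apply to R, RBI, SB, W, or K by swapping in the relevant total and using IP for pitchers; per the reply, Saves would be left alone.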

  4. DaBulls says:

    Thanks. Any idea what this looks like for previous years, and what the variability looks like from year to year? I know it’s unfair because some of the sources (like Steamer) change their methodologies, but this would still get at whether there’s something systematically better about Steamer, BP and Fantistics, or whether this was just a good year for them.

    • @DaBulls, Great question. I initially was going to do this as a 2 year test but it ended up being too much work (and my 2011 projections weren’t as complete).

      Based on this test (http://www.fangraphs.com/blogs/index.php/testing-projections-for-2011/), I think Steamer and CAIRO’s pitching performance is no fluke. I can only guess that Fantistics’ success isn’t a fluke. BP/PECOTA’s performance in hitting/pitching in 2012 vs. 2011 seems fairly close as well (as in, I’d consider it for hitting in 2013, not for pitching).

      As for hitting performance (Test #2), the results are so close that I could imagine the order being very different for 2011. My decision on sources for 2013 will likely come down to other factors like: 1) delivery time (Feb vs. March) and 2) completeness (FG Projections barely good enough for 12-team).

  5. For the Obsessives says:

    My initial thought is to average in 2012 actuals for rate stats (HR/AB) but quickly realized can’t use them for playing time (e.g., Manny Machado = only 202 PA, 191 AB). Would you even consider them for rate stats if you could exclude small sample sizes like Machado? Say … anything over X at bats?

    • Marcel (the baseline of projection services) factors in 3 years of past stats weighted so that the most recent year receives the highest weight.

      In addition, most systems like Steamer, ZIPS, and BP convert minor league stats into MLE (major league equivalents).

      So, as long as you use the right projection sources, the 2012 stats have already been factored into your results.

  6. beardcrabs says:

    your posts hurt my head (I believe in a good way)… I usually submit, look at the tables, and read the conclusion… But your work is a much appreciated circle change…

    • thanks. i tried to write it in a way that made it clear how to read the results – even if the statistical part was confusing.

  7. #2: If you average all the projection systems, you are likely to suffer from massive overfitting.

    Great blog post!

  8. Fantasy Cheapskate says:

    Being a cheapskate, I am trying to decide which sources to use for playing time. From the tables it looks like Steamer/Fans for plate appearances and Steamer/ESPN for pitching?

    Am I reading that right?

    • Fantasy Cheapskate says:

      @Fantasy Cheapskate, sorry that should be FREE sources.

      • @Fantasy Cheapskate, I’d rely on Steamer and perhaps just use FanGraphs and/or ESPN to identify hitters/pitchers with abnormally large differences. (very helpful to identify a starter projected incorrectly as a reliever, etc.).

  9. Great work Rudy. Of course, I would like to see Mastersball.com’s projections included next time to see how we rate (after all I did win the experts league using them). Let me know this time next year and I will shoot you a set. It is worth noting that we had the earliest set of projections on the market last year (out before Christmas).

    • Ryan –
      Thanks. Send me the 2013 Mastersball projections before the season starts and I’ll include them in next year’s analysis.
