In a previous post, I laid out a methodology for testing fantasy baseball player rankings/auction values and all the components involved in projecting player values. I got feedback from some smart folks who didn't 'get' the test. Since the common variable in that equation was me, I'm going to try explaining it one more time before I jump into the results of my test across 14 player rankings from 12 sources (2 for ESPN & Razzball) plus the Average Draft Position (ADP) of the 456 Razzball Commenter Leagues teams (38 leagues of 12 teams). (Feel free to skip the next paragraphs if you just want to see the results.)
There are three main components to developing pre-season fantasy baseball rankings:
- Playing Time Estimates (PA/AB and IP)
- Statistical Projections (HRs per AB, K/9, etc.)
- Methodology For Converting #1 and #2 Into Player Rankings/Value
Most player rankings are published as 'Top 200' or 'Top 300' lists with no stat projections. I call these 'curated' lists. Each curator creates their list differently, but I'd wager they all use some source for Playing Time and Stat Projections and then lean on their fantasy experience (vs. an actual formula) to determine each player's value. For testing purposes, you have to test the composite effectiveness of all three variables since they aren't explicitly broken out in the rankings.
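For the quant path (what I do with Point Shares), component #3 boils down to multiplying playing time by rates and then scoring the resulting stat lines against each other. Here's a minimal sketch of that idea; the player names, projection numbers, and the simple z-score scoring are all made up for illustration, and Point Shares' actual methodology is more involved.

```python
import statistics

# Hypothetical hitter projections: playing time (AB) plus rate stats.
projections = {
    "Hitter A": {"AB": 600, "HR_per_AB": 0.055, "AVG": 0.290},
    "Hitter B": {"AB": 520, "HR_per_AB": 0.040, "AVG": 0.275},
    "Hitter C": {"AB": 350, "HR_per_AB": 0.060, "AVG": 0.260},
}

# Components #1 and #2: counting stats = playing time x rate.
counting = {name: {"HR": p["AB"] * p["HR_per_AB"], "AVG": p["AVG"]}
            for name, p in projections.items()}

def z_scores(values):
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Component #3: sum z-scores across categories to get a single rough value.
names = list(counting)
hr_z = z_scores([counting[n]["HR"] for n in names])
avg_z = z_scores([counting[n]["AVG"] for n in names])
for name, hz, az in sorted(zip(names, hr_z, avg_z), key=lambda t: -(t[1] + t[2])):
    print(f"{name}: value = {hz + az:+.2f}")
```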
It would seem straightforward to just test each Player Ranking (or isolated component) against the end of year results. Matt Swartz of FanGraphs did a great test of Projection systems isolating wOBA and ERA. But here are the challenges that I see for such a test to have fantasy baseball relevance:
- The test needs to consider all the fantasy baseball categories. If you just focus on ERA, what about Wins, WHIP and K’s? How would that help with Closers? So you would need to figure out a way to test and properly weight each statistical category.
- There is a BIG difference between a pre-season ranking screwing up the value of a Top 3 round player vs. a late round player. So you would need to figure out a way to weight players based on their likely draft status.
- If a player misses significant playing time (for a hitter, let's say 30+ games), the timing and sequencing of those missed days, as well as the availability of waiver/FA pickups, play a major role in the player's impact on a team's success. For instance, a player like Chipper Jones, who might miss games throughout the season, is harder to replace than Nelson Cruz, who goes on 15-day DL trips. If a player gets sent down before Opening Day (like Lonnie Chisenhall), it's even easier to adjust for the loss in expected playing time. So you would need to credit replacement-level performance that varies depending on the timing/sequencing of missed playing time.
- The timing and sequence of a player who underperforms can dramatically change how a player impacts one’s fantasy team. Let’s say Player A and Player B had equal pre-season rankings and both underperform, hitting .220 with 15 HRs for the season. If Player A goes .150 with 0 HR in April and Player B goes .240 with 3 HR, Player A will likely be dropped quicker and that team might find a replacement player who performs above replacement value. So you would need to account for how quickly/slowly a player was replaced as well as estimate the impact of the replacement player.
Now, I can see an argument that #3 and #4 in particular come down to chance vs. a prognosticator's skill. But that's not entirely true. Some players are more injury-prone than others and they may be marked down accordingly (e.g., I knocked Kinsler's playing time down about 50-75 PAs from my sources, which took his value down about 10 spots in the rankings. Shouldn't I be credited/hurt because of that choice?). And while there is some chance involved, I'd argue it's still better to reflect the real impact these chance instances had on teams vs. finding ways to remove them (like testing a rate stat such as HR per AB instead of total HR).
So here is the test I put together that I think addresses these ‘near impossible to model’ variables:
- Take the draft results by team from the 38 Razzball Commenter Leagues in 2011 (hosted on ESPN using ESPN’s default league formats for 12 team 5×5 MLB leagues). This amounts to 456 teams’ worth of draft data.
- Create team total values based on ‘expert’ rankings/$ totals/other arbitrary metric (like Point Shares or THT Z-Score)
- See how these team totals correlate with each team’s final Total Standings Points
Since we're testing based on a team's pre-season aggregated player value and its end-of-year Total Standings Points, we are factoring in all the categories (#1). The pre-season rankings provide a natural way to weight each player's value/impact by matching them up against ranked auction dollar amounts (#2). And since the end-of-year Total Standings Points reflect how fantasy baseball managers behaved when facing lower-than-expected playing time or performance, the test accounts for those last two variables (#3 and #4).
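For anyone who wants to see the mechanics, here's a rough sketch of that test in Python. The column names ("team_id", "player", "standings_points") and the DataFrame layout are my assumptions for illustration, not the actual worksheet format.

```python
import pandas as pd
from scipy.stats import pearsonr

def rank_test(draft: pd.DataFrame, standings: pd.DataFrame, source_values: dict) -> float:
    """Correlate each team's summed pre-season value with its final standings points.

    draft: one row per drafted player, columns ["team_id", "player"]
    standings: one row per team, columns ["team_id", "standings_points"]
    source_values: player name -> pre-season $ value for one rankings source
                   (players a source didn't rank default to $0)
    """
    draft = draft.copy()
    draft["value"] = draft["player"].map(source_values).fillna(0.0)
    team_value = draft.groupby("team_id")["value"].sum()
    merged = standings.set_index("team_id").join(team_value)
    r, _ = pearsonr(merged["value"], merged["standings_points"])
    return r
```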
Hope that makes sense to everyone! Now on to the good stuff!
2011 Pre-Season Rankings Tested (click here for the aggregated results, links provided for rankings that are still posted):
- CBSSports Top 300
- ESPN – Matthew Berry Top 200
- ESPN – Top 300
- FantasyPros Aggregated Top 300
- FoxSports Top 300
- HardballTimes – Jeffrey Gross (never published)
- KFFL Top 200
- Last Player Picked (2011 Composite)
- Razzball – Grey’s Rankings
- Razzball – Rudy’s Point Shares – March 8
- Razzball – Rudy’s Point Shares – late March (forgot to publish – doh!)
- RCL ADP All (456 teams)
- RCL ADP Top 100 finishers
- RotoChamp top 300
- RotoExperts Top 300
- SI.com top 300
- USAToday Top 200
Here are some general notes before I start throwing data out:
- Most rankings do not provide $ estimates. Since the gap in value between earlier draft picks is larger than between later picks (e.g., the average gap between the 10th and 11th best players is greater than the gap between the 150th and 151st), I converted all rankings into dollar estimates by assigning the $ estimate for the corresponding rank from my Point Shares (a rough sketch of this conversion follows these notes). I used the published dollar estimates when they were available (vs. forcing my $ estimates). For HardballTimes and my Point Shares, I used the player scores since they estimate the value gap between picks (vs. just an arbitrary 1 between picks when done as rankings).
- I believe all these rankings were released between Feb 1 and March 30th with the majority in the March time frame. To be fair, I gave everyone the same values for the following players whose value dropped significantly during Spring Training – even if they had not ranked them. The values are based on my Point Share value as of late March (I put a Z-score equivalent to $6 for THT):
- Adam Wainwright – $0 – Tommy John surgery
- Chase Utley – $6 – Knee issues became more problematic
- Kendry(s) Morales – $6 – Comeback plans hit a snag
- Most rankings do not specify the league format (# of teams, positions, etc). Since the Razzball Commenter League format is about as standard as it gets (12 team, 5×5, ESPN roster format), I thought it was still fair to include in this test. When possible, I used a source’s ’12 team MLB’ rankings/estimates.
- Some rankings (ESPN/Matthew Berry, KFFL, USAToday) ranked only 200 players. I tested the impact of adding the next 100 players based on a composite of the other rankings and the correlation percentages decreased. I didn’t feel this was fair so I kept it as just the top 200.
- Any player not in a source’s rankings who was drafted was valued at $0.
- I capped (or is it floored?) any estimates at the equivalent of $0 as only some sources such as Point Shares and Last Player Picked report negative dollar values for players. This is probably for the best, anyway, as players with negative projected value before the draft are dropped like luxury good brand names in rap songs – early and often.
- In the previous post, I tested whether there was a clear bias in the RCL teams' draft behavior. The RCL teams' ADP correlated more closely with the ESPN Top 300 (96.7%) than with either Grey's rankings (92.6%) or my Point Shares (81.5%). While I don't have a second non-RCL sample to confirm, I would theorize that the largest bias in the RCL teams' behavior is driven by ESPN being the draft host and providing the default rankings.
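To make the $-conversion and $0 floor from the notes above concrete, here's a minimal sketch. It assumes my Point Shares $ values are available as a list sorted from the #1 player on down; the actual spreadsheet work may have been organized differently.

```python
# Sketch of the rank-to-$ conversion and the $0 floor described above.
# point_share_dollars: my Point Shares $ values sorted from the #1 player down.
# rank_list: another source's ordered Top 200/300 player names.
def dollars_from_ranks(rank_list, point_share_dollars):
    """Give the Nth-ranked player in a source the $ value of the Nth-best
    Point Shares player, flooring everything at $0."""
    values = {}
    for i, player in enumerate(rank_list):
        dollar = point_share_dollars[i] if i < len(point_share_dollars) else 0.0
        values[player] = max(dollar, 0.0)  # negative projected values become $0
    return values

# Any drafted player missing from `values` is then treated as $0.
```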
Here are the test results – see the following file for the raw results including the draft picks per team and the total standings points per team:
Chart #1: Correlation % Between Team Final Standings’ Points & Team Drafted Player Projected Value By Rankings Source
| Rank | Source | $ Converted | Correlation |
|------|--------|-------------|-------------|
| 1 | Razzball – Point Shares (Late March) | N | 8.0% |
| — | Razzball – Point Shares (Late March) | Y | 7.7% |
| 2 | Razzball – Grey's Rankings | Y | 7.7% |
| — | Razzball – Point Shares (March 8th) | N | 6.8% |
| — | Razzball – Point Shares (March 8th) | Y | 6.6% |
| 3 | RotoChamp Top 300 | Y | 0.0% |
| 4 | HardballTimes – Jeffrey Gross (Z-score) | N | -0.9% |
| 5 | RotoExperts Top 300 | Y | -3.2% |
| 6 | KFFL Top 200 | N | -3.5% |
| 7 | FantasyPros Aggregated Top 300 | Y | -3.88% |
| 8 | USAToday Top 200 | Y | -3.93% |
| 9 | RCL ADP Top 100 | Y | -5.0% |
| 10 | Last Player Picked | N | -5.8% |
| 11 | RCL ADP All | Y | -6.1% |
| 12 | SI.com Top 300 | Y | -7.7% |
| 13 | CBSSports Top 300 | N | -8.0% |
| 14 | FoxSports Top 300 | N | -8.1% |
| 15 | ESPN Berry Top 200 | Y | -9.3% |
| 16 | ESPN Top 300 | N | -12.2% |
For those that care about this sort of thing, here is a link that tests the Point Shares (Late March) against the other sources. Other than Grey's rankings, Point Shares performed better at a 99.9+% statistical confidence level. This means that – based on these results – it would have lost to one of the other sources in fewer than 1 in 1,000 instances.
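The post doesn't spell out exactly which significance test was used, but one plausible way to compare two sources is a paired bootstrap over the 456 teams: resample teams with replacement and count how often one source's correlation with final standings points beats the other's. A sketch, with made-up argument names:

```python
import numpy as np

def bootstrap_beat_rate(points, values_a, values_b, n_boot=10_000, seed=0):
    """points, values_a, values_b: arrays of length n_teams (final standings
    points and two sources' pre-season team values). Returns the fraction of
    bootstrap resamples where source A's correlation exceeds source B's."""
    rng = np.random.default_rng(seed)
    points, values_a, values_b = map(np.asarray, (points, values_a, values_b))
    n = len(points)
    wins = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample teams with replacement
        r_a = np.corrcoef(points[idx], values_a[idx])[0, 1]
        r_b = np.corrcoef(points[idx], values_b[idx])[0, 1]
        wins += r_a > r_b
    return wins / n_boot
```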
General takeaways (all specific to the 12-team MLB 5×5 format although I imagine much of it would apply to other formats):
- The vast majority of pre-season rankings have a slight negative correlation with actual team performance. Even the few that were positive aren't very positive (8% being the high) when you consider the previous test showed that 64% of a team's success is correlated to the actual performance of their draft picks. It's likely there are high-stakes fantasy baseball players or subscription sites who could exceed this 8% but, for now, it's the ceiling. So I surmise that…
- Somewhere around 55% of fantasy baseball team performance is driven by drafted players performing above/below consensus expectations (including injuries) – or, in other words, luck. (Note: This isn't the sum of all luck, as I would think there is also luck involved in in-season pickups performing above/below consensus expectations.)
- The delta between the best performing rankings and worst performing rankings (~20%) is perhaps a rough estimate of the true difference between the best and worst drafted teams prior to the season (assuming the worst drafter didn’t veer too far from ADP).
- Further proof can be seen in the nominal difference (-5.0% vs. -6.1%) between the ADP of the top 100 finishing RCL teams vs. all 456 teams. The Top 100 teams clearly drafted better when judged on final season performance, but that couldn't have been predicted to any significant degree before the season started by any of these sources (and, I'd surmise, by anyone).
- Pre-season standings calculations are – by and large – a waste of time and energy unless it can be shown that the source's stat + playing time expectations greatly exceed the 8% ceiling found in this study (i.e., having the most Point Shares $ value after the draft would increase one's chances of winning by a negligible amount above 11-1 odds (1 in 12) in a 12-team league).
- RCL ADP’s correlation for the Top 100 and All Teams is further from 0% than I expected. I have a theory later in the post on why.
- The small delta (0.2-0.3%) between the $-converted and non-$ converted Point Shares calculations (March 8th and Late March) is a positive sign that no ranking source was significantly helped/hurt by the $ conversion process.
- Aside from major injury news (Wainwright, Utley, K. Morales), the learnings from early March until Opening Day are fairly minor. Point Shares improved from a 6.8% to an 8.0% correlation. This difference is statistically significant at an 81.3% confidence level – so it's meaningful but not overwhelmingly so. For you multi-league drafters who have a late March draft, don't feel too much pressure to update your rankings (I would scan for injuries/playing time shifts though).
- Combining/averaging rankings (like FantasyPros.com does) pays some dividends but is no panacea. I use multiple sources for both my stat projections and playing time estimates based on the belief that my rankings are more likely to suffer from an outlier than benefit from one (case in point: if you used just Oliver like THT, you'd have Juan Francisco as a top 10 player in 2011). But if you are just combining a bunch of 'safe' rankings (the next section will define this further), you are not removing any risk. You are just creating one big vanilla-flavored rankings porridge. I'd rather identify a base rankings source (I'd recommend Point Shares) and then average it with a 2nd source as a sanity check.
- I averaged my Late March Point Shares with Grey’s rankings and got 8.1% (vs. 8.0% for Point Shares alone). So no major gain but it didn’t hurt either.
- I averaged my Late March Point Shares with FPro's rankings and got 2.76%. So adding in the safer rankings just dragged Point Shares down towards mediocrity. (A quick sketch of how this averaging works appears below.)
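The averaging itself is trivial. Here's a sketch (reusing the hypothetical rank_test helper from earlier) of blending two sources' $-converted player values before re-running the team correlation:

```python
def average_sources(values_a: dict, values_b: dict) -> dict:
    """Average two player -> $ value maps; a player missing from one source counts as $0."""
    players = set(values_a) | set(values_b)
    return {p: (values_a.get(p, 0.0) + values_b.get(p, 0.0)) / 2 for p in players}

# blended = average_sources(point_shares_values, greys_values)
# rank_test(draft, standings, blended)   # the blend with Grey's came out to 8.1%
```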
Personal takeaways:
- I’m obviously quite happy to see how (relatively) well my Point Shares did – although it’s humbling how small a percentage it explains in team performance. Here’s the humbling math – if one team in a 12-team league used +8% correlated rankings (Point Shares) and the others used -8% correlated rankings, that would increase the +8% team’s chances of winning from 8.33% (1/12th) to 9%. Perhaps that explains why Grey and I didn’t have a ton of fantasy baseball success last year despite doing so well in this test. (That and Mornoooooooooooo!)
- I can't believe Grey did so well. Given that he published his rankings in February and his percentages beat my March 8th estimates, I'd say he's the real (if not statistically significant) winner. It's quite annoying because it makes this study seem rigged – all I can say is that I did not make any changes to any of the sources' rankings (other than the 3 players noted above), the RCL draft results, or the final standings.
I ran two correlations to better understand the similarity (or lack thereof) between the various rankings. The first chart (Chart #2) shows how each source's player rankings correlate with Point Shares, Grey's Rankings, and FantasyPros' aggregated rankings (an average of 18 separate rankings). The second chart (Chart #3) shows how each source's projected team values correlate with those same three sources.
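Mechanically, both charts are just correlation matrices: one built on a table with a $ value per player per source (Chart #2) and one on a table with a summed drafted value per RCL team per source (Chart #3). A sketch, with hypothetical column names:

```python
import pandas as pd

def correlation_vs_references(values: pd.DataFrame,
                              references=("Point Shares", "Grey", "FantasyPros")):
    """values: one column of $ values per source, indexed by player (Chart #2)
    or by team after summing drafted values (Chart #3). Returns each source's
    correlation against the reference columns, sorted by similarity to the
    aggregated rankings (the last reference)."""
    corr = values.corr()[list(references)]
    return corr.sort_values(by=references[-1])
```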
Chart #2: Correlations of Each Source's Player Rankings vs. Point Shares, Grey's Rankings, and Industry Average (Sorted by Uniqueness to FPro Aggregated Rankings)
| Source | Point Shares (Late March) | Grey's Rankings | FantasyPros Aggregated Rankings |
|--------|---------------------------|-----------------|---------------------------------|
| HardballTimes – Jeffrey Gross (Z-score) | 57% | 62% | 67% |
| Razzball – Point Shares (Late March) | 100% | 73% | 82% |
| CBSSports Top 300 | 82% | 79% | 85% |
| KFFL Top 200 | 74% | 71% | 85% |
| RotoChamp Top 300 | 81% | 77% | 86% |
| Last Player Picked | 90% | 79% | 87% |
| Razzball – Grey's Rankings | 73% | 100% | 89% |
| ESPN Berry Top 200 | 78% | 79% | 89% |
| FoxSports Top 300 | 70% | 82% | 89% |
| SI.com Top 300 | 80% | 83% | 91% |
| ESPN Top 300 | 80% | 85% | 92% |
| USAToday Top 200 | 78% | 80% | 93% |
| RCL ADP Top 100 | 85% | 90% | 93% |
| RCL ADP All | 84% | 91% | 93% |
| RotoExperts Top 300 | 83% | 90% | 94% |
| FantasyPros Aggregated Top 300 | 82% | 89% | 100% |
| Average | 78% | 81% | 88% |
Chart #3: Correlation of Team Projected Values By Each Rankings Source (Sorted by Uniqueness to FPro Aggregated Rankings)
| Source | Point Shares (Late March) | Grey's Rankings | FantasyPros Aggregated Rankings |
|--------|---------------------------|-----------------|---------------------------------|
| Razzball – Grey's Rankings | 2% | — | 22% |
| HardballTimes – Jeffrey Gross (Z-score) | 57% | -7% | 37% |
| Razzball – Point Shares (Late March) | — | 2% | 59% |
| RotoChamp Top 300 | 67% | -15% | 61% |
| ESPN Berry Top 200 | 37% | -4% | 63% |
| Last Player Picked | 73% | -21% | 64% |
| RCL ADP Top 100 | 45% | 39% | 66% |
| RCL ADP All | 43% | 40% | 68% |
| CBSSports Top 300 | 52% | 1% | 70% |
| KFFL Top 200 | 49% | 8% | 72% |
| ESPN Top 300 | 41% | -9% | 73% |
| FoxSports Top 300 | 36% | 17% | 74% |
| USAToday Top 200 | 49% | 6% | 78% |
| SI.com Top 300 | 44% | 16% | 78% |
| RotoExperts Top 300 | 47% | 31% | 80% |
| FantasyPros Aggregated Top 300 | 59% | 22% | — |
| Average | 47% | 8% | 64% |
- While Chart #2 is a more straightforward test (i.e., it compares rankings directly rather than running the results through the 456 RCL teams), I think Chart #3 is the better test.
- Example #1: Grey's rankings look close to the other systems in Chart #2 but are wildly unique when used to value RCL teams. If Grey's rankings were as close as Chart #2 suggests, his team values wouldn't have been so divergent from the rest of the sources (the 8% average correlation in Chart #3 is really low).
- Example #2: Point Shares should correlate highest with three other automated/quant-based systems (HardballTimes, RotoChamp, Last Player Picked). This is not the case in Chart #2 but clearly the case in Chart #3.
- The four most unique rankings in Chart #3 finished in the top 4 in correlating to team success while the 5th (Matthew Berry) finished 2nd to last. I don't think this is a coincidence – the most unique rankings should be the furthest above/below the consensus rankings.
- It is odd how ESPN's Top 300 could finish last and yet be so safe in terms of similarity to consensus rankings. FantasyPros' Aggregated Rankings finished middle of the pack, so there appears to be some safety in consensus. It seems most likely that ESPN just had some bad luck.
- The 4 quantitative-based solutions (HardballTimes, Point Shares, Rotochamp, and Last Player Picked) finish in the top 6 for uniqueness.
Two unsubstantiated theories:
- I have no idea how to prove this but here’s a theory for why so many rankings are below 0%….Most of the ‘curated’ rankings reflect conventional fantasy baseball thinking (e.g., don’t draft pitchers in the first 15 picks). This conventional thinking either feeds or just mirrors what ends up as the default rankings within draft software. Weaker players lean on the default rankings more than stronger players. While success in a 12-team mixed league requires a lot of luck, there is enough skill during the draft (and in post-draft roster moves) that weaker players in aggregate will perform worse than stronger players. If weaker players depend more on default rankings/ADPs than stronger players, a negative correlation would arise between team success and any ranking system similar to default rankings. (If correct, this also means that if the RCL somehow used Grey’s rankings or my Point Shares as the default rankings, they’d automatically fare worse in a test like this.)
- The common advice of "Zag when others zig" seems true for both experts who publish rankings and fantasy baseball players who seek those rankings out. If I were curating a rankings list, I'd follow Grey's lead and take a lot of chances. (We'll see if Grey has similar success in 2012.) 'Safe' rankings have value only if people are drafting without default rankings (or with really bad default rankings). If I were building a system from scratch to quantify player values, I would do my best to avoid using traditional rankings as a benchmark (though I'd also advise against building your own system because there's a lot more learning curve than you think. I'd just use Point Shares for my draft and invest that time doing something that might get you laid, like learning to play guitar.)
Other Notes:
- I plan on running a similar test at the end of 2012. We've got about 10 more RCLs than last year (38 to 48) – a bigger sample is always better.
- Since I got such a late start on this analysis, I'm going to hold off analyzing stat projections (Marcel, ZiPS, Steamer, etc.) until the offseason, when I'll have 2 years of data to reference. I'll only be including projections that are directly available to users. Projections available via subscription will require direct permission from the publisher.
- The worksheet provided in Google Docs is considered public domain – feel free to create and publish complementary or contradictory analyses with it. Please just link to this post and note the data is available courtesy of Razzball.
Final Note:
- Huge thanks to our RCL Commissioner VinWins who compiled all the draft/team information and produces some great stats throughout the year such as this post summarizing the 2012 drafts.