A New Way To Test Fantasy Baseball Player Rankings

One of the biggest challenges facing the fantasy baseball fanatic is how to value and rank players. This is felt most acutely during draft season when nearly every fantasy sports site/expert has their own Top 200/300 rankings and each manager has to decide which source(s) to believe. This challenge is also felt – albeit to a lesser extent – during the season when managers are looking for a ‘player rater’ to determine trade values.

When we started Razzball a couple of years back, I decided to leverage whatever limited math and Excel skills I had to come up with the best source of player ranking/valuation. This eventually led me to create Point Shares, which estimate the difference in an average team’s points if they were to substitute a given player for the average player at his position. This differs from the Standings Gain Point (SGP) concept, which is more akin to VORP/WAR methodology and uses something close to the replacement player for valuing players. I prefer using the ‘average player’ as a benchmark vs. a ‘replacement player’ but I am not going to delve into methodology in this post (you can reference this post if this topic tickles your curiosity).

There are three major components when it comes to developing preseason rankings:

1. Playing Time Estimates (PA/AB and IP)
2. Statistical Projections (HRs per AB, K/9, etc.)
3. Methodology For Converting #1 and #2 Into Player Rankings/Value

I assume the majority of ‘expert’ rankings (Grey’s included) do not split out each of these three components. Instead, they leverage their past fantasy baseball experience plus playing time/statistical expectations (likely based on third-party projections and, increasingly, component stats like BABIP and FIP) and create what I would call a ‘curated’ rankings list and/or auction $ values. Mathy types might initially dismiss this as non-scientific but, in my eyes, a well-curated set of player rankings beats a poorly-architected quantitative system every day of the week.

Testing curated rankings, though, is a challenge. Unlike with projection systems where each statistic can be broken out and measured (here is a recent test done on FanGraphs), one would need to design a test that used only the order in which the players were ranked (note: $ estimates are easier). A test design based on simulating draft results creates both logistical and methodological challenges. For the past three years, I have taken part in a ‘Forecaster’s Challenge’ run by Tom Tango (co-author of The Book, creator of the Marcel projection system, and prolific blogger on InsideTheBook.com) which simulated drafts based on submitted rankings and credited victories based on total points (like in a ‘points’ league, each statistic is worth a certain amount of points). I think Tom did a great job at creating an impartial test but the conceits with any test of this type are difficult to overcome (e.g., simulated snake draft vs. actual, reducing 2B/SS/3B to one position, 20 teams vs. the standard 12 teams, inability to factor how well a team could overcome certain draft disappointments vs. others (say, Dunn vs. Hanley) through FA replacements, etc.).

Note: I believe I finished around the middle of the pack in each challenge. My dissatisfaction with some of the test conceits is unrelated to my performance.

When my brain awoke this February/March from my annual winter hibernation from baseball, I hit upon the construct for a test that I believe can remove many of the conceits of past tests. It could be used to test all of the following:

Player Ranking/Value Methodologies – aka ‘Player Raters’ (component #3)
Pre-Season Rankings (baking in components #1, #2, #3)
Statistical Projections (component #2 – specifically, how well do projection systems project stats relevant for fantasy baseball)
Playing Time Estimates (component #1)

In the process, it could also determine how much the final standings are impacted by one’s draft selections as well as the reliability (or lack thereof) of pre-season standings (as in using projections to determine who looks best in the pre-season).

Here is the test:

Take the draft results by team from the 38 Razzball Commenter Leagues in 2011 (hosted on ESPN, 12 team, MLB, 5×5, C/1B/2B/SS/3B/CI/MI/UTIL/9P/3 Bench/1 DL, 180 Games Started, Daily Roster Changes, Unlimited FA/Waiver pickups). This amounts to 456 teams’ worth of draft data.
Create a total for each team by adding up its drafted players’ values under a given source (‘expert’ rankings, $ totals, or another metric such as ESPN Player Rater Total Points)
See how these team totals correlate with each team’s final Total Standings Points
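The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the player values, rosters, and standings points are made-up toy data, the function names are hypothetical, and the correlation is assumed to be a plain Pearson correlation (the post does not say which type was used).

```python
from math import sqrt

def team_total(roster, player_values):
    """Sum a team's drafted player values; players missing from the source count as 0."""
    return sum(player_values.get(player, 0.0) for player in roster)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: one value per player from a single source,
# each team's draft picks, and each team's final Total Standings Points.
player_values = {"A": 30.0, "B": 20.0, "C": 10.0, "D": 5.0, "E": 1.0, "F": 0.0}
drafts = {"Team 1": ["A", "D"], "Team 2": ["B", "C"], "Team 3": ["E", "F"]}
standings_points = {"Team 1": 90.0, "Team 2": 75.0, "Team 3": 40.0}

teams = sorted(drafts)
totals = [team_total(drafts[t], player_values) for t in teams]
finals = [standings_points[t] for t in teams]
r = pearson(totals, finals)
print(f"correlation with standings points: {r:.1%}")
```

With the real data, `drafts` would hold all 456 rosters and the same calculation would be repeated once per source being tested.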

Notes:

For testing individual components, the other two components must be kept constant (e.g., to test Playing Time Estimates, use the same Statistical Projections and Player Ranking/Value Methodology).
Rankings need to be converted into $ because the difference in value between picks progressively gets smaller as the draft progresses.

The key benefits of such a test vs. a simulated test are:

These are ACTUAL draft results based on real fantasy baseball manager behavior.
The team standings points reflect ACTUAL in-season fantasy baseball manager behavior such as replacing poor-performing draft picks, using replacement players when players are injured, etc.

I’m going to focus my first test on Player Ranking/Value Methodologies (aka Player Raters) because it is the easiest one to do. Why? Because there is a sure-fire, uncontroversial source for Playing Time Estimates and Statistical Projections to act as the ‘constant’ – 2011 Final Season Statistics.

I tested the following free public sources as part of the test:

Razzball Point Shares – 12 Team ESPN
ESPN Player Rater (It’s one player rater so no way to customize for league format. Note: This link will likely be overwritten with 2012 data once the season starts. I have archived the results.)
Last Player Picked – 12 Team League, $260, ‘Optimal Hitter/Pitcher Mix’, using same roster format as listed above with 6 SP/3 RP (most representative split of pitchers based on league behavior)

In addition, I tested a total points formula that’s primarily based on the one that Tom Tango created for the Forecaster Challenge: HR+SB+(R+RBI)/3 + (H-(0.27*AB)) + 2*W + 1.5*SV + K/5 + (IP-(H+BB+ER)/2). The one difference was to multiply Saves by 1.5 vs. 1 to better reflect RP value.
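For readers who want to apply the points formula themselves, here is a minimal Python sketch. Splitting it into a hitter half and a pitcher half is an assumption made for readability (the H in the hitter terms is hits, the H in the pitcher terms is hits allowed); the function names and the sample stat lines are hypothetical, and IP is assumed to be in true decimal innings.

```python
def hitter_points(hr, sb, r, rbi, h, ab):
    """Hitter terms: HR + SB + (R+RBI)/3 + (H - 0.27*AB)."""
    return hr + sb + (r + rbi) / 3 + (h - 0.27 * ab)

def pitcher_points(w, sv, k, ip, h, bb, er):
    """Pitcher terms: 2*W + 1.5*SV + K/5 + (IP - (H+BB+ER)/2)."""
    return 2 * w + 1.5 * sv + k / 5 + (ip - (h + bb + er) / 2)

# Made-up stat lines, purely for illustration.
hitter = hitter_points(hr=30, sb=10, r=90, rbi=90, h=150, ab=500)
starter = pitcher_points(w=15, sv=0, k=200, ip=200, h=170, bb=50, er=70)
print(round(hitter, 1), round(starter, 1))
```

A two-way total for a team is then just the sum of `hitter_points` over its hitters plus `pitcher_points` over its pitchers.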

Since these leagues are ostensibly populated by Razzball readers, I first wanted to test to see if there might be any bias in draft behavior. Below are the correlation percentages between the Average Draft Positions (ADP) for players in the Razzball Commenter Leagues (RCL) vs. Grey’s pre-season 2011 rankings, the pre-season 2011 Point Share rankings, and the ESPN Top 300 for 12-Team leagues. I included all players drafted in at least 30 of the 38 leagues. I broke out the ADP for the top 100 teams vs. all 456 teams to see if there might be a ‘Razzball’ bias amongst only the top teams.

	Correlation (%)
	RCL ADP-Top 100 Teams	RCL ADP- All Teams	Grey’s Rankings	Point Shares (3/8)	Point Shares (Late March)	ESPN Top 300 (12-Team)
RCL ADP – Top 100 Teams	—	99.5	91.8	79.5	82.7	96.7
RCL ADP – All Teams	99.5	—	92.6	80.4	81.5	96.7
Grey’s Rankings	91.8	92.6	—	72.5	72.9	85.5
Point Shares (3/8)	79.5	80.4	72.5	—	96.9	78.3
Point Shares (Late March)	82.7	81.5	72.9	96.9	—	79.9
ESPN Top 300 (12-Team)	96.7	96.7	85.5	78.3	79.9	—

I assume the extremely high correlation with ESPN’s Top 300 for 2011 (96.7%) is driven by the default ADP used in the draft software. Interestingly, Grey’s rankings and my Point Share rankings differ from ESPN’s (78-85% correlation) but differ more from each other (~73% correlation). Given these correlations, I think it’s fair to assume that the Razzball Commenter League (RCL) draft results are fairly indicative of standard ESPN drafts.
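The ADP comparison can be sketched as follows. This is a hedged illustration, not the actual workbook: the data is a toy example, the names are hypothetical, the correlation is assumed to be Pearson on ADP vs. rank position, and a player only counts in leagues where he was actually drafted. The real analysis required a player to be drafted in at least 30 of the 38 leagues; the toy example uses 2 of 3.

```python
from math import sqrt

def average_draft_position(league_picks, min_leagues):
    """league_picks: one dict per league mapping player -> overall pick number.
    Keeps only players drafted in at least `min_leagues` leagues."""
    picks_by_player = {}
    for picks in league_picks:
        for player, pick in picks.items():
            picks_by_player.setdefault(player, []).append(pick)
    return {p: sum(v) / len(v)
            for p, v in picks_by_player.items() if len(v) >= min_leagues}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Hypothetical toy data: three leagues and one pre-season ranking list.
league_picks = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 2, "B": 1, "C": 3},
    {"A": 1, "B": 3, "C": 2, "E": 4},
]
ranking = {"A": 1, "B": 2, "C": 3, "D": 4}

adp = average_draft_position(league_picks, min_leagues=2)
common = sorted(set(adp) & set(ranking))  # players present in both lists
r = pearson([adp[p] for p in common], [ranking[p] for p in common])
print(f"ADP vs. ranking correlation: {r:.1%}")
```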

Here are the correlation % results for the Player Ranking/Value Methodologies (links to each were provided above, here is an aggregated view). I tested both my actual Point Shares as well as my conversion to dollars. ESPN Player Rater is based on their Total Points in their Player Rater. Last Player Picked is based on their $ estimates.

Other notes:

Players not found in a player rater (usually based on injuries/missed playing time) are set at $0 for Point Shares/LPP and zero for ESPN Player Rater.
Any player valued below $0 in Point Shares/LPP is capped at $0, as players that bad (or who missed that much time) were likely excised from a team roster before they could do a full season’s worth of damage (and, remember, ‘replacement value’ is at $0). For instance, Brian Matusz was drafted in every league. His value was -$31 in Point Shares, -$25 in LPP, and -7.08 in ESPN Player Rater points. All are now set to zero. For raw Point Shares (as opposed to $), I set the floor at -2.64, which is the equivalent of $0.
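The two adjustments in these notes amount to a lookup with a default and a floor. A minimal Python sketch, where the function name is hypothetical and the only real numbers are Matusz’s figures quoted above (the raw Point Shares input of -5.0 is made up):

```python
def floored_value(player, values, floor=0.0):
    """Value used in a team total: players missing from the source get the floor,
    and anyone below the floor is raised to it."""
    return max(values.get(player, floor), floor)

# Dollar values quoted in the notes above.
point_share_dollars = {"Brian Matusz": -31.0}
lpp_dollars = {"Brian Matusz": -25.0}

matusz_ps = floored_value("Brian Matusz", point_share_dollars)   # floored to $0
matusz_lpp = floored_value("Brian Matusz", lpp_dollars)          # floored to $0
missing = floored_value("Player Not Listed", lpp_dollars)        # absent -> $0

# Raw Point Shares use -2.64 (the equivalent of $0) as the floor.
# The -5.0 input is a hypothetical value for illustration.
raw_share = floored_value("Some Pitcher", {"Some Pitcher": -5.0}, floor=-2.64)
print(matusz_ps, matusz_lpp, missing, raw_share)
```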

Source	Correlation With Team Standing Points
Point Shares	63.8%
Point Shares (converted to $)	63.7%
ESPN Player Rater	56.7%
Last Player Picked	55.2%
Points Formula	49.7%

Based on the above results, I would answer the question of “How much are the final standings impacted by one’s draft selections?” as probably somewhere in the 60-65% range. I can’t say for sure since it’s unclear what the actual ceiling for player rater accuracy is. Please note that this is a wholly different question than “How much are the final standings impacted by one’s draft selections as valued by preseason rankings/projections?” That will be answered in my next test.

With assistance from Jared Cross (co-creator of Steamer Projections), I tested whether these differences vs. Point Shares are statistically significant at a 95% confidence level (the typical standard used in research). Here were the findings (worksheet found here):

Source	Confidence of Difference vs. Point Shares
Point Shares (converted to $)	53.98% (z-score of -0.1)
ESPN Player Rater	99.90% (z-score of -3.1)
Last Player Picked	99.9+% (z-score of -4.2)
Points Formula	99.9+% (z-score of -5.0)
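The confidence figures in this table look consistent with reading each z-score through the standard normal CDF, i.e. confidence = Φ(|z|); for example, z = -0.1 gives 53.98%. That reading is an inference from the published numbers rather than something taken from the worksheet, and the sketch below covers only that conversion, not how the z-scores themselves were derived. The function name is hypothetical.

```python
from math import erf, sqrt

def confidence_from_z(z):
    """One-sided confidence Phi(|z|) under a standard normal distribution."""
    return 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))

# z-scores reported in the table above.
for z in (-0.1, -3.1, -4.2, -5.0):
    print(f"z = {z:+.1f} -> {confidence_from_z(z):.4%}")
```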

The minute difference between Point Shares and my $ values isn’t that surprising since my dollar conversion formula is just a calculation from the Point Shares. If it resulted in significantly different results, it would be a sign that my calculation was flawed.

I cannot say for sure why Point Shares beats ESPN and Last Player Picked as I do not know all the details behind their methodology. ESPN and LPP correlate with each other at 93.5%, which isn’t markedly higher than their correlations with Point Shares (89.9% for ESPN, 92.4% for LPP). Last Player Picked is the more transparent of the two in terms of methodology and it’s clear that Mays @ LPP uses ‘replacement level’ as the foundation of his analysis (vs. me using ‘average player level’). No idea if that really plays a role here. If I had to guess what drives ESPN’s Player Rater, I’d venture some application of Z-Scores per category.

I also really don’t care to spend too much energy researching why my Point Shares methodology appears to be superior. One thing I can say for sure is that my position factors (e.g. a catcher w/ same stats as an OF is worth more) have no measurable impact. I ran Point Shares with no positional adjustments and got a 63.73% correlation instead of 63.78% (z-score of -0.1).

If you compare each of the three rankings/$ estimates, you could potentially deduce some of the methodology differences. For instance, it probably comes as no shock to anyone familiar with the ESPN Player Rater that – when comparing it to Point Shares – some of the largest differences come into play with players whose primary value comes from stolen bases. Here are some examples (Point Share Rank / ESPN Rank): Michael Bourn (47/14), Elvis Andrus (77/51), Coco Crisp (109/60), and Brett Gardner (118/66).

If Bourn was really the 14th most valuable player, you’d think that teams who drafted him received great value (ESPN had him ranked 89th in the pre-season) and performed disproportionately better vs. teams that did not draft him in the Razzball Commenter Leagues. As you can see in this spreadsheet, teams who drafted Bourn finished almost exactly in the middle of the pack. While this test isn’t perfectly conclusive of player value (e.g., Granderson only ranked #89), the results seem to correlate fairly well with expectations. The following are in the top 10%: Kemp, Ellsbury, Weaver, Verlander, and Bautista. The following are in the bottom 10%: Hanley Ramirez, Carl Crawford, Jayson Werth, Joe Nathan, Kendrys Morales, Chase Utley. I wonder if this potential issue with the ESPN Player Rater is driving Matthew Berry’s 2012 love for Michael Bourn (note: even at Point Shares #47 rank, you could make an argument if you felt confident that Bourn could repeat his 2011 stats that he warrants a 3rd round pick. I’d consider him a 5th round pick at best.)
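A per-player check like the Bourn one can be approximated by averaging the final standings points of every team that drafted a given player and then ranking players on that average. The sketch below is a guess at the general shape of that calculation, not a description of the linked spreadsheet; the names and the toy data are hypothetical.

```python
def average_finish_by_player(drafts, standings_points):
    """Mean final standings points of the teams that drafted each player."""
    points_by_player = {}
    for team, roster in drafts.items():
        for player in roster:
            points_by_player.setdefault(player, []).append(standings_points[team])
    return {p: sum(v) / len(v) for p, v in points_by_player.items()}

# Hypothetical toy data: three teams (think one per league) and their final points.
drafts = {"T1": ["A", "B"], "T2": ["A", "C"], "T3": ["B", "C"]}
standings_points = {"T1": 90.0, "T2": 60.0, "T3": 30.0}

avg_finish = average_finish_by_player(drafts, standings_points)
ranked = sorted(avg_finish, key=avg_finish.get, reverse=True)  # best average first
print(ranked, avg_finish)
```

A player whose drafters land near the overall average finish, as Bourn’s did, would sit near the middle of `ranked`.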

It also should be noted that both Last Player Picked and ESPN Player Rater have significant usability advantages vs. my Point Shares. Last Player Picked can customize $ estimates based on just about any league permutation imaginable. While ESPN Player Rater doesn’t allow for league customization, it is updated throughout the season which is a huge advantage vs. Point Shares/LPP.

I tried to be as transparent and unbiased as possible with this analysis. The one piece of information that I didn’t link to is the actual draft selections per team. I will provide that once I’ve completed my next analysis. Please feel free to comment with questions and/or to point out ways I may have screwed up the analysis.

My next test will be testing 2011 Player Rankings against team results. I will only use free, publicly available rankings unless authorized by someone at the company behind the subscription-based rankings. All player rankings must have a date stamp prior to the beginning of the 2011 season. If you see a notable omission below, please provide me with a link to the rankings. Thanks to FantasyPros.com who helped me gather some of the below rankings:

Razzball – Point Shares (1 version on March 8th, one done around end of March)
Razzball – Grey’s Rankings
ESPN – Matthew Berry’s Top 200
ESPN – Pre-Season Top 300
FantasyPros.com Aggregated Top 300
FoxSports Top 300
Hardball Times (Jeffrey Gross) – not public but permission-provided
KFFL Top 200
Last Player Picked – using 2011 Composite Stats
RotoChamp Top 300
RotoExperts Top 300
SI.com Top 300
USAToday.com Top 200

Note: CBSSports.com uses a static link for its free pre-season guide so the link now points to 2012 rankings (if someone has a saved download of the 2011 PDF, please e-mail it to me at [email protected]). Our pals at Yahoo! (perhaps wisely) do not publish pre-season rankings.