How Valid is the ESPN Player Rater?

Anyone who played ESPN 2007 Fantasy Baseball last year probably had two lingering questions throughout the season:

1) Did the ESPN employees responsible for the database crash that screwed up the first two weeks’ worth of 2007 stats suffer a fate worse than Harold “Harass is one word?” Reynolds (rarely insightful on Baseball Tonight, but he’s like Peter F*in’ Gammons compared to replacement ex-2B Eric Young)?

2) Is the ESPN Player Rater completely incorrect or just mostly incorrect?

While we wait to see if Outside the Lines’ Bob Ley or the ESPN Ombudslady responds to our repeated requests for answers on question #1, we took on the challenge of question #2.

Now you may ask, “Is this a valuable exercise beyond the joy in potentially proving ESPN wrong?” Fair question. The answer is a resounding yes. One of the greatest challenges with Fantasy Baseball is determining how to compare the value of players and their statistical contributions – any idiot could tell you that A-Rod and Peavy were the best hitter and pitcher respectively but was Matt Holliday more valuable than Jimmy Rollins? Understanding the value of each of the statistics – ESPECIALLY in non-counting stats like batting average, ERA, and WHIP – helps from the draft throughout the season as rosters are juggled, free agent options are considered, and trade offers are mulled. (That being said, proving ESPN wrong was a motivating factor.)

In no particular order, here are some of the questions we have on the 2007 season that we don’t believe ESPN Player Rater correctly answers (based on 5×5 MLB Universe):

1) Who is more valuable: A-Rod (best hitter) vs. Peavy (best pitcher)?

2) How could there be so many starting pitchers at the top? (13 in top 20, 19 in top 30) Is that valid or just faulty weighting?

3) ESPN creates a floor and ceiling of 0 (floor) and 5 (ceiling) for points per category. Does this misrepresent the contribution (or lack thereof) of players and what is the impact on player rankings? For example, how is Jose Reyes’ amazin’ 78 SBs worth the same amount of points as Josh Beckett’s merely impressive 20 wins? How is Richie Sexson’s horrific .205 batting average worth the same amount as Todd Helton’s sloth-like 0 SBs?

4) How is it possible that relievers J.J. Putz and Rafael Betancourt are worth more for ERA and WHIP than top starters like C.C. Sabathia, Johan Santana, and Brandon Webb when these relievers pitch about 1/3 as many innings as the starters?

5) Shouldn’t players at shallower positions receive bonus points – e.g., are Hanley Ramirez’s stats at SS more valuable than A-Rod’s stats at 3B?

To tackle these questions, we created our own Player Rater. See attached for the rankings and notes on the methodology we used. The methodology is nerdy and quasi-scientific – with heavy conceptual influence from Baseball Prospectus, Bill James, and other leaders in the field – but you could skip all the mathy stuff and just look at the rankings if you like.

We will answer these questions – and potentially others – over several posts. Let’s start off with the first one…

1) Who is more valuable: A-Rod (best hitter) vs. Peavy (best pitcher)?

ESPN has A-Rod #1 and Jake Peavy #2 by the slimmest of margins – 19.8 to 19.75. This order is in line with popular opinion. I’m sure if you were to do a ‘hindsight draft’ – whereby you draft based on 2007 stats – that Scott Boras’s wet dream would be picked #1 almost every time. There’s a much greater chance Peavy would go #3 or later than #1.

But is this some type of hitter/East Coast bias vs. pitcher/West Coast bias? Well, let’s look at how ESPN calculated the point totals:

A-Rod: Runs = 5, HR = 5, RBI = 5, SB=1.54, AVG=3.26
Peavy: K = 5, W = 4.75, SV = 0, ERA = 5, WHIP = 5

A-Rod led the majors in R/HR/RBI to earn the three category max of 15 points. His above average 24 SBs and .314 AVG netted him another 4.8 points. Peavy had a possible max of 20 points since Bud Black inexplicably turned to Trevor Hoffman to close games and was a win short of getting all 20 (should’ve showed up in that one-game playoff, Jake). So, maybe that’s it. Even the best starting pitchers can only contribute in 4 categories so a great 5-category hitter is more valuable.

What do our rankings say? They say Mr. Peavy is #1 by a healthy margin – 29.4 to 26.2 points. I’ll admit it – we were surprised too. Let’s look at our point allocations by category to explain it:

A-Rod: Runs = 4.9, HR = 8.3, RBI = 6.8, SB=3.2, AVG=3.0
Peavy: K = 6.8, W = 6.2, SV = -0.3, ERA = 9.3, WHIP = 7.4
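
For anyone who wants to double-check the totals, here’s a quick Python sketch that simply adds up the category points listed above for both raters. The dictionary and function names are ours, made up for illustration; the numbers are the ones quoted in this post.

```python
# Category points as quoted in this post (ESPN Player Rater vs. our rater).
espn = {
    "A-Rod": {"R": 5, "HR": 5, "RBI": 5, "SB": 1.54, "AVG": 3.26},
    "Peavy": {"K": 5, "W": 4.75, "SV": 0, "ERA": 5, "WHIP": 5},
}
ours = {
    "A-Rod": {"R": 4.9, "HR": 8.3, "RBI": 6.8, "SB": 3.2, "AVG": 3.0},
    "Peavy": {"K": 6.8, "W": 6.2, "SV": -0.3, "ERA": 9.3, "WHIP": 7.4},
}


def total_points(categories):
    """Sum a player's points across his five categories."""
    return round(sum(categories.values()), 2)


for label, rater in (("ESPN", espn), ("Ours", ours)):
    for player, cats in rater.items():
        print(f"{label:4} {player}: {total_points(cats)}")
```

The sums come out to 19.8 vs. 19.75 on ESPN’s scale and 26.2 vs. 29.4 on ours, matching the totals quoted above.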

This comparison is as good an opportunity as any to go over our methodological basis for crediting points (skip over if you’re mathphobic):

Rather than using a 0-5 point scale per category, we mirrored the VORP concept from Baseball Prospectus and created composite stats for what would be the best available option (BAO) at each position – e.g., what are the stats of the 11th best catcher who’d be the next best option post-draft. We then created a team full of these BAOs (kind of like a fantasy expansion team – hell, a team with Carlos Pena, BJ Upton, Troy Tulowitzki, Ryan Braun is much better than the MLB variety) and averaged their stats to create the BAO hitter and BAO pitcher.

We then took the team totals of our ESPN league to come up with relevant increments to award points. Our method would credit players with positive points if they performed above the BAO in a stat and negative points if they performed below (example: Magglio Ordonez was worth 7.7 AVG points while Richie Sexson was worth -4.6 points).

The increments were based on the standard deviation between our 10 teams’ totals, which came out to roughly 1-2% of the average team’s total for R, HR, RBI, K, and W. SB and SV turned out around 4% of the average because team totals in these stats tend to be more widely distributed than the other counting stats. (This is related to a few players contributing the lion’s share of points. Another way of explaining it is to consider the impact of 5 extra Wins vs. 5 extra SBs or SVs on your league’s standings: the Wins would prove more valuable.) Lastly, the ratio stats – AVG, ERA, and WHIP – are around 0.5% of the average team’s total, as there is a much smaller % change between players (e.g., a great hitter hits .350, a bad hitter .250. That’s only a 40% difference. A-Rod hit 150% more HRs than fellow MVP candidate Mike Lowell).

These standard deviations were arbitrarily divided by 6 to create more point differential between players. For ratios, the team totals were multiplied up to reflect an individual player’s impact on the total – so a hitter would have to hit .0144 better than the BAO (assuming all 13 hitters had the same # of ABs) to raise the team’s average by the required .0011.
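
For the code-inclined, here is a rough Python sketch of the two scaling steps just described: dividing a league standard deviation by 6, and multiplying a team-level ratio increment up to the individual player. The function names are hypothetical, and the ratio step assumes (as the text does) that all 13 hitters have the same number of ABs.

```python
HITTERS_PER_TEAM = 13        # hitter slots, per the text
TEAM_AVG_PER_POINT = 0.0011  # team AVG change worth one point (rounded in the text)


def point_increment(std_dev, divisor=6):
    """Turn a league standard deviation into the amount of a stat worth one point.

    The divisor of 6 is the arbitrary choice described above.
    """
    return std_dev / divisor


def player_ratio_increment(team_increment, roster_size):
    """Gap over the BAO one player needs to move the team ratio by team_increment.

    ASSUMPTION: every roster spot has the same number of ABs, so one player
    carries 1/roster_size of the team's weight.
    """
    return team_increment * roster_size


print(round(player_ratio_increment(TEAM_AVG_PER_POINT, HITTERS_PER_TEAM), 4))
```

With the rounded .0011, this prints 0.0143; the .0144 quoted above presumably comes from the unrounded increment.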

Lastly, we compared each player to two different types of BAOs: one specific to their position and one general (hitter or pitcher). These results were averaged together and helped to account for the fact that a BAO 1st baseman offers better stats than a BAO catcher so, all stats equal, the catcher is a more valuable hitter. (This topic will be further explored in another post. We’ve only got so much material to stretch over the offseason.)

So now let’s look at Runs vs. K’s to see this methodology in action. These make a good comparison in that the average team totals for these stats (based on our ESPN league) are nearly equal: 1150 runs and 1148 Ks.

A-Rod and Peavy led the majors in these categories (143 Rs and 240 Ks respectively) so ESPN credits each with 5 points. But the BAO hitter (who looks almost exactly like Luis Gonzalez’s stat line of .278/70/15/68/6) had 67 runs where the BAO pitcher (who looks closest to Carlos Villanueva’s season of 8 Ws/1 SV/3.94/1.35/99 K over 114 IP) had 101 K’s.

(Note: That may seem low for K’s but here are a few other starting pitchers b/w 90 and 110 K’s that certainly saw some fantasy roster space during the season: C. Wang, G. Maddux, B. Sheets, C. Schilling, M. Mussina, J. Marquis)

So A-Rod had 76 more Runs than the BAO hitter and Peavy had 139 K’s more than the BAO in a category where teams had virtually the same average total. Another way of looking at it is that Peavy’s total would represent 20.9% of the average team’s total where A-Rod’s runs would be 12.4%.

The larger differential and impact of Peavy’s K’s vs. A-Rod’s runs are slightly curbed by the fact that Runs have a lower standard deviation which leads to crediting a point for 15.1 runs and 18.1 K’s. This nets out to 4.9 points for A-Rod’s Runs vs. 6.8 for Peavy’s K’s.
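
Here’s a small Python sketch of the Runs vs. K’s arithmetic. It only compares each player to the general BAO hitter or pitcher, because those are the only BAO totals quoted here; the final 4.9 and 6.8 also average in the position-specific BAO, whose Runs and K’s we haven’t listed. The function and variable names are ours, for illustration only.

```python
TEAM_RUNS, TEAM_KS = 1150, 1148            # average team totals in our league
RUNS_PER_POINT, KS_PER_POINT = 15.1, 18.1  # amount of each stat worth one point


def points_above_bao(player_stat, bao_stat, per_point):
    """Points for a counting stat: margin over the BAO divided by the increment."""
    return (player_stat - bao_stat) / per_point


def share_of_team(player_stat, team_total):
    """Player's total as a percentage of the average team's total."""
    return 100 * player_stat / team_total


# General-BAO comparison only (67 runs for the BAO hitter, 101 K's for the BAO pitcher).
arod_runs_pts = points_above_bao(143, 67, RUNS_PER_POINT)
peavy_k_pts = points_above_bao(240, 101, KS_PER_POINT)

print(round(share_of_team(143, TEAM_RUNS), 1), round(share_of_team(240, TEAM_KS), 1))
print(round(arod_runs_pts, 2), round(peavy_k_pts, 2))
```

Against the general BAO alone, that works out to roughly 5.0 points for A-Rod’s Runs and 7.7 for Peavy’s K’s, so the position-specific comparison is what pulls them down to 4.9 and 6.8.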

Peavy’s impact on Wins is similar to his impact in K’s and A-Rod’s HR and RBI totals net him 8.3 and 6.8 points respectively. His 24 SBs net him 3.2 points. (If anything, the ESPN total screws him – how could 24 SBs be worth only 1.5 points of 5 points, with 0 SBs equaling 0 points, given the average team only had 162 SBs?)

So it all comes down to the ratio stats (AVG, ERA, WHIP) to determine the winner.

Let’s look at A-Rod’s AVG first. To determine the impact, we start with a lineup made exclusively of BAO hitters. This group hits .2772. If we replace one BAO with A-Rod, the average goes up to .2806 – a difference of 0.0034. In addition, A-Rod had 583 ABs compared to our BAO’s 482 ABs. So A-Rod’s average has a greater impact than just 1/13th (it becomes worth about 1/11th). To account for this, we multiply his points by his # of ABs / BAO ABs (1.2). After accounting for the fact that the 3B BAO hits for a slightly higher average (.279), A-Rod’s batting average nets him 2.8 points.
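
The lineup swap is easy to reproduce. This Python sketch rebuilds the 13-man BAO lineup, swaps in A-Rod, and recomputes the AB-weighted average. The function name is hypothetical, and the sketch stops at the change in team AVG rather than guessing at every step of the conversion to points.

```python
def lineup_avg(players):
    """AB-weighted team batting average for a list of (avg, at_bats) tuples."""
    hits = sum(avg * ab for avg, ab in players)
    at_bats = sum(ab for _, ab in players)
    return hits / at_bats


bao = (0.2772, 482)   # BAO hitter: AVG and ABs, from the text
arod = (0.314, 583)   # A-Rod: AVG and ABs, from the text

baseline = lineup_avg([bao] * 13)
with_arod = lineup_avg([bao] * 12 + [arod])
delta = with_arod - baseline
ab_ratio = arod[1] / bao[1]

print(round(baseline, 4), round(with_arod, 4), round(delta, 4), round(ab_ratio, 2))
```

This reproduces the .2806 team average, the 0.0034 bump, and an AB ratio of about 1.2.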

Now let’s look at Peavy’s MLB-leading 2.54 ERA and 1.06 WHIP. Adding him to a BAO team of 5 starters and 4 relievers, in place of a BAO composite of the two, changes the team ERA/WHIP from 3.960/1.315 to 3.719/1.272. This is a net decrease of 0.241 in ERA and 0.043 in WHIP.

How huge is this? Well, looking at our league, here’s how many points each team would have gained in ERA by deducting .241 (1 team – 5 pts, 2 teams – 4 pts, 2 teams – 3 pts, 3 teams – 2 pts, 1 team – 1 pt, 1 team – 0 pts). If you factor in that the 1st, 2nd, and 3rd place teams had a maximum of 0, 1, and 2 points to gain respectively, you can see that at a MINIMUM, Jake Peavy’s ERA would’ve gained a team 2 points and more likely 3+. The WHIP difference is similar in impact.
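
To put a number on “more likely 3+”, here’s a Python sketch that averages the tally above. The second average rests on an assumption of ours that isn’t spelled out in the tally: that the 0-point team, the 1-point team, and one of the 2-point teams are the 1st, 2nd, and 3rd place ERA teams that had almost no room to climb.

```python
# Points gained in ERA by deducting .241, from the tally above:
# key = points gained, value = number of teams in our 10-team league.
gains = {5: 1, 4: 2, 3: 2, 2: 3, 1: 1, 0: 1}


def expand(tally):
    """Flatten the tally into one entry per team, highest gain first."""
    return sorted((pts for pts, n in tally.items() for _ in range(n)), reverse=True)


all_teams = expand(gains)
mean_all = sum(all_teams) / len(all_teams)

# ASSUMPTION (ours): the capped top-three ERA teams are the ones that gained
# 0, 1, and 2 points. Drop them and re-average the rest.
uncapped = list(all_teams)
for capped_gain in (0, 1, 2):
    uncapped.remove(capped_gain)
mean_uncapped = sum(uncapped) / len(uncapped)

print(len(all_teams), mean_all, round(mean_uncapped, 2))
```

Across all 10 teams the average gain is 2.6 points; under our assumption, the seven teams that weren’t capped average about 3.3.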

So far we’ve compared Peavy against our BAO pitcher, who is a composite of the best pitchers available, whether starter or reliever. What if we compared him against strictly the BAO starter (the 51st best starter)? This would reduce the impact of Peavy’s Wins and K’s but would actually INCREASE the impact of his ERA and WHIP. Why? Because starters pitch more innings than relievers and tend to give up more runs and baserunners. The BAO starter has an ERA of 4.15 and WHIP of 1.34 (closest comparison – Carlos Silva) where the BAO reliever is at an ERA of 3.31 and WHIP of 1.24. Compared to the BAO starter, Peavy’s ERA is worth 9.9 ‘position points’; compared to the general BAO pitcher, it’s worth 8.6 ‘player points’. That averages out to 9.3 points.
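
The blending step is just a straight average of the two comparisons. A minimal Python sketch, with a hypothetical function name, using the ERA figures quoted above:

```python
PEAVY_ERA = 2.54
BAO_STARTER_ERA = 4.15
BAO_RELIEVER_ERA = 3.31


def blended_points(position_points, player_points):
    """Average the position-specific and general-BAO point values."""
    return (position_points + player_points) / 2


# Peavy's ERA gap is much wider against the BAO starter than the BAO reliever.
gap_vs_starter = round(BAO_STARTER_ERA - PEAVY_ERA, 2)
gap_vs_reliever = round(BAO_RELIEVER_ERA - PEAVY_ERA, 2)

peavy_era_points = blended_points(9.9, 8.6)

print(gap_vs_starter, gap_vs_reliever)
print(f"{peavy_era_points:.2f}")
```

The average is 9.25, which shows up as 9.3 once rounded to one decimal.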

In summary, we would say Jake Peavy was the MVP of 2007 Fantasy Baseball (5×5, MLB Universe) over A-Rod. While A-Rod had two high contributing categories (HR and RBI), one strong (Runs), and two above average (SB and AVG), Peavy had four high contributing categories, with his ERA and WHIP point total of 16.7 dwarfing his closest competitors (Santana = 11.3, Sabathia = 11.2).

While we wouldn’t have the nut sack to draft Peavy #1 in a 2008 draft (pitching is less predictable than hitting), he’d be our choice for #1 in a ‘hindsight’ draft. And if you can’t even manage 20/20 hindsight, how can you expect to see clearly into the future…