Razzball is hosting the Roundtable this week. I figured that since we’re known as the class clowns, I’d surprise them all with a statty question. Here goes…

What sabermetric or alternative statistic (e.g., Ground Ball ratio, Contact Rate, etc.) do you find to be highly over or undervalued for fantasy baseball player valuation purposes?

Brett Greenfield – Fantasy Phenoms

I think that BABIP is a highly overvalued statistic for analyzing a pitcher and I’ll tell you why.

Certain pitchers tend to consistently have low BABIPs, such as Chris Young of the Padres. This is primarily because he is one of the biggest fly ball pitchers in the major leagues. In 2006, Chris Young led the majors in fly ball rate, and he also led the league with the lowest BABIP. In 2007, he again led the majors in fly ball rate and again led the league with the lowest BABIP. Pitching in Petco Park strongly favors pitchers in general, but it also favors fly ball pitchers, since the outfield is so big.

BABIP, in general, is a stat that favors the fly ball pitcher, too. Take two otherwise equal pitchers, where one induces 70% ground balls and the other induces 70% fly balls. The pitcher who induces 70% fly balls is far more likely to have the lower BABIP, because a ground ball has a much better chance of going for a base hit than a fly ball does.
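A minimal Python sketch of that argument: treat a pitcher's expected BABIP as a weighted average of BABIP by batted-ball type. The per-type rates below are illustrative assumptions, not measured league figures, and line drives are left out to keep the two-pitcher comparison simple, which is why both results sit well below the ~.300 league BABIP.

```python
# Illustrative per-type BABIPs (assumed values, not measured league rates).
# Line drives are deliberately omitted to keep the comparison simple.
ASSUMED_BABIP_BY_TYPE = {"GB": 0.240, "FB": 0.140}


def expected_babip(mix):
    """Weighted average of per-type BABIP; mix maps batted-ball type -> share of balls in play."""
    total = sum(mix.values())
    return sum(share * ASSUMED_BABIP_BY_TYPE[kind] for kind, share in mix.items()) / total


ground_baller = {"GB": 0.70, "FB": 0.30}
fly_baller = {"GB": 0.30, "FB": 0.70}

print(f"ground ball pitcher: {expected_babip(ground_baller):.3f}")  # 0.210
print(f"fly ball pitcher:    {expected_babip(fly_baller):.3f}")     # 0.170
```

Only the direction of the gap matters here; the size depends entirely on the assumed per-type rates.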

In fact, of the 20 pitchers with the lowest BABIPs in 2008, 90% were pitchers who favored the fly ball.

The 10% of that top 20 who induced more ground balls than fly balls, yet still had a low BABIP, should be considered lucky. Most people consider every pitcher in the top 20 (or anyone with a BABIP below .265) to have been lucky and predict that his ERA and WHIP will regress the following year. Sure, the top 20, regardless of fly ball or ground ball tendency, could very well see their BABIPs move closer to the norm the following year. But if they are strong fly ball pitchers, their BABIPs may not rise enough to make a noticeable difference in their ERA or WHIP, rendering BABIP useless when evaluating them. Several pitchers post low BABIPs year in and year out, such as Matt Cain, Ted Lilly and Chris Young.

BABIP can be useful when combined with other statistics, but used by itself, it can be deceiving.

Patrick Cain – timesunion.com Fantasy Baseball blog

My often-cited stat problem is summed up in two words: Spring Training. To be more specific, in a few weeks many of us will crunch the numbers to see which players posted a 200-point increase in slugging percentage during Spring Training. It’s alleged that three-quarters of players who hit this mark have better-than-normal years.

I have a few problems with this.

1. Define “better.” If Player A has a lifetime .700 OPS and now posts a .710... so what? That’s not a breakout.
2. Many guys who do improve aren’t a surprise. They’re young players who got a year older and a year wiser.
3. Most of all, the data sets are small for each player.
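If you do crunch the numbers, a minimal sketch like the one below at least addresses point 3 by setting a sample-size floor. It assumes the 200-point jump is measured against career slugging percentage (the baseline isn't specified above), and the 50 at-bat minimum and the player lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpringLine:
    name: str
    career_slg: float
    spring_slg: float
    spring_ab: int


def spring_risers(lines, jump=0.200, min_ab=50):
    """Names whose spring SLG beat career SLG by at least `jump`,
    skipping anyone with fewer than `min_ab` spring at-bats (hypothetical cutoff)."""
    return [
        p.name
        for p in lines
        if p.spring_ab >= min_ab and p.spring_slg - p.career_slg >= jump
    ]


# Hypothetical players, for illustration only.
sample = [
    SpringLine("Player A", career_slg=0.450, spring_slg=0.700, spring_ab=62),
    SpringLine("Player B", career_slg=0.400, spring_slg=0.800, spring_ab=20),  # too few AB
    SpringLine("Player C", career_slg=0.480, spring_slg=0.520, spring_ab=70),  # jump too small
]
print(spring_risers(sample))  # ['Player A']
```

Even with a floor, 50 spring at-bats is still a tiny sample, so a flag like this is a prompt for a closer look rather than a prediction.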

Here’s a list I found from late in ’08’s Spring Training. The group certainly has a few gems (Josh Hamilton) but also a number of flops.

Chris Snyder, Mike Morse, Ivan Rodriguez, Melvin Mora, Grady Sizemore, Brian Anderson, Chris Burke, Rafael Furcal, B.J. Upton, Mike Cameron, Yorvit Torrealba, Erick Aybar, Torii Hunter, Curtis Granderson, Placido Polanco, Billy Butler, Gerald Laird, Tony Gwynn Jr., Craig Counsell, Josh Hamilton, Ryan Shealy, Andre Ethier, Robinson Cano, Ray Durham

It’s a small data set, but the list wasn’t too impressive. That being said, I’ll still take note of it when I draft.

Tim Dierkes – RotoAuthority.com

Most people who read this are well aware of BABIP (batting average on balls in play) and its uses for pitchers. However, I still find that most fantasy baseball players in general are unaware of this statistic or choose not to look at it. It is a pretty easy concept, but it still has not hit the mainstream, and I don’t think it’s close to doing so. On baseball broadcasts we are still given the misleading impression that a low opponents’ batting average is entirely a controllable skill.

I am not saying that a pitcher with an abnormally low BABIP should be dismissed. Rather, if a pitcher’s brief successful run clearly leans on a low rate of hits allowed, he is probably a fluke. Even casual fantasy players should glance at a guy’s last five starts, and if he has succeeded because the hits did not drop in, they should know to pass.

Jon Williams – RotoExperts

There are two distinct groups of fantasy owners: those who use advanced statistics and those who do not. I do not think I am taking a huge leap to suggest that those who do not use them are primarily those who do not understand them. After all, if they did understand them, how could they not use them? The problem is that many owners who use them (and many writers who write about them) do not truly understand how to apply these statistics. There are many examples of this, but I will point out a very common one.

BABIP is an excellent statistical tool that measures the rate at which balls put in play fall safely for hits. The typical batter averages between .290 and .310 on the balls he puts in play. The uninformed owner will assume that a batter with a BABIP below .300 in any given period was unlucky, and that a batter above .300 was lucky. In both cases, this owner will assume the batter will regress to a “normal” .300 BABIP or thereabouts. That is too many assumptions, and we all know what happens when we assume.
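For reference, here is the usual BABIP calculation as a small Python function: hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The stat line in the example is hypothetical.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    return (h - hr) / balls_in_play


# Hypothetical season line.
print(f"{babip(h=150, hr=20, ab=550, k=100, sf=5):.3f}")  # 0.299
```

A .299 result on its own says nothing about luck; it only becomes meaningful next to the player's own established level.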

In reality, every batter and every pitcher has a different skill level. For example, players who excel at using their speed out of the batter’s box tend to have higher BABIPs, while other batters (even some with speed) have skill sets that leave them with a BABIP below the .300 average. If owners truly want to use these stats to advance their game, they need to read more than the basic definitions and examine how the stats are used. An easy way to do that is to read the work of those who truly understand them. The writers at HardballTimes.com and Fangraphs.com are very good places to start.

Patrick DiCaprio – FantasyPros911.com

In my mind it is clearly ground ball ratio. The percentage of ground balls is simply ignored by most fantasy owners, who focus instead on K, BB and K/BB. Valuable though these stats are, everyone knows them, so they do not represent value. The most undervalued pitchers in baseball are those with middling control but huge ground ball rates and good strikeout rates. How did Ubaldo Jimenez do last year?

The second most ignored group is pitchers with good ground ball rates and weaker strikeout rates, at least in deep “only” leagues. Take two groups of pitchers with the same K rates (in theory), where group A has good control and group B has high ground ball rates: group A will probably do better, but group B will have a lot more value.

Derek Carty – THT Fantasy Focus

This is a topic I could talk all day about.  There are so many stats that either shouldn’t be used or are used incorrectly.  I won’t single out any person or site individually, but I’ll list a few stats that I don’t like.  The first one I don’t imagine will be on anyone else’s list, and it’s one I imagine some of you thought was a good one to use.

K/BB: It seems like everyone is using K/BB ratio these days, and on the surface it makes sense.  DIPS Theory says that we should focus on events that a pitcher has control over.  Strikeouts and walks are the two most important of these events, so capturing their effect in one stat makes sense.  There are two problems, though:
1) A strikeout is not worth the same as a walk, so weighting them equally is flawed thinking. I found in an earlier study that it’s better to have lots of strikeouts and lots of walks than few strikeouts and few walks, even if the K/BB ratio is exactly the same. I then took it a step further and created a stat that properly combines the two.
2) Ratio stats are almost never a good idea. Say a pitcher has 10 K and 5 BB. That makes a 2.0 K/BB, but it also makes a 0.5 BB/K. Using K/BB shows double the impact of using BB/K, yet they measure the same thing (and which one we use is completely arbitrary). If you’re set on weighting the two equally, at the very least use K minus BB (divided by IP, TBF or something similar).
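Both objections can be checked with a few lines of Python, using the 10 K / 5 BB example for point 2 and two hypothetical pitchers (with an assumed 200 IP each) for point 1. Reading “double the impact” as distance from an even 1.0 ratio is one interpretation, not necessarily the exact argument intended.

```python
def k_minus_bb_per_ip(k, bb, ip):
    """K minus BB per inning pitched, the alternative suggested above."""
    return (k - bb) / ip


# Point 2: the same 10 K / 5 BB line, written both ways.
k, bb = 10, 5
k_per_bb = k / bb  # 2.0
bb_per_k = bb / k  # 0.5
# Distance from an even 1.0 ratio is twice as large when written as K/BB.
print(k_per_bb - 1, 1 - bb_per_k)  # 1.0 0.5

# Point 1: hypothetical pitchers with identical K/BB but different volume.
high_volume = (200, 100)  # (K, BB)
low_volume = (100, 50)
print(high_volume[0] / high_volume[1], low_volume[0] / low_volume[1])  # 2.0 2.0
print(k_minus_bb_per_ip(*high_volume, ip=200), k_minus_bb_per_ip(*low_volume, ip=200))  # 0.5 0.25
```

K/BB cannot tell the two pitchers apart, while the difference-based rate credits the high-strikeout, high-walk pitcher.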

BABIP: This isn’t a bad stat; it’s just that so many people use it incorrectly. And quite honestly, I have no idea why. It’s very simple. Most pitchers regress toward the league average BABIP of around .300 or .305 (some sites don’t even get this right, saying that league average is some different figure, or saying that it’s different for batters and pitchers. It’s not!). Very few pitchers can repeatedly do better or worse than this, so we say that pitchers have very little control over BABIP. Batters, on the other hand, can have a substantial amount of control over BABIP. Ichiro, for example, has a .356 career BABIP. Hitters do not regress toward league average; rather, each regresses toward his own unique number. Despite this, we’ll still inevitably hear people use it incorrectly.
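One simple way to express “regress toward” is a weighted blend of the observed BABIP and a baseline: league average for pitchers, the player's own established level for hitters. The reliability weights (0.2 and 0.3) and the observed values are hypothetical; in practice the weight would grow with sample size.

```python
LEAGUE_BABIP = 0.300  # league average is put at roughly .300-.305 above


def regressed_babip(observed, baseline, weight_on_observed):
    """Blend an observed BABIP toward a baseline; the weight (0-1) is a hypothetical reliability."""
    return weight_on_observed * observed + (1 - weight_on_observed) * baseline


# Pitcher: baseline is league average, with little weight on his own number.
print(f"{regressed_babip(0.250, LEAGUE_BABIP, 0.2):.3f}")  # 0.290
# Hitter: baseline is his own career level (Ichiro's .356 career BABIP from the text).
print(f"{regressed_babip(0.320, 0.356, 0.3):.3f}")  # 0.345
```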

A few other stats I dislike but don’t have the room to discuss (if you’re interested in an explanation for any of these, please feel free to e-mail me):
GB/FB (not to be confused with GB%, which is a good one), OPS, Linear Weighted Power, FIP, ERC, BB/K (for hitters), ISO, BAA, speed score to represent steals, H/9, HR/9, WHIP, Runs Created, Quality starts (QS), BB% to calculate anything other than times on base, LD% + .120 as xBABIP

My final point is this: if you’re reading a site that is using advanced metrics, or any metric that you aren’t 100% familiar with, make sure that this person/site truly knows what they’re talking about.  There are plenty out there that simply don’t, and relying on poor advice can be so detrimental to your fantasy season.

Adam Ronis – Newsday

I think BABIP is a good measure for hitters when used correctly. Too often people cite an average BABIP of around .300 for a hitter, but it varies for each individual. Take Edgar Renteria as an example. His BABIPs from 2002 to 2007 were .326, .348, .317, .318, .325 and .375. It was .294 last season, so you can expect his average to improve this season. Ichiro is the perfect example of how it varies for each individual. His career BABIPs are .371, .347, .333, .401, .319, .350, .390 and .337.

Many guys with elite speed will have a higher BABIP, such as Carl Crawford. His BABIPs from 2003 to 2007 were .329, .326, .328, .332 and .375, and just .301 last season, so you can expect Crawford to hit for a higher average. BABIP is a good tool to use, but it has to be examined on an individual basis.
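The individual-baseline approach can be sketched with the Crawford numbers quoted above: take his 2003-2007 BABIPs as the baseline and compare 2008 against it. A plain unweighted average is an assumption; weighting by balls in play or by recency would be reasonable refinements.

```python
from statistics import mean

crawford_2003_2007 = [0.329, 0.326, 0.328, 0.332, 0.375]
crawford_2008 = 0.301

baseline = mean(crawford_2003_2007)  # unweighted average of the five seasons
gap = crawford_2008 - baseline
print(f"baseline {baseline:.3f}, 2008 gap {gap:+.3f}")  # baseline 0.338, 2008 gap -0.037
```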

Rudy Gamble – Razzball

I figured I’d wait to see how everyone else responded before taking a crack at my own question.  Looks like I don’t have to cover BABIP!

Undervalued: Batted Ball Statistics for hitters (Fly Balls/Ground Balls/Line Drives). I believe these were first identified by Ron Shandler. Line Drive % is generally good for predicting high-average hitters, but I tend to ignore it, as I trust projection systems like Marcel, CHONE and ZiPS for batting averages. I find the % of Fly Balls is crucial for understanding a player’s HR potential. For instance, let’s look at three players who seem to have 30/30 potential: Grady Sizemore, BJ Upton and Matt Kemp. Based on watching these players play, they all seem like 30 HR possibilities. But their 2008 GB%/FB% were: Sizemore 34.9%/45.7%, Upton 50.5%/30.6%, Kemp 45%/32%. Basically, Sizemore hits almost 50% more fly balls than these two and thus is a safer bet for 30+ HRs. Unless Upton and Kemp greatly change their approach at the plate (not unprecedented, but rare), their upside is likely 25 HRs, and they will likely end up around 20.
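The “almost 50% more fly balls” comparison is simple arithmetic on the 2008 FB% figures above. This sketch compares fly ball rates only and ignores differences in how many balls each player put in play.

```python
FB_PCT_2008 = {"Sizemore": 45.7, "Upton": 30.6, "Kemp": 32.0}


def extra_fly_ball_share(a, b):
    """How much higher player a's FB% is than player b's, as a fraction."""
    return FB_PCT_2008[a] / FB_PCT_2008[b] - 1


for other in ("Upton", "Kemp"):
    print(f"Sizemore vs. {other}: {extra_fly_ball_share('Sizemore', other):.0%} more fly balls")
# Sizemore vs. Upton: 49% more fly balls
# Sizemore vs. Kemp: 43% more fly balls
```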

Overvalued: VORP (Value Over Replacement Player). Okay, this stat probably isn’t overvalued by many, but it is commonly used by those creating their own player rankings. For fantasy purposes, it determines player value by comparing a player against the best undrafted player available at his position. My problem with this is that it tends to overvalue players at 1B/3B/OF and undervalue 2B/SS/C. The reason? Focusing on the best undrafted player ignores the drafted players. Let’s narrow the focus down to HRs. Using 2009 projections, I’d estimate the replacement-level 1B to have about 14-15 HRs. This is James Loney/Casey Kotchman territory. For 2B, this is about 10 HRs, say Orlando Hudson. This would make a 30 HR 1B about the equivalent of a 25 HR 2B (+15 VORP). The problem – there’s probably 6 1Bs who’ll reach that mark whereas only 1-2 2Bs will reach it. How can they be equal? Well, the average drafted 1B is at about 28 HRs, and the average drafted 2B is at 16 HRs. Using the averages, you would find that a 30 HR 1B is closer to the equivalent of an 18 HR 2B. I’d argue VORP is only useful for fantasy baseball purposes when valuing a trade where you’ll need to replace your end with a free agent. But if you are ranking players, it is better to use a position average than VORP.
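The comparison above can be reproduced with the HR baselines given: a replacement-level 1B at about 15 HR (the top of the 14-15 estimate) and a 2B at 10 HR, versus drafted averages of 28 and 16. The function name is illustrative, and this covers only the single-category HR example, not a full valuation system.

```python
# Home-run baselines from the text.
REPLACEMENT_HR = {"1B": 15, "2B": 10}
AVERAGE_DRAFTED_HR = {"1B": 28, "2B": 16}


def equivalent_hr(hr, from_pos, to_pos, baseline):
    """HR total at to_pos that sits the same distance above baseline as hr does at from_pos."""
    return baseline[to_pos] + (hr - baseline[from_pos])


print(equivalent_hr(30, "1B", "2B", REPLACEMENT_HR))      # 25
print(equivalent_hr(30, "1B", "2B", AVERAGE_DRAFTED_HR))  # 18
```

The only thing that changes between the two lines is the baseline, which is the whole disagreement in a nutshell.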

  1. big o says:

    Rudy :
    enjoyed this article .

    at the risk of running contrary to the theme , i would like to find someone willing to tackle a new stat …. a projection for team svo’s .

    what prompts this is my trying to figure out how many saves fuentes will get this year .
    k-rod saved 62 out of 69 svo’s last year .
    how many chances will fuentes get ? my gut instinct tells me that he should get no less than 50 svo’s , but what the hell do i know ?

    it seems (to me) there should be enough historical data to take a stab at providing this type of team projection …. i.e., 1-2-3 run games , team defense , strength and weighted schedule of opponents , etc.
    (i’m sure you can think of more appropriate factors than those i have just listed ).

    anyway , i’m left with the same question ===> how many svo’s will fuentes get this year ? ?
    more than nathan ? or f. cordero ?

    this might seem completely foolish and unsophisticated to you , but i would enjoy this .

    thoughts ?

  2. TBone says:

    Great discussion! I find the GB/FB ratio especially tantalizing. Like Rudy pointed out, it’s certainly a strong indicator of what sort of upside a player has. I’ve used it the past two years, and it’s helped a lot more than it’s hurt in projecting totals.

    BABIP is a rocking indicator as well. I like to use that in conjunction with WHIP.

  3. Josh says:

    I’ve been reading these since last year, and I think this was the best question and set of answers I’ve seen yet. I don’t often see writers discuss advanced stats they don’t favor.

    Rudy, I’ve been thinking about VORP vs. average at a position, so I’m glad you brought it up. I’m having trouble conceptualizing what the average for CI or MI should be. Some teams will draft two players at a position early and have a top-flight 1B as CI. This obviously makes the CI/MI positions stronger and the IF positions weaker on average. Should I just average the 12 best at each position, then the 12 best remaining ones? It’s the only way I can think of right now, but it seems very flawed.

  4. On VORP: about five years ago I went into a mixed points league literally 30 minutes before the draft, printed out a list sorted by VORP and won the league. I have always been intrigued to try it again and see what happens. In points leagues it is a pretty good substitute if you have nothing else.

  5. Junker23 says:

    Did Brett and Derek just say conflicting things about pitcher BABIP?

  6. @big o: Interesting suggestion that fits well within the theme. I know Derek loves making up his own stats. Our new stat is THTNS/Day, the number of new stats made up by The Hardball Times per day.

    My gut says that projecting SVO’s would be a fool’s errand for fantasy baseball purposes. The perceived correlation with wins is higher than the actual correlation – case in point, Soria had 42 saves last year! K-Rod and the Angels last year looks like an outlier vs. something that can be predicted.

    The only factors I’d take into consideration for a closer are:
    1) Job Security – Saves are first and foremost about opportunity.
    2) K Potential – The 30-40 Ks difference between closers is worth almost a point in the standings.
    3) WHIP Projections – While a bad WHIP pitcher won’t kill your team WHIP, a good one definitely helps. This will help drive ERA and increase the Save %.
    4) Injury History – Again, opportunities are key. If you’re injured, there are no opps. Wagner and Percival scared me last year…

  7. @Junker23: Stat fight! Yeah, they are kind of conflicting arguments. To balance the two, I’d say that if a pitcher has shown a long-term ability (say 3+ years) to throw a lot of fly balls and keep a below-average BABIP, it’s fair to assume he won’t regress to the mean. I would imagine a pitcher with an extreme ground ball % would be the opposite (although Lowe’s Dodger years defy that theory). But overall, I’d say that most pitchers will regress to this ~.300 mean, so watch out Gavin Floyd (.268), J-Duch (.240), and Armando “Big Blownitez” Galarraga (.247)!

  8. @junker: I think we said the same thing, but from two different angles.

    He said most regress towards the mean.

    I pointed out the few that don’t regress towards the mean (implying that most do, but since not all do, it’s not always accurate to use on every pitcher).

    Nice questions Rudy.

  9. @Pat: VORP works beautifully for points leagues. In fact, points leagues are ridiculously easy to value players for since all you need to do is add each player’s projected point total, subtract replacement level at that position, and then sort by adjusted points. Voila, you have yourself a ranking list in 10 minutes.
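A minimal Python sketch of that points-league recipe: subtract each position's replacement-level points from a player's projection, then sort. The players, projections and replacement levels are hypothetical, and the sketch skips the hard part, which is setting the replacement levels.

```python
def rank_by_points_over_replacement(players, replacement):
    """players: (name, position, projected_points); replacement: position -> replacement-level points."""
    adjusted = [(name, pts - replacement[pos]) for name, pos, pts in players]
    return sorted(adjusted, key=lambda item: item[1], reverse=True)


# Hypothetical projections and replacement levels, for illustration only.
players = [("Player A", "1B", 520), ("Player B", "SS", 450), ("Player C", "OF", 480)]
replacement = {"1B": 400, "SS": 300, "OF": 390}

for name, value in rank_by_points_over_replacement(players, replacement):
    print(name, value)
# Player B 150
# Player A 120
# Player C 90
```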

  10. To clarify, some pitchers do not regress towards league average. Tom Glavine’s career BABIP is .286. But it’s not until after many years that we can say with any kind of certainty that this is a true skill, and after those years it might be too late for fantasy purposes. If we start the clock at age 28, by the time the pitcher is 33 or 34 or 35 he might have a different set of skills and expecting a low BABIP might not be entirely prudent.

    Also, as to Young, his BABIP wasn’t entirely built upon FBs. He also let up fewer GB hits than league average in 2006 and 2007.

  11. @Mike Podhorzer: VORP is potentially as bad for point leagues as for standings leagues. Because of CI/MI/UTIL and the reduced disparity in power amongst lower-tier 2B/SS and CI/OF, VORPs all regress to a certain Fantasy Free Agent mean. Off the top of my head, it would be something like .280/70/15/70.

    Fundamentally, VORP is valuing a player against the WORST available option. Again, that’s okay if you’re assessing the risks of trading a player and dipping into the FA pool. But for straight-up player valuation, you should be comparing against the average to get a better idea of how you’ll stack up vs. the competition. D-Lee may rank about the same among 1Bs either way, but using the average will lead to better valuation across other positions.

    Last illustration on why I prefer using the average (a la Point Shares) vs. VORP. I’ll recycle something I posted on that magnificent bastard site called Fantasy Baseball General sites:

    Let’s say there’s a player who steals 300 SBs. Anyone with this player would obviously be in 1st for SBs. What is the best estimate for that player’s impact on a team?

    From VORP, you’d end up with 9 points – this would take the VORP team from the cellar (1 point) to the top (10 points).

    But using average – where you credit a team 5.5 points – it would be 4.5 points (10-5.5).

    Here is how that 4.5 points could be viewed:
    1st place team in SB – 0 incremental points
    2nd place team in SB – 1 incremental point
    3rd place team in SB – 2
    4th place team – 3
    5th – 4
    6th – 5
    7th – 6
    8th – 7
    9th – 8
    10th – 9

    Sum of the above is 45 / 10 = 4.5.
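The two credits in that 300-SB thought experiment can be checked in Python for a 10-team league. The enumeration reproduces the table above, assuming the player is equally likely to land on a team in any SB standings position.

```python
def vorp_credit(teams=10):
    """Cellar (1 point) to first (teams points)."""
    return teams - 1


def average_credit(teams=10):
    """First place (teams points) minus the average standings points, (teams + 1) / 2."""
    return teams - (teams + 1) / 2


def average_credit_by_enumeration(teams=10):
    """Average gain across all current SB places: 0 for the 1st-place team up to teams - 1 for last."""
    gains = [place - 1 for place in range(1, teams + 1)]
    return sum(gains) / teams


print(vorp_credit(), average_credit(), average_credit_by_enumeration())  # 9 4.5 4.5
```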

  12. Rudy, not again! The last debate on this might as well have been an epic. I can’t begin to explain again why it’s wrong to use average and not replacement level (but of course, I’ll try again anyway :) )

    “Fundamentally, VORP is valuing a player against the WORST available option.” No, it’s not. It’s valuing against the best available option should this player, for whatever reason, need to be replaced. It measures the actual drop-off that you will actually face should you no longer be able to use this player. That’s what it measures.

    Your point about 1B/3B/CI/UT is completely moot. In a league where you have a CI spot, it’s fundamentally sound to include all 1B and 3B in the same pool, then compare them all to the best unowned 1B/3B.

    “Since 3B is a more valuable position than 1B…” I’ll cut you off there: simply not true in a league that uses CI. In this league, CI becomes the position because the top 12 1B and 3B will all be drafted, and after that they are interchangeable commodities for the CI spot. There is no need to make a 3B/1B distinction.

    “The problem – there’s probably 6 1Bs who’ll reach that mark whereas only 1-2 2Bs will reach it. How can they be equal?” This doesn’t matter. If my Chase Utley gets injured, I don’t need 6 equivalent options to replace him. I can’t pick up 6 players and plug them all into my 2B spot. I need one. That’s why we measure against the best available, regardless of how large the drop-off after that player is or how many players put up numbers equivalent to his. Both are irrelevant.


  13. Derek,

    I think ballpark has a lot to do with it too. 2.5 years of Chris Young is enough in my book when you move from Arlington to Petco.

    There is more to it than simply looking at the numbers and making a quick judgment.

  14. A lot more of Young’s fly balls stayed in Petco, whereas they likely left Arlington.

  15. Sorry Brett, but I still think you’re wrong. PETCO’s effects on OF Hits are quite small. Using David Gassko’s park effects, PETCO is completely neutral (1.00) for OF FB 1B and actually inflates OF FB 3B (1.09). It does deflate OF FB 2B (0.88). Looking at Young’s 2005 Ranger numbers (when he posted a neutral BABIP), we see that he allowed 26 OF FB Hits on the year. With the effects applied, he would have allowed 24.4 OF FB Hits. Definitely not a major shift. That would move his BABIP from .304 to .301, not all the way to .237 or .250 or whatever number you claim is sustainable.

    If you’re implying that I made a quick judgment, that would be incorrect, as you can see.

    And even if the park effects confirmed your point (which they don’t), 2.5 years is nowhere near enough information to make any kind of sound and reasonable conclusion about a pitcher’s BABIP. This stat is so unstable it is a HUGE, HUGE mistake to do so. Whether that’s okay in your book isn’t the point; it’s whether this is actually the case, and it isn’t.
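The park adjustment described above multiplies each type of outfield fly ball hit by its park factor. The factors below are the ones quoted (1.00, 0.88, 1.09), but the thread doesn't give Young's split of 26 hits by type, so the split is hypothetical and the result won't reproduce the 24.4 figure exactly; the 550 balls in play is also an assumption.

```python
# Outfield fly ball park factors for PETCO as quoted in the comment (Gassko).
PETCO_OF_FB_FACTORS = {"1B": 1.00, "2B": 0.88, "3B": 1.09}


def park_adjusted_hits(hits_by_type, factors):
    """Scale each hit type by its park factor and total the result."""
    return sum(count * factors[kind] for kind, count in hits_by_type.items())


def babip_shift(raw_hits, adjusted_hits, balls_in_play):
    """Change in BABIP from replacing raw_hits with adjusted_hits over the same balls in play."""
    return (adjusted_hits - raw_hits) / balls_in_play


hypothetical_split = {"1B": 12, "2B": 12, "3B": 2}  # sums to 26; not Young's real split
adjusted = park_adjusted_hits(hypothetical_split, PETCO_OF_FB_FACTORS)
print(f"{adjusted:.2f} adjusted OF FB hits")  # 24.74 adjusted OF FB hits
print(f"{babip_shift(26, adjusted, balls_in_play=550):+.3f} BABIP change")
```

Losing a hit or two over several hundred balls in play moves BABIP by only a few thousandths, which is the scale of the .304 to .301 shift described above.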

  16. As to the second point, the numbers again aren’t really there. In 2005, 9% of Young’s OF FB left the park. Here’s the full breakdown:

    2005: 9%
    2006: 12%
    2007: 5%
    2008: 11%

  17. @Derek Carty: LOL…yeah, we don’t need to go into another epic.

    You’re right – on the WORST available option. I meant to say a hypothetical WORST team possible. Like an expansion team :)

    I agree that VORP has value if you’re looking to replace a player but I guess we disagree on how to approach player valuation. I don’t care what it costs to replace a player. I care what he contributes relative to the players drafted on other teams. Valuing JJ Hardy based on his difference vs. Mike Aviles ignores the disadvantage I have vs. the teams with Reyes, Hanley, and J-Roll or potentially the advantage because the last 5 SS options are nearly at par with the VORP.

    I guess we can see how it plays out with Tango’s challenge (you’re doing that, right?)….

  18. sean says:

    @Rudy: Nice work! Looks like hitting at the bottom of this order really helped you to shine.

    I think you are dead on about GB/FB ratios in evaluating possible growth of power potential. I guess, I should clarify that by saying it’s a great tool to use when trying to piece together whether those 2Bs will turn into HRs.

    As for BABIP, I feel that it’s a much more useful tool to spot an outlier when dealing with pitchers than an unlucky hitter. Someone like Gavin Floyd is a perfect example of a BABIP that is extremely low given his skill set, which will likely result in a guy that will have trouble nearing his ’08 numbers without a similarly high amount of luck (or invisible skill).

  19. I have a question for anyone. I read (in Baseball Between The Numbers a few years ago) that a pitcher’s BABIP should be compared to the team BABIP. I have always gone with that rather than the league average (the book did not specifically say to do this, more of an implication that the team comparison made more sense since the pitchers had the same defense behind them generally).

    So…should I expect a guy’s BABIP to regress to the league average or to his team’s BABIP?

  20. Theo says:

    Team BABIP. Not all parks play the same, and not all teams defend the same. Obviously a team that fields very well is going to have a lower team BABIP, whereas clunk-footed teams (like Texas last year) aren’t going to get to as many balls in play.

    Using a standard, static number like “.290” for a mean is fine for eyeballing projections, I suppose, but looking at team BABIPs is more accurate.

  21. sean says:

    @Theo: using team BABIP, I’m stuck wondering why some teams tend to provide better defensive support for a certain pitcher and better offensive support for others. It’s the same old question as to why a defense plays better in front of a backup goalie or why an offensive line protects a backup QB better than the starter.

    I think team BABIP can shed some light on the picture, but it’s also dependent upon what type of batted balls that pitcher is giving up, i.e., a speedy outfield might eat FBs for breakfast for an extreme FB pitcher in a pitcher’s park.

    Maybe comparing BABIP for pitchers with relatively similar skill sets (dom, ctl, cmd, etc.) is a more valuable indicator. But maybe that also puts the cart before the horse by placing a pitcher in a group of peers…

  22. I don’t see how it should not be team BABIP in general, but in practice it probably doesn’t matter that much unless you are talking about teams with terrible D like the Yanks and Pirates. So I have to agree with Tim and Theo here for the most part. But again, this is really not an important fantasy discussion, simply because the differences are really too small to affect your ranking of an individual player. The inherent variability is just too high.

    It is important if your goal is accurate projections, however. Just my $0.02.

  23. Derek and Rudy– if you want you can both come on to the Roundtable Radio Show (#1 baseball show on BTR!) and we can debate it with Podhorzer and the two of you.

  24. Tim, Theo, and Patrick,
    While my opinion isn’t set in stone here, I have to disagree with you.

    I just ran a quick, somewhat crude study. I looked at all pitchers from 2004 to 2008 with at least 125 IP for the same team in adjacent seasons (233 pitcher seasons). I found the correlations of their Year 1 BABIP and of their Year 1 Team BABIP with their Year 2 BABIP:
    BABIP: 0.25
    Team BABIP: 0.21

    If I look at Average Error (lower is better here):
    BABIP: 0.021
    Team BABIP: 0.035
    League BABIP: 0.018

    Actual BABIP does a better job of predicting the following season’s BABIP than Team BABIP does, and League BABIP does a better job than both. Of course, this was a very crude study using arbitrary cut-offs (though I can assure you I didn’t fool around until I got a favorable one), I didn’t adjust for differences in league average, and I weighted everyone the same.

    I’ll probably run a more definitive study at a later date, but the crux of my argument is that Team Defense is prone to fluctuations from year to year, and Team BABIP is even more susceptible since BABIP itself deals with large variance. Looking at all teams from 2004-2008, the year-to-year R² is just 0.28. That means that one year’s Team BABIP explains just 28% of the variance in the next year’s Team BABIP.

    Ideally, we’d combine some Team projected BABIP with actual BABIP tendencies. There’s a nice middle point in there somewhere. If it’s an either/or thing, as it is for most readers right now, GO WITH LEAGUE BABIP.

    While I agree that FB/GB tendencies affect BABIP, they are nowhere near as large as Chris Young makes it seem.

    Tim, I would, however, be very interested in seeing that study if you wouldn’t mind shooting me an e-mail.
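The study described above can be sketched in Python: a Pearson correlation between each Year 1 predictor and Year 2 BABIP, plus an average error for each predictor. Reading “Average Error” as mean absolute error is an assumption, and the four pitcher seasons are made-up placeholders rather than the 233-season sample, so the printed numbers will not match the reported ones.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_abs_error(predictions, actuals):
    """Average absolute miss of each prediction against the actual value."""
    return mean(abs(p - a) for p, a in zip(predictions, actuals))


# Hypothetical pitcher seasons: (Year 1 BABIP, Year 1 team BABIP, Year 2 BABIP).
seasons = [
    (0.280, 0.295, 0.301),
    (0.315, 0.305, 0.298),
    (0.262, 0.290, 0.289),
    (0.330, 0.310, 0.312),
]
LEAGUE = 0.302  # the flat league figure used in the study

y1 = [s[0] for s in seasons]
team = [s[1] for s in seasons]
y2 = [s[2] for s in seasons]

print("corr, own BABIP:", round(pearson(y1, y2), 2))
print("corr, team BABIP:", round(pearson(team, y2), 2))
print("MAE own / team / league:",
      round(mean_abs_error(y1, y2), 3),
      round(mean_abs_error(team, y2), 3),
      round(mean_abs_error([LEAGUE] * len(y2), y2), 3))
```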

  25. Oh, and Patrick, I’d probably be up for the Roundtable Show on the replacement level topic. I doubt we’d settle anything, but it would probably be a good time and would at least bring each argument into the spotlight a little more.

  26. Oh, one last caveat to my little study. League BABIP was assumed to be .302 for all years. Again, I did this very quickly.

  27. Nadav says:

    Derek, I’ve tried several times to understand your argument against K/BB, but I just can’t make any sense out of it. Are you trying to argue that it shouldn’t be used as a stat category for fantasy scoring purposes, or are you saying that it’s not even useful for player valuation purposes at all?

    A few other objections:

    1) You argue that pitchers with the same K/BB rate are not equally effective. This is a fair point, but I don’t think it demonstrates that K/BB is useless. Like all stats, including “advanced” metrics, it’s best used as part of a suite of measures to be considered together. Same goes for GB% and K% — neither one is particularly useful on its own, but they’re both valuable when combined with other stats.

    2) In the second link you provided, you argue that your alternative stat, “K-BB RI,” is more useful than K/BB. Your stat may be a better predictor of pitching value than K/BB, but it’s also a good deal more complex. K/BB isn’t perfect, but it’s extremely easy to generate for any given pitcher. If you’re going to use a more complex stat, why not just use FIP or xFIP?

    3) What exactly do you mean by “Ratio stats are almost never a good idea”? Similarly, what do you mean when you say that K/BB has “double the impact” as BB/K in the example you provide? Impact on what, exactly? And relative to what?

    It sounds like you’re objecting to a particular use of the K/BB stat, but it’s hard to understand from your post what this incorrect use is. Again, I think it’s a fair point that K/BB should not be used in isolation, but I don’t believe anybody is arguing that point. If you look at the pitchers with the highest and lowest 2008 K/BB values at fangraphs.com, you’ll find that they tend to have the lowest and highest FIP values, respectively. In almost all cases, the only pitchers who defy that correlation are ones with particularly low or high home run rates. If you want to rely on easily available stats to compare the value of pitchers, I don’t see the problem with using K/BB combined with HR/9 (or GB%).

  28. Nadav says:

    Thanks for your response, Derek.

    I followed the link you provided for Tangotiger’s discussion of GB/FB ratios, and I understand your/his point about “double the impact” better now. It looks like he was specifically talking about the use of the ratio in developing a composite stat that’s meant to provide a version of OPS with more predictive value. A parallel example for K/BB would be if you were generating a FIP-type stat using K/BB instead of properly-weighted versions of K% and BB%.

    I’m still not 100% convinced that the extra insight provided by the K-BB RI stat is worth the extra work (at least for those of us who don’t already have spreadsheets programmed to calculate it). In your post about the stat, you mentioned the possibility of another post that would show how certain pitchers would be valued differently using K-BB RI instead of K/BB, but I wasn’t able to find a post like that in your archives. Do you happen to have any examples of where K-BB RI would give you a better picture of a pitcher’s value, while K/BB would lead you astray?

  29. How often does a pitcher have an abnormally low BABIP 2 years in a row? If Justin Duchscherer repeats this year, is it simply another, for lack of a better term, lucky year? Or is he showing that a .250 BABIP is his true level? I want to say it’s simple dumb luck that makes Beane look like a genius. Am I wrong?

    And Derek, what’s wrong with FIP? Is it not better than ERA? Or are ERA-type projections just less important than the pitcher’s component skills?

    I was under the impression FIP is helpful to predict for the next year, should I stop?

  30. Nadav says:

    Okay, I think I can answer my own question:

    I followed your suggestion of trying (K-BB)/IP and found a few examples of guys who would be undervalued by relying on K/BB instead (based on 2008 stats):

    Felix Hernandez vs. Kyle Lohse: Felix has a lower K/BB, but a higher (K-BB)/IP.

    Tim Lincecum vs. Mike Mussina: Tim has a lower K/BB, but a higher (K-BB)/IP.

    So basically, K/BB undervalues high-K, high-BB pitchers, relative to low-K, low-BB pitchers. Which is pretty much what you said in your post above. I now have a better picture of why relying too much on K/BB can be misleading.
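To make the rank reversal concrete without relying on exact 2008 lines, here is a minimal sketch with two made-up pitchers over the same 200 innings: the low-K, low-BB pitcher wins on K/BB, while the high-K, high-BB pitcher wins on (K-BB)/IP.

```python
# Hypothetical stat lines (not real 2008 numbers) chosen to show the rank reversal.
pitchers = {
    "high_k_high_bb": {"k": 240, "bb": 80, "ip": 200.0},
    "low_k_low_bb": {"k": 120, "bb": 30, "ip": 200.0},
}

def k_per_bb(p):
    """Strikeout-to-walk ratio."""
    return p["k"] / p["bb"]

def k_minus_bb_per_ip(p):
    """Strikeouts minus walks, per inning pitched."""
    return (p["k"] - p["bb"]) / p["ip"]

for name, p in pitchers.items():
    print(f"{name}: K/BB={k_per_bb(p):.2f}  (K-BB)/IP={k_minus_bb_per_ip(p):.3f}")
```

The ratio rewards the low-walk pitcher (4.00 vs. 3.00), while the difference rewards the pitcher who nets 70 more strikeouts over walks in the same innings.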


  31. Clodbuster says:

    This may be completely unreasonable, but is there a stat for the level of competition a pitcher faced? In season, it would probably be difficult since batters’ averages constantly change, but in offseason draft preparation, when stats are static, is it a worthwhile study?

    Results would probably be best for identifying elite set-up men and forecasting a pitcher who’d be moving to a new team/division, but I think they’d be relevant for ranking all starters if someone were to put in the time.

    It’s helpful to know that, say, Randy Johnson held batters to a .220 batting average last year, but if we dig deeper, we find that the batters he faced managed only a combined .225 average to begin with, making his performance less impressive. These numbers are fictional, by the way.

    Is there someone out there with software that keeps track of every batter’s Avg. at the time the pitcher faced him to come up with somewhat of a strength of schedule?
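One simple version of the "strength of schedule" idea: weight each opposing batter's AVG (as of the day of the matchup) by his plate appearances against the pitcher, then compare that with the AVG the pitcher actually allowed. This is a hypothetical sketch with invented numbers; weighting by PA rather than AB is a simplifying assumption.

```python
def opponent_avg(matchups):
    """PA-weighted mean of opposing batters' AVG at the time of each matchup.
    matchups: list of (batter_avg, plate_appearances_vs_this_pitcher)."""
    total_pa = sum(pa for _, pa in matchups)
    return sum(avg * pa for avg, pa in matchups) / total_pa

def avg_vs_expected(avg_allowed, matchups):
    """Negative means the pitcher held batters below their own norm."""
    return avg_allowed - opponent_avg(matchups)

# Invented numbers, in the spirit of the fictional Randy Johnson example.
matchups = [(0.300, 40), (0.250, 80), (0.200, 80)]
print(round(opponent_avg(matchups), 3), round(avg_vs_expected(0.220, matchups), 3))
```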

  32. @bpasinko:
    It isn’t a frequent occurrence, but it does happen. To give a few examples, Scott Elarton did it in 2004, 2005, and 2006. More recently, Jeremy Guthrie, Ted Lilly, Shaun Marcum, and Carlos Zambrano did it in 2007 and 2008. As the above study I pointed to shows, we can’t treat this as a trend. If Duke does it again, I’d definitely still call it lucky.
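For reference, BABIP is usually computed as (H - HR) / (AB - K - HR + SF) from the batters a pitcher faced. The sketch below computes it per season and counts the longest run of consecutive seasons below a cutoff; the .265 threshold and the stat lines are assumptions for illustration. It only detects a streak and says nothing about whether the streak reflects skill, which is exactly the point above.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play allowed: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)

def longest_low_babip_streak(seasons, threshold=0.265):
    """Longest run of consecutive seasons with BABIP below the threshold."""
    best = current = 0
    for season in seasons:
        if babip(**season) < threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best

# Hypothetical opponent totals for three consecutive seasons.
seasons = [
    {"h": 150, "hr": 20, "ab": 700, "k": 150, "sf": 5},
    {"h": 160, "hr": 18, "ab": 720, "k": 160, "sf": 6},
    {"h": 190, "hr": 20, "ab": 720, "k": 150, "sf": 5},
]
print([round(babip(**s), 3) for s in seasons], longest_low_babip_streak(seasons))
```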

    As to FIP, yes, it is definitely better than ERA. It’s not as good as something like xFIP or LIPS ERA or QERA because it doesn’t normalize HR rate, but it is still better than ERA.
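The difference Derek describes can be sketched directly: xFIP is FIP with actual home runs replaced by fly balls times a league-average HR/FB rate. The 10.5% HR/FB and 3.10 constant here are assumed placeholder values (both vary by season), HBP is ignored, and the pitcher is hypothetical.

```python
FIP_CONSTANT = 3.10       # assumed; the real constant is recalculated each season
LEAGUE_HR_PER_FB = 0.105  # assumed league-average HR/FB rate

def fip(hr, bb, k, ip, constant=FIP_CONSTANT):
    """Simplified FIP (HBP ignored)."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

def xfip(fly_balls, bb, k, ip, hr_per_fb=LEAGUE_HR_PER_FB, constant=FIP_CONSTANT):
    """FIP with actual HR replaced by fly balls times a league-average HR/FB rate."""
    expected_hr = fly_balls * hr_per_fb
    return fip(expected_hr, bb, k, ip, constant)

# Hypothetical pitcher: 10 HR on 200 fly balls (a 5% HR/FB, well below the assumed norm).
print(round(fip(10, 60, 160, 200.0), 2), round(xfip(200, 60, 160, 200.0), 2))
```

Because this pitcher's home run luck is normalized away, his xFIP comes out noticeably higher than his FIP.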

    To clarify and summarize a little bit, we have component skills (K/9, BB/9, GB%, etc.) and we have ERA estimators (FIP, LIPS, DIPS, QERA, etc.). Those component skills are used to determine the ERA estimators, so it’s not really a matter of which is “better” since they sort of say the same thing. The ERA estimators give the final picture, but they’re not better per se.

    Tim Lincecum, for example, had a 9.2 K/9 and 4.0 BB/9 in 2007 for a 3.63 FIP. Looking at this, we could say (and I did) that Lincecum had a chance for big improvement because he struck out a lot of batters and posted a good FIP despite walking a lot of batters. If he could cut down on the walks (the easiest of the Big 3 skills to improve upon), he could have a big year. As we saw, he did just that (3.3 BB/9 in 2008).
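A back-of-the-envelope check on why cutting walks matters, using the simplified FIP weights (HBP ignored, C a league constant) and holding K/9 and HR/9 fixed. This isolates only the walk component; Lincecum's other rates also changed in 2008, so it is not a reconstruction of his actual FIP.

$$
\text{FIP} = \frac{13\,HR + 3\,BB - 2\,K}{IP} + C = \frac{13\,(HR/9) + 3\,(BB/9) - 2\,(K/9)}{9} + C
$$

$$
\Delta\text{FIP} = \frac{3\,\Delta(BB/9)}{9} = \frac{3\,(3.3 - 4.0)}{9} \approx -0.23
$$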

    So it’s more a matter of perspective than “this is better” in the case of component skills versus ERA estimators.

  33. Moonlight's Grahams says:

    One league I run is a fairly unorthodox head-to-head 6×6. For hitters it is Runs, RBI, SB, OBP, Total Bases, and Strikeouts. For pitchers we use Wins, Losses, Saves, Holds, ERA, and K/BB ratio. Are these categories a fair measure of talent?

  34. @Moonlight’s Grahams: Gotta tell you that I’m not a fan of those 6×6 categories and it’s not b/c they are unorthodox. Here’s what I’m thinking…

    I don’t like Strikeouts as a hitting category. I’d replace that with AVG. And while TB will credit the otherwise ignored Doubles/Triples, it feels weak not having HR.

    For pitching, this format begs teams to only start upper-echelon starters and stock up on relievers. The only stat where a mid-tier pitcher helps is in Wins. They hurt on Losses, are useless on Saves/Holds, and may not be of great help for ERA and K/BB ratio. Would prefer a counting stat (K’s) over the K/BB ratio and would replace Losses with WHIP (which is one of the better stats).

    This basically means standard 5×5 with OBP and Holds thrown in. Kind of bleh for me… I say either do 5×5, a points league, or go the full sabermetric route…

    Just my 2 cents…

  35. big o says:

    @Rudy Gamble:
    thanks for your replies to my comment.
    but i think you’re stuck on individual closer considerations.
    what i’m looking for are projections on TEAM svo’s … the rest i would like to assess on my own.
    so for anyone reading this, i’ll assume fuentes (or any other closer) will “hold the job” because it seems irrelevant, and should not affect TEAM svo projections.

    as sometimes is the case, the discussion is more “prized” than the article.
    comments i particularly enjoyed are: “stat fight!”, “magnificient bastard site”, and nadav’s “suite of measures”.

    i must say that derek has made more “sense” to me, in this one thread, than i have ever understood before. perhaps i will return to his site and give it another go.

    dicaprio, however, remains an enigma to me. i’m sure the connection is broken on my end. not precisely double-entendres.
    more like tongue-in-cheek.

    of course, my caveat ==> i am often mistaken.

    still, this is good reading!! … my thanks to ALL.

  36. Thanks, Big O.

    To clarify, are you saying that you’ve understood me better here, or just understood some of the concepts better because of the way I laid them out?

    If you ever have any trouble comprehending anything I write (or anything you hear anywhere else, for that matter), don’t ever hesitate to send me an e-mail. I’m happy to help out anyone who needs it.

  37. Thanks a lot, Derek. I’ll have to look into those other ERA estimators or, like you said, just the components they use.

    Also, and maybe more importantly, it gives me more solid evidence to use when my friends argue that the Duke is a top pitcher.

  38. Nick J says:

    Rudy & Derek & all,

    I’m endlessly fascinated by the average vs. replacement baseline debate, but why don’t we just test it? We sort of tried to over at Last Player Picked, but I didn’t feel like there was a definite conclusion.

    Couldn’t we just use 2008 stats and rankings based on whatever system (Derek’s SGP + replacement vs Rudy’s Point Shares) for common league settings and figure this out?

    I have my doubts about the Forecasters Challenge sorting this out because in that contest it seems like we’re testing both projections and valuation systems simultaneously.

    Anyway, I enjoy both of your writings very much. Thanks.

  39. Nick J says:

    And if you guys are looking for an impartial party to run the test, I’ll happily volunteer, as long as we can agree to the parameters.

  40. Rudy,
    Who’s more valuable: David Wright (or whoever your top CI is) or Brian McCann (or whoever your top C is)?

    What would you propose?

  41. Nick J says:

    My idea would be for all participants to submit a ranked set of players. Then I would run a snake draft, where each team would simply select the player with the highest ranking according to their own valuation system, until all rosters were filled. Then calculate the stats and category ranks for each team according to traditional 5×5 roto. Maybe I could try to run it several times with a different draft order, as I think that would probably play a role. It would be a bit arduous for me, as I don’t have the technical abilities to simulate things hundreds of times over, but I’d be willing to put in the time if the participants are willing to wait a while.
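The procedure Nick describes can be sketched as follows. This is a hypothetical illustration: positional limits (CI, MI, DH) and IP minimums are not enforced, each system's list is assumed to be long enough to fill every roster, and draft slots are shuffled with a seed so the run can be repeated with different orders.

```python
import random

def snake_order(teams, rounds):
    """Pick order for a snake draft: forward in odd rounds, reversed in even rounds."""
    order = []
    for r in range(rounds):
        order.extend(teams if r % 2 == 0 else list(reversed(teams)))
    return order

def run_draft(rankings, rounds, seed=None):
    """rankings: {team_name: [player ids, best first]}, one list per valuation system.
    Each team takes the highest player on its own list who is still available."""
    teams = list(rankings)
    random.Random(seed).shuffle(teams)  # randomize draft slots for this run
    taken = set()
    rosters = {t: [] for t in teams}
    for team in snake_order(teams, rounds):
        # Assumes every list has enough players; next() raises StopIteration otherwise.
        pick = next(p for p in rankings[team] if p not in taken)
        taken.add(pick)
        rosters[team].append(pick)
    return rosters

rankings = {
    "system_a": ["p1", "p2", "p3", "p4"],
    "system_b": ["p2", "p1", "p4", "p3"],
}
print(run_draft(rankings, rounds=2, seed=1))
```

Running it with many seeds and averaging the resulting roto standings is one way to wash out the draft-order effect mentioned later in the thread.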

    Obviously, this isn’t exactly how ranking systems are used during a real draft, but I think it might serve to point out the differences between the different systems. I think we’d have to place some sort of conditions on the CI, MI, and especially DH slot, as we don’t want someone drafting a catcher as a DH. The other bug is that ideally I’d like to test 10 or 12 different systems against each other – maybe ESPN, RotoWorld’s rater, etc. I think Mays would probably agree to enter his Price Guide as well. But I don’t think that would get us to enough teams. So we could either make it a smaller league, or enter two or three teams from each system.

    This is just an idea. If someone else has a better plan as to how we could compare the different valuation systems, I’m all ears…

  42. @Nick J: The snake draft has several issues w/ it, for which I took Miles @ LastPlayerPicked.com to task. The biggest issue, and one that is very difficult to correct, is that the value of players in a pre-draft ranking system changes once a player is picked – e.g., taking Sabathia with my hypothetical 1st pick reduces the value of subsequent starters as I’ve already increased my hypothetical baseline from an average starting staff to one that’s above average. In addition, you have draft order bias, enforcement of IP minimums (in LPP’s first go-around, he had his teams punt Wins and K’s by taking middle relievers, an unrealistic draft scenario), and issues where teams end up overconcentrating on certain stats (kind of like the biggest issue, but it could be that Reyes and Crawford end up on the same team). Tango’s running this across something like 1000 iterations, so the draft round bias should be close to eliminated. With enough iterations, the overconcentration on some stats should balance out across teams. I’m going to need to make tweaks to the draft list to try and balance my SP and RP picks.

  43. Nick J says:

    @Rudy: You bring up several important issues here. Before I get into those, however, I think we need to identify the exact question that we are attempting to answer. For me, that question is, “Which valuation system will lead to the optimal ranking system to use in a draft?” And within that, “When measuring players across positions, is it better to use an average baseline or a replacement baseline?”

    You first bring up the issue that players’ values change during a draft, according to both who you have taken for your team and the remaining available players. This is undeniably true, but a static valuation system like Point Shares or the Price Guide or Derek’s system by definition does not take this into account. In a draft what we really are after is marginal value – what is the difference in value between the best player available at a given position and the next best (or next few) players at that same position, as compared to the difference in value between the best players and next best players available at the other positions? A dynamic system (perhaps some sort of drafting software) might be able to make these moment-to-moment calculations, but this isn’t typically what we’re talking about when we’re talking about valuing players in a vacuum. In short, I believe that in evaluating the merits of a particular system we need to step back and let the system choose according to the values it has. Interfering and saying that since it just picked Sabathia then it shouldn’t choose a pitcher again, even if that’s who the system says is the most valuable, is defeating the purpose of the experiment.
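The marginal-value comparison above can be sketched as a best-versus-next-best gap at each position. This is a simplified illustration with invented values; it looks only at the single next player (not the next few) and skips positions with fewer than two players left.

```python
def dropoff_by_position(available):
    """available: {position: [values of players still on the board]}.
    Returns {position: best value - next-best value}; positions with fewer
    than two players left are skipped in this simplified sketch."""
    gaps = {}
    for pos, values in available.items():
        ranked = sorted(values, reverse=True)
        if len(ranked) >= 2:
            gaps[pos] = ranked[0] - ranked[1]
    return gaps

def position_with_biggest_dropoff(available):
    """Position where passing now costs the most relative to the next option."""
    gaps = dropoff_by_position(available)
    return max(gaps, key=gaps.get)

# Invented values: the best SS is worth more than the best C,
# but the drop to the next catcher is much steeper.
board = {"C": [20.0, 12.0, 11.0], "SS": [30.0, 28.0, 27.0]}
print(position_with_biggest_dropoff(board))
```

A static ranking would take the SS here; the marginal view favors the catcher, which is the distinction Nick draws between static and dynamic systems.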

  44. Nick J says:

    The draft order bias is a real problem, but I think could be addressed by randomizing the draft order and repeating the experiment many times.

    I don’t believe an innings pitched minimum would be correct to enforce; let me explain why. When an average or replacement level is calculated from a given set of players, we must use the “best” set of players. If this optimal set includes more RP than one might think is “realistic,” that’s just the way it is. Just because it doesn’t seem realistic doesn’t mean that some of those middle relievers weren’t in fact more valuable than some commonly-rostered #5 starter types. If the argument is that when calculating the baseline (either avg or rep) one used more SP, then I think we should rethink that arbitrary inclusion of more SP when calculating said baselines. Of course, if you are tied to including more SP simply because that’s what most leagues do, we could just change the roster requirements to include 5 SP, or whatever. But again, I disagree with the masses on this point, and think that good middle relievers are undervalued. And isn’t an objective value system supposed to show us when the masses have made an incorrect assumption? In the LPP experiments, I didn’t think the problem was that the LPP teams punted Ks and Ws (if that’s the optimal strategy then that’s what you should do), but rather that since there were multiple teams using the same ranking system/strategy, the penalty for punting a category was less severe. In other words, it was hard to isolate the merits/shortcomings of a particular system because of the presence of that same system in the league. But if we only included one of each system, I think this problem would be minimized.

  45. Nick J says:

    The point about teams over-concentrating on a single stat (drafting Reyes and Crawford) is an important one, and I believe is a flaw of all of these systems. The problem is that when deriving a final value by combining the contributions from different categories, the need for balance across the categories is lost. This is also one of my complaints with Tango’s challenge. By converting all stats to a point system, there is no longer a need for balance. I believe an ideal valuation system could find a way to separate the contributions across the categories, making the choice of who to draft not by their total value but by the overall impact on each of the categories. This is part of the idea in SGPs, but I don’t think they address the issue fully.

    Finally, the test you ran with the Point Shares versus the other systems was a very interesting one, but I believe it answers a different question. Namely, “which valuation system will best project the standings in a typical league?” I believe however, that this is a different question from “which ranking system should I use to draft with?”

    Anyway, I’ve likely alienated most of the readership with these comments, but I do believe we need to address these things if we are to create a test that best represents the differences between valuation systems.

  46. Nick J says:

    By the way, I apologize if the tone of those last few comments was anything less than friendly. Please know that I have nothing but respect for Rudy, Derek, Mays, and anyone who has taken the time to come up with such advanced rating systems. They are so much better than anything I could have come up with. I just get really excited about these things and sometimes my opinions come across a little more aggressively than I intend. Thanks for letting me participate in the discussion.

  47. Rudy,
    Does your Point Shares system determine player value as a difference over average, or as a ratio?

    Raw value – Position Average
    Raw value / Position Average


  48. @Derek Carty: It is based on a weighted combination of (Raw Value – Position Average) + (Raw Value – Total Hitter/Pitcher Average). For ratio stats, there is an AB or IP multiplier. This difference is then divided by the increment that is estimated as the average difference between projected team totals. I’m experimenting with variations off average to see if it nets improvements. The latest is dividing each position into thirds, valuing each (Raw Value – Position Average) at .33 and then summing them together. I think this minimizes what is one flaw of the average – the fact that one player’s stats can have a disproportionate influence (e.g., Reyes’ impact on SS stolen bases).
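One possible reading of that description, as a sketch: blend the position-relative and overall differences, scale ratio stats by AB or IP, and divide by the per-point increment. The 50/50 weight (w_pos) and all inputs are hypothetical, since the actual weighting isn't given, and this only handles higher-is-better categories (ERA and WHIP would need the sign flipped).

```python
def blended_diff(raw, pos_avg, total_avg, w_pos=0.5):
    """Weighted blend of (raw - position average) and (raw - overall average).
    w_pos is a hypothetical weight; the actual Point Shares weighting is not given."""
    return w_pos * (raw - pos_avg) + (1 - w_pos) * (raw - total_avg)

def counting_value(raw, pos_avg, total_avg, increment, w_pos=0.5):
    """Counting stats (HR, SB, ...): blended difference divided by the increment
    estimated to be worth one standings point."""
    return blended_diff(raw, pos_avg, total_avg, w_pos) / increment

def ratio_value(rate, pos_avg_rate, total_avg_rate, volume, increment, w_pos=0.5):
    """Ratio stats (e.g., AVG): scale the rate difference by AB (or IP for pitchers)
    so it is in the same units as the increment (e.g., hits) before dividing."""
    return blended_diff(rate, pos_avg_rate, total_avg_rate, w_pos) * volume / increment

# Hypothetical hitter: 30 HR vs. a 20 HR position average and a 25 HR overall average,
# with 5 HR assumed to be worth one standings point.
print(counting_value(30, 20, 25, increment=5))
```

This uses the "difference over average" form from Derek's question rather than the ratio form.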

  49. @Nick J: I agree the key question we’re trying to answer is what approach is best for drafting. Miles’s tests showed how ridiculous performing a snake draft test for a non-dynamic system really is. I had teams that had 4 SPs in the first 5 rounds! How is this in any way realistic? If a doctor said that fruits are better than meats, would you be disproving him by overdoing it on certain vitamins and not getting the necessary protein? What made the test even more ridiculous was the overinclusion of middle relievers in the results. This isn’t because I don’t think they are valuable, rather, they are extremely unpredictable. Show me anyone who had Balfour pitching the way he did.

    The test I ran that avoids these biases is to formulate several rosters based on either 2008 stats or 2009 projections (using Mock Drafts). 2008 stats are easier because you don’t have to worry about which projection system is being used. A system that accurately valued players would do a good job at valuing these teams. But so would a system that is completely correlated to the stats – e.g., valuing 50 SBs as 5 pts and 5 SBs as 0.5 pts.

    Here is an interesting test that would get rid of this issue…Take a league that has either been created based on 2008 final stats or mock drafted for 2009. Total the stats up so we have team standings as a baseline. We then take 20 or so players and trade them to every team for another player at that same position – e.g., McCann and Mauer traded, McCann and Russ Martin traded. After each trade, look at the point totals. Take the sum of the point totals for every trade variation. (Note: for OFs, SPs, and RPs, I’d swap with the 3rd OF, 3rd SP, and 2nd reliever to try and be as close to ‘average’ as possible).
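A sketch of the trade test under stated assumptions: only counting categories (where a trade is just adding and subtracting stats), standard roto scoring with tied teams sharing points, and the player's value taken as the sum of each receiving team's point change. That last choice is one interpretation of "the sum of the point totals"; team names and numbers are made up.

```python
def roto_points(team_totals):
    """team_totals: {team: {category: total}}, higher is better in every category.
    Best team in a category earns N points, worst earns 1; ties share the average."""
    teams = list(team_totals)
    categories = list(next(iter(team_totals.values())))
    points = {t: 0.0 for t in teams}
    for cat in categories:
        ordered = sorted(teams, key=lambda t: team_totals[t][cat])  # worst first
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and team_totals[ordered[j + 1]][cat] == team_totals[ordered[i]][cat]:
                j += 1
            shared = (i + j + 2) / 2  # average of 1-based ranks i+1 .. j+1
            for t in ordered[i:j + 1]:
                points[t] += shared
            i = j + 1
    return points

def trade(team_totals, team_a, player_a, team_b, player_b):
    """New team totals after team_a sends player_a to team_b for player_b."""
    new = {t: dict(cats) for t, cats in team_totals.items()}
    for cat in new[team_a]:
        new[team_a][cat] += player_b[cat] - player_a[cat]
        new[team_b][cat] += player_a[cat] - player_b[cat]
    return new

def value_by_trades(team_totals, home_team, player, counterparts):
    """Sum, over every other team, of that team's roto-point change when it trades
    its same-position player (counterparts[team]) for `player`."""
    before = roto_points(team_totals)
    total = 0.0
    for team, other in counterparts.items():
        after = roto_points(trade(team_totals, home_team, player, team, other))
        total += after[team] - before[team]
    return total

team_totals = {"A": {"HR": 200, "SB": 100}, "B": {"HR": 190, "SB": 120}, "C": {"HR": 180, "SB": 110}}
slugger = {"HR": 30, "SB": 5}  # hypothetical player on team A
counterparts = {"B": {"HR": 10, "SB": 25}, "C": {"HR": 25, "SB": 8}}
print(value_by_trades(team_totals, "A", slugger, counterparts))
```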

    This test would still have its issues but better than setting up unrealistic draft scenarios.

    The best hypothetical test would have the methodologies dynamically recalculate after every draft selection. I’m not smart enough to program that… :)

  50. Nick J says:

    Interesting idea for a test. So if, for instance, trading McCann for Mauer netted a gain of 1/2 a point, then Mauer is “worth” a half point more than McCann? So that’s his relationship to McCann. But then how would we find his “absolute” value? Just remove his stats from the team totals and see the effect?

    We’d also probably have to repeat the test a number of times to account for a player’s value being somewhat league-dependent, but this seemingly would give you an average point value for a particular player. Seems pretty labor intensive, but maybe someone smarter than me could write a program that would do this.

    I think we’ll have to agree to disagree on the LPP tests. I just don’t think being “realistic” compared to a normal draft applies when we know what the end stats will be. That said, there were some other issues with those tests that I haven’t quite figured out how to deal with. Anyway, I don’t want to run a test that hasn’t been agreed upon to be “fair,” so I’ll try to think of something else. Maybe Derek has some ideas…
