
Yes, that’s a fantastic neck curtain I’m rockin’. Besides the point. Don’t stare. What this IS is (who you callin’ stutterer?) an attempt to translate some nerd speak into some useful fantasy baseball draft strategy.

More statistically-inclined minds than my own (mainly a guy with the handle “matthan” at DRaysBay) have figured out a pretty reliable way to calculate expected Ks from pitchers. “Tell us something we don’t know, Dick Anderson.” Okay, how about the coefficient of determination for this particular model is over 90%? *crickets* Considering most number crunchers take 70% and like it, 90% is like jumping-a-dead-battery-with-aspirin-and-chocolate useful. Oh, and it’s reliable like that down to 30 IP. That’s door-breaching-charge-out-of-steel-wool-and-a-fountain-pen exciting!

Here’s the formula: eK%=(ClStr%*.9)+(Foul%*.5)+(InPly%*-.9)+(InZSwStr%*1.1)+(OZSwStr%*1.5)
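For anyone who'd rather let a computer do the multiplying, here is a minimal Python sketch of that formula. The coefficients are copied straight from the line above; everything else is an assumption made for illustration: the function name `expected_k_pct`, the sample inputs, and my reading of the abbreviations (only "Cl" for called strikes is spelled out by the author). Inputs are assumed to be percentages on whatever basis the DRaysBay post defines, which isn't restated here.

```python
# Coefficients copied from the eK% formula; the dict keys mirror its terms.
# The plain-English labels in the comments are an interpretation, not the source's.
EK_WEIGHTS = {
    "ClStr%": 0.9,     # called strikes
    "Foul%": 0.5,      # fouls
    "InPly%": -0.9,    # balls put in play
    "InZSwStr%": 1.1,  # swinging strikes in the zone
    "OZSwStr%": 1.5,   # swinging strikes out of the zone
}

def expected_k_pct(rates):
    """Return eK% as the weighted sum of the five component rates (in percent)."""
    missing = set(EK_WEIGHTS) - set(rates)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return sum(weight * rates[name] for name, weight in EK_WEIGHTS.items())

# Hypothetical inputs, NOT a real pitcher's line.
sample = {
    "ClStr%": 17.0,
    "Foul%": 17.0,
    "InPly%": 19.0,
    "InZSwStr%": 5.0,
    "OZSwStr%": 4.0,
}
print(round(expected_k_pct(sample), 1))
```

Subtract a pitcher's actual K% from the result and you get the kind of "K% Diff" number used in the lists below.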

So why aren’t we reading about this magical formula all over the place? Well, it’s a trip to the dentist to compile the holey data and it uses wisps of cotton-candy-fuzzy math. Regardless, it’s fairly reliable if you floss through it and definitely useful despite the caveats. Having said that, I now say it’s crazy talk to produce something this potentially powerful, then shelve it. That’s like developing an armor-piercing laser, but scrapping it because it’s a smidgen inaccurate and only works a good chunk of the time. There’s still potential for making some big holes in stuff here!

I admit, I do possess some nerd genes and I’ve read through the boring stuff. That doesn’t mean YOU (yeah, you too, I suppose) should have to though, loyal Razzball readers. So what do we do with it then? For one, we can look at actual Ks from pitchers in 2011 vs. their expected Ks based on this formula. That ought to help tell us, in part, who was sandbaggin’ and who was overachievin’. I’ve arranged the numbers so positive is positive and negative is negative (fancy that). I’ve cherry picked players I wanted to highlight and to avoid some of the stat goofs. If you want to check out any others, you can sift through all the source data like I did. I’ve shown my work on a separate sheet, just like in math class.

This part is obligatory, really boring stuff. If you just want to get to the T & A, skip this section. Just don’t ask questions that are answered here, because then you’ll be “that guy”.

A few players showed up on one set of source data and not the other, or repeated exactly within the same source data, so I’ve eliminated those.
Lists only include players who had 30+ IP for one specific team, not over several teams combined.
A few SP show up on the list multiple times due to having 30+ IP for multiple teams.
Data is split between SP and RP, so players should only be credited with stats for one role or the other per each list, respectively.
Some of the data is skewed by differences in pitch counts, spot starts by RP, relief appearances by SP, trades, and/or other statistical errors between sources.
References:
FanGraphs
Stat Corner
DRaysBay

If you’d like to peruse the data for your favorite players, check out the full document here and comment below with questions. Thanks for reading!

SP Sandbaggers (eK% / K% / K% Diff):

Randall Delgado – 18.0 / 12.2 / 5.8
He represents the biggest difference, positive or negative. Control remained an issue and he was pretty lucky with a .220 BABIP and 86.5 LOB%. However, if he can manage to tack down more first pitch strikes and harness some BBs, he could rein in even more upside. He and everybody else… If he ends up in the majors over Minor/Teheran, I certainly wouldn’t expect a Beachy-like season. I wouldn’t hold my breath for a Minor’s-minors-like season either.

Guillermo Moscoso – 17.9 / 13.9 / 4.0
He should have had enough Ks to place him slightly below the league average 7.13 K/9. However, he was fortunate hitters made enough bad contact (79.1 O-Contact%) to get themselves out when his control lapsed. There’s enough downside to spoil any upside, and the move to Coors won’t help. In the interest of manipulating time and space, let’s just pretend most of these schmohawks are invisible. Collmenter to Vargas: “Hey, can you believe that sh…”

Phil Hughes – 17.8 / 14.2 / 3.6
If you’re reading this (skimming counts), chances are you’ve been burned by Hughes at some point in your fantasy career. He dealt with injury and “hittability” last year, but showed flickers of that sweet, sweet flame. Sure there’s reason for concern, but he’s only 25 and here’s one more reason for optimism. There’s reasoning for ya. If he’s traded or somehow ends up Yanked back into the rotation despite the acquisitions of Pineda and Kuroda, keep the fire burning.

Shaun Marcum – 21.9 / 19.2 / 2.7
2.7% doesn’t sound like a whole lot, does it? However, at 200 IP and 823 batters faced, “U” should’ve been looking at about +20 K and +1 K/9. Yeah, Grey probably had good reason to like him so much. There isn’t much NOT to like about his numbers, so Marcum down for improvement.
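The back-of-the-napkin math behind that +20 K and +1 K/9 goes roughly like this. It's a minimal sketch, assuming K% means strikeouts per batter faced and K/9 means strikeouts per nine innings; the function name is hypothetical and the inputs are the round numbers quoted above.

```python
def extra_strikeouts(k_pct_diff, batters_faced, innings_pitched):
    """Translate a K% gap (in percentage points) into extra Ks and extra K/9."""
    extra_ks = k_pct_diff / 100 * batters_faced
    extra_k_per_9 = extra_ks * 9 / innings_pitched
    return extra_ks, extra_k_per_9

# Marcum's line from the text: 2.7 point gap, 823 batters faced, 200 IP.
ks, k9 = extra_strikeouts(2.7, 823, 200)
print(f"{ks:.1f} extra Ks, {k9:.2f} extra K/9")
```

That works out to roughly 22 Ks and 1.0 K/9, in the same ballpark as the figures quoted.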

Doug Fister – 17.4 / 14.8 / 2.6 (SEA), 20.8 / 20.9 / -0.1 (DET)
And here you thought Fister was fun to mention before? It looks like he had potential to get more guys to swing & miss, it just took the move to Detroit for him to capitalize more, and then some more on top of that. He’s likely to regress a little and his 17.2% slider usage might land him at the bottom fringe of Rudy’s Top 20 Risky Pitchers For 2012 list, but some of his stuff is legit. Welcome, fister+bottom+stuff Googlers!

Edinson Volquez – 23.8 / 21.3 / 2.5
Yes, he strikes dudes out. Yes, he walks them too. Yes, his consistent velocity and plate discipline, absurd 20.7 HR/FB%, and 1st inning ineptitude tell me some of his struggles were fluky. Yes, he will get less run support in SD. Yes, PETCO should help. Yes, he could pull a post-Dusty Harang-ment. Yes, I’m telling you to keep at least a lazy eye on him.

Jake Peavy – 21.6 / 19.3 / 2.3
Though some of his metrics looked like imperials and vice versa, there appears to be a millibigass (that’s a thousandth of a bigass) light at the end of the tunnel. But… and that’s a badonkadonkeykong-sized but… he needs to stay healthy long enough to get his conversion tables sorted out. Sometimes you don’t need standardized OR fanciful measurements to tell you what you should already know.

Jeremy Hellickson – 17.3 / 15.1 / 2.2
Bad news is, his K/9 was only 5.57. Good news is, it should have been about 6.4. “Wait, that’s good news?” Bad news is, his ERA/WHIP were artificially low. Good news is, the extra Ks should balance those out somewhat. “Some what?” Bad news is, he fits the risky pitcher bill. Bad news is, he’ll cost too much come draft time, regardless. “But…”. Yeah, I know good news was supposed to come next.

Danny Duffy – 20.6 / 18.4 / 2.2
On the other hand, I’m hoping this dude eventually ends up back in the rotation since he’s poised for a rebound. He’d been blowing everyone away up until his MLB debut (say that five times fast), and I don’t envision Duffman totally switching from blow to suck. Duff just didn’t trust his stuff. Know who else has had issues with nerves? His name rhymes with slinky… “Ohhh, yeahhh!”

Dan Haren – 22.0 / 20.1 / 1.9
Hairy Dan’s ratios got a little trim from a lower than normal HR/FB rate and BABIP, but his Ks should have been a little fuller. Ironically, his increased cutter use (+20.5% vs. 2010!) seems to be working, as his O-Swing% and O-Contact% go up as his Zone% goes down. All in all, y’all, he ought to retain comparable value. Did I get that “y’all” right, y’all?

Scott Baker – 24.2 / 22.3 / 1.9
Similarly, Baker’s Ks should’ve continued to rise while his ratios collapsed to an extent. Yep, even past his career high 8.22 K/9. It’s hard to put a finger on what exactly his secret ingredient was, but the measurements support it. He’s someone I would not sleep on in 2012, lest you get burned… or accused of assault. Don’t stand so close to me, space invader.

John Danks – 20.2 / 18.5 / 1.7
Danks refined his cutter to a lesser extent than Haren, but he also got more aggressive at pounding the zone and was actually a bit unlucky. There’s every reason to expect him to see both a bump in Ks AND a reduction in his ratios. It could have been more than a little if he’d been dealt, but Danks don’t stank.

There we have it, a Scott Baker dozen. There are about three times as many SP Sandbaggers as Overachievers (nope, no idea why and not too worried about it), so it’s time to move on before we get too bogged down in this shizzpile.

SP Overachievers (eK% / K% / K% Diff):

Clayton Kershaw – 24.7 / 27.2 / -2.5
Here I figured the opposite of Marcum was Mucram… CK won the CY, and deservedly so, but would he have won it with 23 less Ks and .9 less K/9? Probably. Just consider this gap, a smidgen of good fortune and his 25.5% slider use before you start wearing his cologne and get all reachy, reachy for him.

Zack Greinke – 26.1 / 28.1 / -2.0
Knocking his K/9 down to 9.8 from 10.5 isn’t a big deal in the context of a 7.96 career rate. He’s suffered bad luck from various sources the last two seasons, so there’s a chance his ratios rebound some too. However, his F-Strike% and Zone% dropped 2% and 7.5% during that time, which included a move to the NL. His stuff has bumped his O-Swing 5.1% to compensate, but reading between the percent signs, it might be more than nerves. My gut tells me not to invest too heavily for 2012. If you hear my gut too, hand me the Cracker Jack, will ya?

Ubaldo Jimenez – 20.3 / 22.2 / -1.9 (COL), 19.8 / 21.4 / -1.6 (CLE)
He was a bit less than fortunate both in COL and CLE, so his ratios should trend up. However, his velocity went down along with his GB%, F-Strike% and SwStr%. In short, I’m not expecting massive regrowth. Count on Big Jim too much and you could very well end up spending 2012 pulling out your hair, wondering “Why, Ubaldo?!”

Cliff Lee – 24.5 / 25.9 / -1.4
It’s like a freakin’ barbershop with all these cutters cropping up… er… down. The Adverb still would have bested his previous career high K/9 rate with about 10 less Ks and he’s capable of producing similar, though probab-Lee slight-Lee less spectacular numbers again. Of course, investing too much into last year’s numbers could easi-Lee end like another crusade for eternal youth; poor-Lee.

Next time, I’ll go over the relievers that should see an increase or decrease in Ks.  Until then, I will comb my mullet.

  1. ryan says:

    Where do you get your pitch usage stats? Was looking for this a couple of weeks ago and couldn’t find it. Like I really need actual stats to tell me that Latos throws a lot of sliders……

  2. Chris says:

    Good post, I enjoyed it. Your mullet is bountiful as well.

    This post will probably cause me to invest a bit more heavily in Marcum and Baker. Cheap price tag, solid WHIP, and even more Ks to come. I like it.

    • Jake says:

      @Chris, Thank you kindly. It’s sort of “hair nation” around these parts. Past performance is no guarantee of future success… or whatever the financial types say… but having this info in hand sure doesn’t hurt.

  3. dingbat says:

    Fantastic post, all around. Thanks for doing this analysis and sharing it.

    • Jake says:

      @dingbat, Thank you, and you are welcome. I appreciate the feedback and that you got something out of it as well.

  4. BWC says:

    Hi, thanks for the posts. Other times I’ve simply seen swinging strike percentage used to predict K rate. The issue I find when I’ve tried to use analyses like these is that although the regression formula does a good job explaining K% in that same year, it isn’t as predictive of the next year’s K rate as simply using last year’s K rate. I come to that conclusion two ways. First, running a regression of 2010 K% based simply on 2009 K% results in a 38% R(sqr) (so 2009 K% explained 38% of the variation in 2010 K%), while 2009’s eK% explained only 36% (so despite the high correlation in the same year, it is more likely that the “peripheral” stats regress to the K rate than vice versa). A second way is to use the differential between 2009 K% and 2009 eK% to predict the change in 2010. Running that regression results in a significant variable (p-value about 1%) but the R(sqr) is .037, so 3.7% of the variation is explained. So it is telling you something, but it might not move the needle. I didn’t run it for 2010 or other years, so perhaps 2009-2010 was just a bad year for this stat. Anyways, let me know if you think I’m missing something.
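For readers who want to try the two checks described in this comment, a rough Python sketch follows. It assumes plain one-variable least-squares regression, which may not match exactly what BWC ran; `r_squared` is a hypothetical helper, and every number in the lists is invented toy data, not real pitcher stats.

```python
def r_squared(x, y):
    """R^2 of a one-variable least-squares fit of y on x (squared Pearson correlation)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples need some variation")
    return sxy ** 2 / (sxx * syy)

# Made-up toy numbers, NOT real data: prior-year K%, prior-year eK%, next-year K%.
k_prev = [15.0, 18.0, 20.0, 22.0, 25.0]
ek_prev = [16.0, 17.5, 21.0, 21.0, 23.0]
k_next = [16.0, 17.0, 21.0, 21.5, 24.0]

# Check 1: which prior-year number explains more of next year's K%?
print("K% -> next K%: ", round(r_squared(k_prev, k_next), 3))
print("eK% -> next K%:", round(r_squared(ek_prev, k_next), 3))

# Check 2: does the eK% minus K% gap explain the year-over-year change?
gap = [e - k for e, k in zip(ek_prev, k_prev)]
change = [nxt - prev for nxt, prev in zip(k_next, k_prev)]
print("gap -> change: ", round(r_squared(gap, change), 3))
```

With real data, each list would hold one entry per pitcher, and the first comparison is the 38% vs. 36% result quoted in the comment.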

    • Jake says:

      Ah, a more statistically inclined mind than my own. There’s a lot going on in there, and in your comment too, so I’ll do my best to answer. From what I’ve read, those who post similar studies run similar regression analyses with variable variables and end up with variable results. This particular formula seemed to provide the most logical analysis to a layman AND the highest correlation. To me, this = good. Not to mention, which ironically means I am mentioning it, the model does include subsets of SwStr% as two parts of the equation. It sounds like you’re telling me you’ve run similar analyses and found that these correlations aren’t necessarily predictive from year to year. First, that sucks, squared. Second, it seems most similar studies never go so far as to look at year to year results. Finally, I can’t disagree with you. What I think you might be missing though, is all the other things in the blurbs about the particular pitchers I chose to mention, using this and other metrics as foundation for opinions. It’s admittedly not all scientific, and that’s the tongue-in-cheek/pinkie-to-mouth/feather-in-cap-called-macaroni point here anyway. All the analysis in the world doesn’t tell you what to do with it. I feel pretty confident in my opinions having gone through the mental exercise. I’d also like to think the O-word has worked pretty well for me in the past… which, again, is no guarantee of future results. If you don’t agree with my assessments, that’s cool. Hopefully, you at least had a chuckle or two, with me and/or at my expense.

      • BWC says:

        @Jake, Hey Jake. Oh, the article was great, as with all the razzball articles, it was insightful, well written and funny. I’m amazed a group like grantland/cbs/yahoo/fox haven’t tried to just acquire razzball en masse since you blow most of their fantasy work away. I’ve just been obsessed trying to find a way to predict k-rate changes and this was my first chance to vent my frustrations :)

        • Jake says:

          @BWC, Thanks. I’m sure Grey appreciates that, as I appreciate his graciousness in letting me sit in for a couple sessions.

          I imagine it’s possible 2009-2010 was a one-year not-wonder, though only to a limited extent. Even so, what you’ve done is worth further pursuit since I don’t recall reading about anyone else looking into it. The DRaysBay formula was created using 2003-2008 stats so single-year variation shouldn’t be an issue there. I don’t claim to know entirely how the process of running regression analyses works, so I’ll have to take your word(s) for the results.

          Even if we can’t predict next year using the math, we can try to predict what should’ve happened within an individual year as accurately as possible, then use that to make a better-educated guess for next year. Since this formula is good after 30 IP, another thing we CAN do is use early season stats to see if pitchers are improving or tanking. Then we can choose whether to act on that info. The trouble is having access to the data and picking points at which the data has been updated so it (mostly) jibes between sites.

          Just out of curiosity, from where did you pull the stats you used?

          • BWC says:

            @Jake, I’ll be going a little beyond my knitting in this post, since I’m not a stats professor but I think the issue is that you could start out with the reverse hypothesis, that 2009’s strikeout rate is highly correlated with “expected strikeouts” (which it would be) and you could make an equally valid claim that expected strikeouts is going to regress to the observed strikeouts. You don’t know which one is going to move unless you run some year to year regressions (or first half of season, second half of season). And every time I run those, the better predictor of future strikeouts is last year’s strikeouts with expected strikeouts not adding any additional explanatory power. This could be because of a number of reasons, but two come to mind, either a) there are other ways to get strikeouts besides the variables the formula uses and those variables will consistently create the same variation between expected strikeout and actual strikeout or b) the underlying variables in expected strikeout are a result of being a strikeout pitcher and those variables will move in line with the norms predicted by the actual strikeout level (not vice versa, as the dray post suggests). To answer the actual question you asked me, instead of blathering on some more, I got the expected strikeout data from the dray board (the link you posted) and then pulled strikeout data from Fangraphs for 2010. Just to take up some more of your time, what I’ve been working on to try and predict strikeouts is to look at per-pitch data and see whether a strikeout pitcher is a) better at getting to 2 strikes or b) better at converting 2 strikes. So far it looks much more like B than A and that A isn’t even indicative of B. Anyways, I could talk about strikeout rates forever. If you want the pitchfx database I put together, let me know and I’ll try to find a way to get it to you (it’s about 1 gig but has every pitch thrown from ’07 through ’11).
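The two-strike split described above can be sketched in a few lines of Python. This assumes a made-up pitch encoding (B, C, S, F, X) rather than the actual PITCHf/x schema, and it ignores edge cases such as foul tips, foul bunts, and hit batters; `two_strike_rates` is a hypothetical name.

```python
def two_strike_rates(plate_appearances):
    """Return (share of PAs reaching two strikes, share of those ending in a strikeout).

    Each plate appearance is a list of pitch codes (a hypothetical encoding):
    "B" ball, "C" called strike, "S" swinging strike, "F" foul, "X" ball in play.
    """
    reached = converted = 0
    for pitches in plate_appearances:
        strikes = 0
        got_to_two = False
        for code in pitches:
            if code in ("C", "S"):
                strikes += 1
            elif code == "F" and strikes < 2:
                strikes += 1  # simplification: a foul never counts as strike three
            if strikes >= 2:
                got_to_two = True
            if strikes == 3 or code == "X":
                break  # strikeout or ball in play ends the plate appearance
        if got_to_two:
            reached += 1
            if strikes == 3:
                converted += 1
    total = len(plate_appearances)
    reach_rate = reached / total if total else 0.0
    convert_rate = converted / reached if reached else 0.0
    return reach_rate, convert_rate

# Toy plate appearances, invented for illustration.
pas = [
    ["C", "F", "S"],            # two strikes, then a swinging strikeout
    ["B", "X"],                 # ball in play before two strikes
    ["S", "F", "F", "X"],       # two strikes, then put in play
    ["B", "B", "C", "B", "B"],  # walk without reaching two strikes
]
print(two_strike_rates(pas))
```

The first number corresponds to option (a), getting to two strikes, and the second to option (b), converting once there.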

    • Jake says:

      @cockyphoenix, A full-grown mane, thank you. You’re referring to the Google Doc of the Excel spreadsheet I used? FG indicates stats I pulled from FanGraphs.com and SC are stats from StatCorner.com. The Cl in ClStr% is called strikes.

      Good article on Kershaw’s slider. It sure is nasty. He also bumped up its use almost 6% and its velocity 2.4 MPH last year. What if he goes to it even more because it IS so effective vs. righties? CK was 12th among 2011 qualified starters in slider usage. Only Ogando threw both more FB and SL than him. Al Alburquerque’s FB/SL combo was effective last year too… It’s not an elbow’s death knell, but surely something to consider if you’re a fan.

  5. Whiskey Diet says:

    Awesome stats, Jake. Really, really impressive post. Easy to read and funny, too.

    Only complaint – I gotta disagree with you on Greinke. His stuff is good enough like you said, but I really think he’s got his head back on. Super competitive guy and I think he’s just getting comfortable in the NL.

    • Jake says:

      @Whiskey Diet, Thanks, I really appreciate it. Understandable on Greinke, he’s probably the one about whom I’m speculatin’ most. My stake’s less about his mentality than his approach. He’s been effective while being less aggressive with the zone. Typically though (if I recall better than a crotchety 49er), F-Strike% tends to correlate with BBs. I’m supposing this means less margin for error to induce swings. His “luck stats” should normalize but if he Ks a few less, he’ll allow more BR anyway. Hitters will have had more time to adjust to him as well. I think he’ll end up with a bit better ERA, but similar WHIP and less Ks; all in all, comparable value to last year. Cust kayin’, I’m not banking on a major strike in profit.

  6. oh_dad55 says:

    Good job.

    • Jake says:

      @oh_dad55, Thanks for the props, oh_pops.
