Hello readers of the Razz! It’s been a long winter and I have some strange ideas floating around my mind, so I’d like to start things off with a little guessing game.

If I have your permission, I’d like to presuppose that at one point in your fantasy baseball career, probably near the start, you had a dream that you were better than everyone else at predicting player performance. Maybe not for every single player, but you at least had a few players, your guys, who you thought would have a big year. It was based only on hunches, but you were a confident, naive little soul.

Fast forward through a year or two of mediocre success and of reading The Book Blog, and next thing you know, all that personal confidence is gone and you shell out around $40 every March to have some RotoManiacDude do the predicting for you.

Kind of a humiliating defeat if you ask me.

No, I’m not calling out everyone to get off their mental ass and begin working on a projection system. Here’s the code for the simplest one and it’s complicated enough already.

What I do want people to realize is that a part of fantasy baseball that used to involve gutsy calls and artful player evaluation has slowly been drained of that element by “stat heads” and their projection systems. The romantic among us yearn for those lost days of one man’s imperfect wit against another’s. The more logical read the phrase “artful player evaluation” and replace the first word with “biased” or “inaccurate”.

Sad as that is for player prediction, there is still room for personal decision making in other aspects of the game. In certain leagues there are daily roster decisions: who to sit and who to start, who to pick up and who to drop.

But then came Hitter-Tron and Stream-o-Nator, and a whole host of other features on other sites that accomplished the same goal: automate a portion of a person’s fantasy baseball algorithm.

The effect is to take some of the decision making away from the ill-informed, biased fantasy player and replace it with a program designed to make the same decision but with less error. Although I’m using a slightly ominous tone, it doesn’t sound so bad, does it?

The question is, where does it stop? Player evaluation is long gone. Drafting players via snake or auction is becoming automated by “Draft Software” and “War Rooms”. I already brought up how setting your roster is increasingly automated. People complain that we take away the human element of players when we reduce them to stats on our fictional squads, but we’re simultaneously taking away the human element of ourselves from the game of fantasy baseball by relying on computer predictions for all our decisions.

So what do I, your cherished author, have to say about all of this? I say screw it, let’s quit pussyfooting around things. Let’s automate everything. Let’s take the entire process–from the day of the draft to the final day of the season–and make one giant algorithm that behaves like we do. Perhaps we’ll even make something that performs better than we do. The sort of self-analysis that occurs when one automates a process often leads to improvements in the process, if only for consistency’s sake. (The algorithm won’t smoke weed and forget to set its lineup, for example.)

I have a slight head start on you in thinking about some of the difficulties that will arise in this endeavor. Unlike other games such as chess and poker (for which there has been no lack of effort put into designing programs to play them optimally, or in the case of poker, at least profitably), fantasy baseball has no distinct rulebook. This poses a serious problem if we want our algorithm to be able to handle every possible fantasy baseball position it is placed in. It should have a move for every potential context. While that’s the ideal, we’ll have to start off with a slightly narrower scope.

Computers are best at performing highly repetitive tasks, so it makes sense to start off by thinking of an algorithm for daily-move leagues (not daily Fanduel-type leagues, algorithms already exist for those). In the same way that Alan Turing looked at mathematical computation phenomenologically to hypothesize the Turing Machine and eventually build one of the first universal computing machines, I want to note step-by-step exactly what I’m doing when I “play” fantasy baseball. Everything from the buttons I click, to the thoughts going through my head, to the text I read is recorded and considered for possible inclusion in the algorithm.

Avoiding getting bogged down in details (which I’m more than happy to get into discussions about in the comments!), there are areas I see this going smoothly and areas I see a program struggling. To give one example: trade negotiations. It shouldn’t be very hard to automate the process of evaluating a trade, but the actual negotiations leading up to the offer? Very difficult to strip the human element from that.

There are literally a million other considerations in my head right now, but I won’t let that paralyze me. I’ll get the base of the process down first and worry about fine-tuning step by step later. Thinking grandly, my hope is that enough people will find this algorithm an interesting enough idea that it will compete in a Tout Wars league someday (I’ll offer up my spot).

I have a fair amount of coding knowledge to gain to get to the point of building an actual program, but this is my project for the 2015 season. I’m excited to get started taking the fun out of fantasy baseball, what’s left of it anyway.

 
  1. Grey says:
    (link)

    On Line 467: createTuple always gets me!

    • Jay says:
      (link)

      @Grey: you saying createTuple always *gets* me.

  2. Connor Behnen says:
    (link)

    Will you walk us through even the most basic of your steps? i.e. you said you have a fair amount of coding knowledge to gain before building the program. How do you plan on gaining said knowledge?

    • paul says:
      (link)

      @Connor Behnen: This was a great example of the extent of my current coding knowledge. /hyperlink fail

  3. Mike says:
    (link)

    I am in a 20 team auction, with 30 players drafted per team. I am waiting on pitching; what other strategy should I have?

  4. Andrew D says:
    (link)

    Who’s Jeff Sackmann?

  5. J-FOH says:
    (link)

    I hope ghost town steve comes in and chats with you. This is right up his alley.

    I have been working on a theory too. Years ago someone in the comments argued that this game is a science and can be figured out with numbers and nothing more. It’s been stated by other players that there is no luck in this game. I also disagree with that statement because what I view as luck they view as just being unlucky. To put it more simply, in a statty stat fantasy baseball nerd’s mind, there is no luck in fantasy baseball, just seasons of being unlucky. An injury to a player with no injury history is a fluke, just as an injury prone player staying healthy is a fluke. Is it? Is there a statistical probability formula that can be applied to predict if, when and with what frequency a player will get hurt? Someone recently noted that a very successful NFBC player watches a ton of baseball and, via observations and his keen eye, has turned that into success. I have no problem with using your gut a little, just feed that damn thing some numbers first to keep it in check. Now what was my point? I don’t know, just random drivel I guess.

    • Nico says:
      (link)

      @J-FOH: Moneyball 2.0: Razzball Edition, starring Grey as Billy Beane, Rudy as Peter Brand, and J-FOH as Art Howe.

      • J-FOH says:
        (link)

        @Nico: I want to be Jonah Hill’s glasses

    • paul says:
      (link)

      @J-FOH: My opinion now is there’s too much randomness in baseball player production to say fantasy baseball has been ‘solved’ or there’s no luck in the game. We’ll see if that changes in the future.

      • J-FOH says:
        (link)

        @paul: there is tons of luck in fantasy baseball…um I mean unluckiness

      • McNulty says:
        (link)

        @paul:

        A good computer program would definitely beat you (anyone) over the course of a million seasons. But we only live in one universe, so even a blind squirrel can find a nut sometimes.

        Yes, the probability of a freak accident occurring *can* be calculated. Factors such as the coordination of teammates, the frequency that a player likes to go skydiving, etc. Is the probability of a freak accident happening to player A vs. player B significant? No, but I think a model could accurately account for this

  6. GhostTownSteve says:
    (link)

    I proposed this in these very Razzball forums a few years ago. I actually had a brief email exchange with Rudy trying to get him interested in the idea but he was like “you crazy man.”

    So I’ve been thinking of this for a pretty long time now. Here are some of my thoughts.

    First, if you’re talking creating artificial intelligence like neural net style computing so you have a kind of all in one fantasy playing machine then you should probably turn your talents toward a more significant, humanitarian endeavor. In other words, I think you’re going to have to modularize it. The code and the inputs for managing a draft are going to be a lot different than managing in season. You’ll need to strap these different modules together.

    As you say, the forecasting models probably can’t be improved upon in this venue. You’ll probably just want to choose a forecast you like or take a composite for your data input.

    Draft is definitely a big topic. The computer is good at brute force. The chess analogy is good here. An expert chess player looks at a board and based on experience examines just one or two lines of play. A computer looks at millions of positions to find the best line of play. I think this would be how the computer would handle the draft. Like the guys did with Deep Blue against Kasparov you’d have to give it a head start by providing some basic strategic framework. Not sure if you have this skill set, but ideally you’d have a game theory model that would power the draft analysis. At every point in a draft (auction would be a similar process but different code) the computer would have an optimal play. If it was really, really smart it might be able to detect competitor tendencies and styles in addition to roster construction and stat array at each point in the draft to try and better predict. So while there is some draft software that recalibrates during drafts and presents you with options, I think it’s based on just pushing forward the highest $ value picks at that point and forcing you to manage the strategic framework and the roster construction.

    Daily league streaming should just leverage existing DFS algorithms.

    I think there are two things that a computer could do very well, much better than a person. The first is creating models for in-season tracking, especially from the middle of the season on: calculating stat category run rates for each team and forecasting to inform lineup decisions, pickups and drops. The computer should be very good at seeing where you can easily make up ground, hold your ground and lose ground based on probable outcomes and regressions in season. It might also be good at spotting waiver pickups by noting trends in playing time or any other data points, just by being able to keep track of every player’s daily performance and run some analytics.
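A rough sketch of the run-rate idea in the comment above, assuming a simple linear pace with no regression applied. The team names, totals, games counts and the helper names `project_final` and `rank_category` are all made up for illustration.

```python
def project_final(total_so_far, games_played, season_games=162):
    """Linear pace: extrapolate a counting stat to a full season."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return total_so_far * season_games / games_played


def rank_category(totals, games_played, season_games=162):
    """Return team names ordered best-to-worst by projected final total."""
    projected = {
        team: project_final(total, games_played[team], season_games)
        for team, total in totals.items()
    }
    return sorted(projected, key=projected.get, reverse=True)


# Hypothetical home run totals partway through a season.
hr_totals = {"Team A": 120, "Team B": 110, "Team C": 131}
games = {"Team A": 81, "Team B": 70, "Team C": 90}

ranking = rank_category(hr_totals, games)
print(ranking)
```

In this made-up data Team C has the most home runs so far but the slowest pace, so it projects to finish last in the category. A real tracker would presumably blend the pace with rest-of-season projections rather than extrapolate raw totals.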

    The other thing it could do well is data mine the internet. There have been good results in stock picking by simply tracking the number of mentions, positive mentions and negative mentions on the internet and correlating them to stock performance. The Cleveland Indians actually have a system that does this very thing for every player in organized baseball anywhere. This would be another way of trying to spot and predict up/down trends.

    • The Great Knoche says:
      (link)

      @GhostTownSteve: Somebody call John Connor, and cue Arnold.

    • Nico says:
      (link)

      @GhostTownSteve: Man, do I wish I had your technical ability for a few weeks. The in-season and DFS aspects of this really intrigue me. Unfortunately, I have no programming knowledge. My above average Excel skills get me by for the simple things, but not in the way I’d like. I’d love to implement some sort of goal and waiver tracking tailored to specific leagues, as well as data mine for DFS purposes. I know what kind of learning curve that requires, however. Great post.

      • Cram It says:
        (link)

        @GhostTownSteve: @Nico: I was mock drafting on Fantasypros, in preparation for an NFBC draft. The whole draft is AI except my pick. But when it’s your pick, they have a queue of recommended players for your pick. It provides consensus %’s and optimal points increase in the standings at that moment. I don’t know, thought it was pretty neat and helpful. They clearly have the data mining aspect of at least experts rankings.

        • GhostTownSteve says:
          (link)

          @Cram It:

          Fantasy Pros doesn’t really do data mining. They just aggregate rankings and projections into a consensus. I used to like the consensus ranks but I’ve kind of grown to prefer an individually curated set that I trust. All consensus does is add weight to the group think, name brand mentality of most ranking sites. I do prefer some curation though and I think it’s been proven out that some human analysis outperforms the strictly algorithmic like Steamer et al.

    • paul says:
      (link)

      @GhostTownSteve: These are interesting thoughts, thanks for sharing. If it makes you feel better, Rudy wasn’t sold on this idea when I pitched it either (and for the record I think his concerns are wholly legitimate). I’m choosing to start this project very aware that it’s a risk, but at the very least, I hope the attempt is worthwhile.

      Of course it’s too ambitious a project to think about in one big chunk, so I’ll build it piece-by-piece starting with whatever I think I’m most capable of doing. Data mining/scraping is something I have experience with, so I fully plan to leverage that capability in my bot. I look forward to any future input you give as I progress.

      Apparently you have enviable technical abilities; I’d be interested to see the work you’ve done.

      • DrEasy says:
        (link)

        @paul: There’s nothing wrong in trying, and I think the biggest initial step would be to simply come up with a baseline robot that follows some naive algorithm for every decision. That would take care of the “plumbing”. Then it’s “just” a matter of having people plug in better algorithms for every decision point and compete against each other.

        A good starting point would be to have passive robots that just draft and then sit on their virtual asses. Next we could have robots that systematically go through the waivers during the season and try to improve their 5-cat situation until they reach pareto-optimality. The hard part here again is the plumbing: is there any API that one can use to pick up or dump players? If not, we would need to analyze the URLs or do some traffic sniffing to reverse engineer the protocol. That’s the boring stuff I’ll leave for someone else….
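One way to picture the waiver robot described above, as a minimal sketch only. It simplifies "pareto-optimality" to a player-for-player check: a free agent replaces a rostered player only if he is at least as good in all five categories and better in at least one. Positions, roster limits and the actual pick-up mechanics are ignored, and every name and number here is hypothetical.

```python
CATS = ("R", "HR", "RBI", "SB", "AVG")


def dominates(a, b):
    """True if line a is at least as good as b in every category
    and strictly better in at least one (a Pareto improvement)."""
    return all(a[c] >= b[c] for c in CATS) and any(a[c] > b[c] for c in CATS)


def improve_roster(roster, free_agents):
    """Swap in dominating free agents until no such swap remains.
    Both arguments map player name -> projected category line."""
    roster, pool = dict(roster), dict(free_agents)
    changed = True
    while changed:
        changed = False
        for fa_name, fa in list(pool.items()):
            for r_name, r in list(roster.items()):
                if dominates(fa, r):
                    # The dropped player goes back into the free-agent pool.
                    pool[r_name] = roster.pop(r_name)
                    roster[fa_name] = pool.pop(fa_name)
                    changed = True
                    break
            if changed:
                break
    return roster


# Hypothetical projections.
roster = {
    "Hitter X": {"R": 60, "HR": 10, "RBI": 50, "SB": 5, "AVG": 0.250},
    "Hitter Y": {"R": 90, "HR": 30, "RBI": 100, "SB": 10, "AVG": 0.300},
}
free_agents = {
    "Hitter Z": {"R": 70, "HR": 15, "RBI": 60, "SB": 5, "AVG": 0.260},
    "Hitter W": {"R": 50, "HR": 35, "RBI": 40, "SB": 0, "AVG": 0.240},
}

result = improve_roster(roster, free_agents)
print(sorted(result))
```

Hitter W has the most home runs but is worse elsewhere, so he never gets picked up under this rule. That is the conservative side of a strict dominance test: it never hurts any category, but it also passes on trade-offs a human might accept.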

        I wouldn’t worry about the trading aspects. I usually do OK in my leagues and I almost never make a trade. A trading robot would also probably be quite insufferably chatty.

    • M says:
      (link)

      @GhostTownSteve: oh no, not a DeepBlue of fantasy baseball! Haha

  7. slimbo says:
    (link)

    1 year league

    Just got offered my Fielder for Chapman…. Pull the trigger?

    • paul says:
      (link)

      @slimbo: Depends on other settings, but I say hang onto your hitting.

    • Mike says:
      (link)

      @slimbo: I would never draft Fielder so I would take Chapman

  8. weas says:
    (link)

    I would love to see more posts as you progress, I love solving fantasy baseball problems with coding — Last summer I created an algorithm which generates near-optimal DraftKings lineups based on daily DFSBot projections. I’d love to contribute to a project like this but I only know the programming side, not too much on the statistics/analytics side.

    • paul says:
      (link)

      @weas: That’s dope. I hope to keep your interest throughout the season… I’m quite confident I’ll hit stumbling blocks in the coding along the way and could benefit from having someone with more expertise around.

  9. Wallpaper Paterson says:
    (link)

    After reading this, I’m not sure I like fantasy baseball anymore.

  10. Matt says:
    (link)

    This will inevitably lead to 2021, a drafting odyssey:

    “I’ll take Kershaw at #3 overall”

    “I’m sorry Dave, I’m afraid I can’t let you do that.”

    • paul says:
      (link)

      @Matt: Hahahahaha

  11. ginardo napoli says:
    (link)

    Honestly, I have won every single league I’ve been in (including 12 team and 10 team leagues) and I find projections to be a waste of time. Some players have a good chance of meeting or exceeding expectations, others have at least some chance of falling from grace or succumbing to injury. These are largely not predictable events.

    Every year the top 20 players at every position are not the same from year to year, and most years the rankings are completely different, with half of the list either new or players who were outside of the top 30. Not only that, but the players you have in late August that you ride to victory are rarely more than half of the players you had in early May. You have to make judgements and take risks all the time. Ranking your players, or making some sort of tier list, is a much better and more flexible use of time. I mean, what correlations are we talking about with these projections? Less than 0.80, which is a coefficient of determination of 64%, i.e. at least 36% of the variation from projections cannot be explained. And this is for the “predictable” set of players. What was the projection for Danny Santana or Francisco Rodriguez? Exactly.

    If you want to look at numbers, it’s better to focus on a few numbers, like K per IP, K to walk ratio, team defense, ISO, team offensive potential, and a hitter’s 2-strike BA. So for instance, I look at the top 10 pitchers with the largest number of strikeouts at the end of spring training. At least one of them will be a surprise pitcher. Which one might that be? This is where judgement comes into play, and you can’t quantify judgement. You can use numbers to support whatever judgement you make, but numbers cannot make that judgement for you.

    Now don’t get me wrong, we love the statistics because we love the numbers, but numbers are not players. Falling in love with projections merely blinds you to the bigger picture I think. I never ever use them, and I’ve been quite successful.

    • paul says:
      (link)

      @ginardo napoli: You say “This is where judgement comes into play, and you can’t quantify judgement.”

      Like it or not, I believe that whatever process it may be that your brain goes through to make its judgement decisions of which surprise pitcher is the best gamble, it’s a process that could be modeled by a series of logical statements, typed into a computer program, and run on a machine.

      You’re either picking totally randomly or you have a process. It might be a process so full of complexities and subtleties that even you’d have a hard time articulating it… but I believe there’s a definable, repeatable process in there.

      • Bull in a Chinese Restaurant says:
        (link)

        @paul: sounds quite positivistic here; you might not have much familiarity with Caro’s loose wiring:
        “If choices are not clearly connected to their benefits, people usually interact in ways that make outcomes unpredictable.
        If choices are clearly connected to their benefits, people sometimes act in ways that make outcomes unpredictable.”

      • GhostTownSteve says:
        (link)

        @paul:

        I think what we call intuition is a double edged sword. Kahneman the economist has said and shown that algorithms outperform expert opinion in many cases, due largely to cognitive biases. However, in messy, bumpy data situations it may be possible that intuition outperforms algorithm because there may be qualitative inputs that are not available to the algorithm. As Jack mentions above, there is something to be said for being your own scout and watching baseball.

        MLB teams themselves are constantly tinkering for the right alchemy between scouting and data. It’s pretty widely accepted that some blend of qualitative and quantitative yields the best results. Kasparov still contends that Deep Blue had human intervention at key points. That he sensed human intelligence behind some moves. IBM denied of course.

        Fascinating stuff.

        • And Now the John Lovitz Dancers! says:
          (link)

          @GhostTownSteve: Kahneman is mentioned many times in Taleb’s books. I didn’t know Kasparov thought that about Deep Blue.

    • CM52 says:
      (link)

      @ginardo napoli:

      Won every league you’ve been in? Yeah, you’re clearly not playing in competitive leagues.

  12. ginardo napoli says:
    (link)

    I thought I proved the limitations of the hitter-a-tron last year to you fellahs. Granted, it is a coder’s wet dream, but the data is so immense, and so capricious on a day to day basis (Mike Trout struck out twice and went 0 for 5 against who?), that you really only should use it as a means of having a collection of organized information to make judgements.

    If you want the algorithm to think for you then you are only maximizing your intransigence and minimizing your resistance to instinctive judgements. You focus so much on the minutiae of numbers that you forget about the dynamics of the people who actually create the numbers.

  13. steve says:
    (link)

    12 team league. 11 starts per week. Can only use 60 transactions a year so streaming is limited. Will roster 11 starters. 10 pts for pitching win. All pitching goes very early. Approximately half of the Razzball top 40 pitchers are being kept. I’m keeping Garrett Richards in the 26th. 26-round draft.

    I can keep either Gausman in the 25th or can trade to keep Felix in the second. Will cost me an 18th-round pick. If I keep Gausman, I likely end up with Greinke or Zimmermann in the spot I’d keep Felix.

  14. ginardo napoli says:
    (link)

    For instance, Mike Trout hit .287 last year. His batting averages over the 6 months of baseball were as follows: April .321, May .263, June .361, July .265, August .254, Sept .274. Ignoring whether we should treat this data set as a binomial (based upon hit or not-a-hit criteria), and ignoring the obvious skew (Mike hit less than .270 in 3 of the 6 months, and more than .300 in only 2), we can determine that the likelihood of an 0 for 5 would be (1-.287)^5 = .184.

    That means 18.4% of the time. Multiply this by 162 and you get 29.85. So About 30 games. Which games are those? Are they always against the tough pitchers? Nope.
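The arithmetic above can be checked in a couple of lines. This is only a sketch of the commenter's own assumption (every at-bat is an independent try at a .287 success rate, five at-bats per game, 162 games); `prob_hitless` is a name invented here.

```python
def prob_hitless(avg, at_bats):
    """Chance of zero hits in `at_bats` independent tries at success rate `avg`."""
    return (1 - avg) ** at_bats


p = prob_hitless(0.287, 5)
print(round(p, 3))          # 0.184
print(round(p * 162, 2))    # 29.85
```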

    If we assume this is a normal distribution, we can roughly calculate the standard deviation from the months and look for the percentage of time we should expect the percent to be less than .200 (1 out of 5).

    Using the above monthly stats, I calculated the standard deviation as .0385. So with the average as .287, the z-score for .200 would be -2.259. What percent of the time should we be below this z-score? About 1.19%, which multiplied by 162 means about 2 games. Obviously not realistic when you go back and look at the game log data for 2014: Mike had 8 zero-hit games in the month of April alone.
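For anyone who wants to reproduce the normal-distribution numbers, here is a short sketch using only the standard library. It follows the comment's setup: the population standard deviation of the six monthly averages, with the season average .287 used as the mean. The variable names are arbitrary.

```python
from statistics import NormalDist, pstdev

monthly = [0.321, 0.263, 0.361, 0.265, 0.254, 0.274]

sd = pstdev(monthly)               # population standard deviation, about .0385
z = (0.200 - 0.287) / sd           # about -2.26
p_below = NormalDist().cdf(z)      # share of the curve below that z-score

print(round(sd, 4))
print(round(z, 2))
print(round(p_below * 162))        # expected games, rounded to a whole number
```

Carrying the unrounded standard deviation through gives a z-score a hair different from the -2.259 quoted, but the conclusion is the same: roughly 1.2% of games, or about 2 out of 162.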

    See what I mean by getting blinded by analytics?

    • Hawk says:
      (link)

      @ginardo napoli:

      How accurate do predictions need to be in order to be valuable?

      For example, Grey has Mike Trout projected for 33 HR

      If he hits 30 home runs – 90% accurate – you’d probably consider that a reasonably good projection, right?

      In my 10 team, 5×5 full season roto league last year the team in the middle of the pack was projected for 256 HR. If all of the projections were equally accurate – 90% accurate – to the one about Trout above, that 256 HR could have landed a team anywhere between 234 – 281 HR. In my league that means the 5th place team HR projections could have been anywhere from 2nd place to 8th place in HR. Out of 10 teams.

      And that’s assuming across-the-board 90% accuracy!

      I used to think fantasy baseball was quantifiable. I don’t anymore. Not as a whole, anyway. There are too many variables.

      • paul says:
        (link)

        @Hawk: Hawk, your concern is legitimate as far as I’m concerned. As it would relate to an algorithm, it would mean I’d have a pretty boring one on my hands, one that might simply choose its optimal strategy in 99.9% of situations. It’s when the league context could alter what the optimal strategy is that things get interesting. Otherwise I’ll just have a slightly glorified data scraper and lineup setter. I don’t know yet what percent of the time the league context will influence the output of the program, but towards the end of the season I believe it’ll be often enough to make this endeavor worthwhile.

      • ginardo napoli says:
        (link)

        @Hawk: I’m just saying. Life is short brother. Use the gift called your intuition and save yourself the false sense of security that numbers give you….Mike Trout’s a good player. Who cares what his end numbers are? Ditto and repeat. You think musicians memorize long beautiful complicated pieces by memorizing numbers. Hell no. They do it by feel.

        I think I’ve made my point. Have a great season everyone. I know I speak for all when I say I can’t wait till the first Spring Training game. :-)

    • GhostTownSteve says:
      (link)

      @ginardo napoli:

      It seems to me the objective isn’t to be right always but to be right more often. The DFS algos basically cross reference probabilities versus cost and spit out optimal lineup combinations.
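To make the "probabilities versus cost" idea concrete, here is a toy sketch that brute-forces every lineup under a salary cap. It is not how any particular DFS tool works; real optimizers handle positions and much larger pools with smarter search. The player pool, salaries, projected points and the `best_lineup` helper are all hypothetical.

```python
from itertools import combinations


def best_lineup(players, slots, cap):
    """Brute-force the highest projected lineup of `slots` players under `cap`.
    `players` maps name -> (cost, projected_points)."""
    best, best_points = None, float("-inf")
    for combo in combinations(players, slots):
        cost = sum(players[name][0] for name in combo)
        points = sum(players[name][1] for name in combo)
        if cost <= cap and points > best_points:
            best, best_points = combo, points
    return best, best_points


# Hypothetical salaries and projected points.
pool = {
    "P1": (5000, 10.0),
    "P2": (4000, 9.0),
    "P3": (3000, 6.5),
    "P4": (2500, 6.0),
}

lineup, points = best_lineup(pool, slots=2, cap=7500)
print(lineup, points)
```

The two highest projected players together blow the cap here, so the search settles on the best pair that fits. Brute force is fine for a handful of players but grows quickly, which is why a real tool would likely use integer programming or a similar method.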

      With a player like Mike Trout it doesn’t really matter what the distribution looks like because you are of course going to play him every day in year long fantasy to take advantage of what you assume the end result will be. In year long you’d just be looking for match up or platoon splits that would give you percentage points advantage. Doesn’t mean you’d be right every time. Just means you’d be playing with some mathematical odds advantage. Just like you can go to the craps table and sometimes win because of sample size, but you know in the long run the house (odds advantage) will always win.

      Of course baseball is less predictable and you’re always going to be playing on a limited sample size in fantasy, so the question is whether there is any advantage to be gained by playing according to the best odds baseball math can offer, or whether intuition is better. I think the former is probably the better path. It eliminates all the cognitive biases.

      • ginardo napoli says:
        (link)

        @GhostTownSteve: Excellent point Steve. I really enjoyed you making a good point. I agree.

        My original point was really about not wasting time doing projections, and then well I got all caught up in the mathematics. Math is like sex to me, so once I get started, I can’t stop until after I pound the numbers for a good long time. I just hope none of my jizz got on you fellahs. I apologize if that happened. :-) :-) :-)

        LOL … good times

  15. ginardo napoli says:
    (link)

    Okay, I realize I’m giving up the goose, but I only buy on top notch pricey players like Trout, McCutchen, Cabrera when the likelihood of an 0-fer is very small, for example after a streak of 3 lousy days. There are various combinations that I use, but you can go back and do the data analysis yourself.

    I love the site by the way. You guys are awesome.

  16. ginardo napoli says:
    (link)

    With the Mike Trout example, I realize I forgot the Poisson rate model.

    The rate per 1,000 is 287, so converting that rate to a rate per 5 we get 1.435. The Poisson expectation of zero hits out of 5 at bats will occur 23.8 % of the time … or 38.556 games out of 162.
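The Poisson figure is quick to verify. A minimal sketch, assuming as the comment does that hits arrive at a constant rate of .287 per at-bat over 5 at-bats per game:

```python
from math import exp

rate_per_ab = 0.287
lam = rate_per_ab * 5        # expected hits in 5 at-bats
p_zero = exp(-lam)           # Poisson probability of exactly zero hits

print(round(lam, 3))             # 1.435
print(round(p_zero, 3))          # 0.238
print(round(p_zero * 162, 1))    # 38.6
```

This comes out higher than the 18.4% from the (1-.287)^5 calculation because, with only five tries per game, the Poisson is a loose approximation of the hit-or-no-hit setup.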

    Last year there were 47 games in which Mike recorded zero hits. In some of them he still got walks and runs, so this means that the Poisson model models this event very well. And that means the 0-fers are random, mostly unpredictable events.

    Except that if you actually look at the data, most of the zeros occurred before and after a period of about 8 or 9 weeks when Mike was hot as hell. Hmmm….

  17. Joe says:
    (link)

    ginardo just went beautiful mind on us.

  18. The Harrow says:
    (link)

    good stuff paul.
    “make the cock suckers glad to mutate.”
    – Roosevelt after inauguration

  19. Kevin says:
    (link)

    I’ll throw my hat into helping. Programming Professional (Comp Sci Major) (3ish years exp)
    I’ve been looking to get into an Open Source Project for a while and I can’t see a more interesting one than a baseball one.

    • paul says:
      (link)

      @Kevin: Welcome aboard, new friend.

      • Kevin says:
        (link)

        @paul:

        Great.
        Sorry if you answered this in an earlier comment, but what’s the process here for this project?

        • paul says:
          (link)

          @Kevin: The process is largely undefined at this point, but it’ll probably go something like:

          1. Scrape and store league data and player projection data
          2. Ignoring other teams, get a base program working that maximizes stats for my team
          3. Figure out how to get the code to then implement the optimal plan in a Yahoo or ESPN league
          4. Then add features like a trade evaluator, or a conservative factor (if in first), whatever ideas we think of.
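As a rough illustration of the second step in the list above (a base program that maximizes stats for one team while ignoring everyone else), here is a naive lineup setter. It is a sketch under heavy assumptions: each player has a single position and a single projected value, and the function name, data layout and sample numbers are invented for the example rather than taken from any actual plan.

```python
def set_lineup(projections, slots):
    """Greedy lineup: fill each slot with the best remaining eligible player.
    `projections` maps name -> (position, projected_value);
    `slots` is the list of positions to fill, in order."""
    available = dict(projections)
    lineup = []
    for pos in slots:
        eligible = [n for n, (p, _) in available.items() if p == pos]
        if not eligible:
            continue  # leave the slot empty if nobody qualifies
        pick = max(eligible, key=lambda n: available[n][1])
        lineup.append((pos, pick))
        del available[pick]
    return lineup


# Hypothetical daily projections.
projections = {
    "A": ("C", 3.1),
    "B": ("OF", 5.0),
    "C": ("OF", 4.2),
    "D": ("OF", 2.0),
    "E": ("C", 3.5),
}

lineup = set_lineup(projections, ["C", "OF", "OF"])
print(lineup)
```

With single-position players the greedy pick is enough. Multi-position eligibility and category balance would need something smarter, which is presumably where the later steps come in.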

          Personally, I’m busy with other stuff til about mid-March, so progress might seem slow until then. I’m going to post weekly articles on my progress, but they’re still going to be introductory stuff for the next few weeks.

          If you’re ready to try building a part of it, don’t let me slow you down.

          • Kevin says:
            (link)

            @paul:

            Mid Marchish timeline works well for me, I’m actually switching jobs soon so that would work better for me I believe. I guess keep my info and let me know if I can do anything
