Setting Optimal Lineups, 2015 Fantasy Baseball

What if I told you that with just the click of a button you could have your fantasy lineup set for you? Would you find this idea interesting? If you are 1) sometimes lazy in setting your lineup or 2) interested in what the logic behind choosing the optimal lineup should be, then hopefully you answered yes to this question. In typical Greysian fashion there is no 3), but I imagine there are people that would fall into a third category as well.

If you find yourself curious about the technical details of how this is done, scroll down, because I’m going to save that for the end of the article. What I’d like to jump right into instead is the algorithm that will determine the optimal lineup for a fantasy team, because I think that’s the most interesting and nuanced part of this whole process.

The Logic

Let’s talk about how this program is going to work. Before we can proceed though, you need to know the parameters of the league I’m using to build this program, and also what information we will have available to make roster decisions like who to start or sit.

It’s unfortunate, but the truth is there is no standard format for fantasy leagues, meaning that any program I build will be optimized for one specific type of league and only that type of league. RCLs do provide a nice standardized format for a decently large number of leagues, but sadly they are hosted on ESPN and my program is being built for Yahoo. As a result, the league I’ll be using for this project is the Yahoo Friends and Family League, which has a reasonably standard setup.

Some details on its settings are:

14 teams, 5×5 rotisserie scoring, 1400 maximum IP cap
Rosters: C, 1B, 2B, 3B, SS, CI, MI, OF, OF, OF, OF, Util, Util, SP, RP, P, P, P, P, P, P, P, BN, BN, BN, DL, DL
Daily moves
Add/drops take effect the next day
Players lock once a game starts but you can shuffle a player in or out until game time
Dropped players enter Waivers, where they can be bid on from a $100 acquisition budget
Otherwise unowned players are simply free agents who can be added at any time
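
For the algorithm discussed below, the hitting side of these settings can be written down as a small data structure. This is only a sketch of one possible encoding in Python: the names HITTER_SLOTS, SLOT_ELIGIBILITY and can_play are illustrative and not taken from the actual code, and the meaning of CI, MI and Util (corner infield, middle infield, any hitter) is assumed to follow the usual Yahoo convention.

```python
# Hitting slots from the roster settings above, in roster order.
HITTER_SLOTS = ["C", "1B", "2B", "3B", "SS", "CI", "MI",
                "OF", "OF", "OF", "OF", "Util", "Util"]

# Which player positions may fill each slot.
# Assumption: CI = 1B or 3B, MI = 2B or SS, Util = any hitter.
SLOT_ELIGIBILITY = {
    "C": {"C"}, "1B": {"1B"}, "2B": {"2B"}, "3B": {"3B"}, "SS": {"SS"},
    "CI": {"1B", "3B"}, "MI": {"2B", "SS"}, "OF": {"OF"},
    "Util": {"C", "1B", "2B", "3B", "SS", "OF"},
}

def can_play(player_positions, slot):
    """Return True if a player with these eligible positions can fill the slot."""
    return bool(set(player_positions) & SLOT_ELIGIBILITY[slot])
```

Encoding the flexible slots as sets of positions makes the more restrictive slots (like 1B) subsets of the looser ones (like CI and Util), which is the property the “pushing up” idea described later relies on.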

While my program will be optimized for this specific league-type, my hope is that what’s learned in optimizing this league will be translatable to other leagues as well.

The information we will use to make roster decisions is:

Daily projections for every player (provided by Rudy via Hittertron/Stream-nator)
Rest of season projections for every player (also provided by Rudy)
Basic game data like whether a player’s team has a game that day or not
Start/sit information that Yahoo provides with a ‘^’ symbol next to a player. This is only provided a few hours before a game starts, so the ability to use it depends on how often the program is run throughout the day. I’m going to assume for now that I don’t know whether a player is actually in the lineup or not.

And that’s it, let’s start figuring out how to figure out the best lineup.

The first step I’d like to take is to say that the optimal total lineup is the sum of the optimal hitting lineup and the optimal pitching lineup.

This is a slightly simplifying assumption, because technically if you add a hitter you could drop a pitcher in response. To start though, I’d like to keep the hitting and pitching lineups as two independent processes, since this will make it easier to build a basic framework and isn’t terribly far from the truth anyways. One additional simplifying assumption I’d like to make for now is to only worry about optimizing a lineup for a single day and not take into account expected future production.

You might be thinking that with these two assumptions I’m over-simplifying things, but I think you’ll see that even with these, the process will quickly get complicated enough.

One final step I’d like to take before going into the exact details of how this process will work is to talk strategy. It’s useful to understand the concepts that shape this task’s solution in order to arrive at the most elegant one.

Reminder, the goal is to determine the best lineup from a list of players. One solution that is guaranteed to work every time is to simply compare all possible combinations of all players. Every single time this will point towards the optimal lineup, but it’s also an incredibly mindless and inefficient way of getting to the solution.
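
To make the brute-force idea concrete, here is a minimal Python sketch that tries every ordered assignment of players to slots and keeps the best valid one. The function name, the data layout, and the four made-up players are all assumptions for illustration; the projections are placeholder numbers, not real ones.

```python
from itertools import permutations

def brute_force_lineup(players, slots, eligibility):
    """Try every assignment of players to slots and keep the best total.

    players: dict of name -> (set of eligible positions, projected value)
    slots: list of slot names, e.g. ["1B", "3B", "CI"]
    eligibility: dict of slot -> set of positions allowed in that slot
    """
    best_total, best_lineup = float("-inf"), None
    for combo in permutations(players, len(slots)):
        # Only score assignments where every player is eligible for his slot.
        if all(players[name][0] & eligibility[slot]
               for name, slot in zip(combo, slots)):
            total = sum(players[name][1] for name in combo)
            if total > best_total:
                best_total, best_lineup = total, list(zip(slots, combo))
    return best_total, best_lineup

# Hypothetical players with placeholder projections.
players = {
    "A": ({"1B"}, 5.0),
    "B": ({"1B", "3B"}, 7.0),
    "C": ({"3B"}, 4.0),
    "D": ({"1B"}, 6.0),
}
slots = ["1B", "3B", "CI"]
eligibility = {"1B": {"1B"}, "3B": {"3B"}, "CI": {"1B", "3B"}}

total, lineup = brute_force_lineup(players, slots, eligibility)
print(total, lineup)  # 18.0 [('1B', 'A'), ('3B', 'B'), ('CI', 'D')]
```

With four players and three slots there are only 24 assignments to check, but the count grows very quickly. As a rough illustration, 16 candidate hitters for the 13 hitting slots would give 16!/3!, or about 3.5 trillion, ordered assignments, which is why a smarter search is worth the effort.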

One strategy I believe is wise to employ is to first seek and execute the moves that, at the point of consideration, are relatively likely to hold up regardless of which players are considered later. In other words, you want to avoid the situation where you have to keep track of lots of possible lineups because you never know how the effect of putting a new player in the lineup will cascade down through the rest of the lineup. In the process I describe below, you’ll notice certain steps try to take advantage of this by using techniques like “pushing up” players and using positional restrictions to limit the number of candidate lineups that need to be checked.

Without further ado, here is the process I’m proposing for selecting the optimal hitting lineup:

Step 1: Sum the dollar value projections for each hitter from yesterday’s lineup to establish a baseline value. A common sub-routine in this process will be to compare any potential new lineup to the current baseline. If the new lineup has a higher projected total value, it becomes the new baseline.
Step 2: “Push up” the better projected hitters to more restrictive positions. What I mean by this is to compare, for example, the players at 1B and CI (assuming your CI is 1B eligible) and move the CI to 1B if he has the better projection. If the CI is eligible at both 1B and 3B and has a higher projection than both the current 1B and 3B, then compare the 1B and 3B hitters and put the worse one at CI. Catcher is usually a relatively isolated position, but do the same for 2B and SS with MI, and also rank your OFers from best to worst. The idea here is that your worst starting players should find their way into the Utility slots, which I believe will make it easier to add a new bat to the starting lineup, because there will be less checking of whether you can shuffle players around to fit the new batter in.
Step 3: Check whether any starting hitter does not have a game that day. If so, check whether any bench hitter can fill the empty spot. I’m going to go with the assumption that it’s always better to fill a spot with a player who has a game than to leave the spot blank. That assumption only holds if no active player is worth less than an empty slot, and looking at Rudy’s $ value projections on the Hitter-Tron page, I see that a fair number of batters actually have negative dollar values. Either I’ll rescale those values to be floored at 0, or I’ll calculate my own values using the projections for each stat category.
Step 4: If any changes were made in Step 3, once again “push up” the players in your starting lineup. My guess is that we’ll want to go through this process every time changes are made to the starting lineup.
Step 5: Here is where, finally, we’ll take a look at free agents and add any that improve our starting lineup. It is far from obvious how best to go about this in a way that ensures we end up with the best possible lineup at the conclusion. Do we look at all free agent hitters or maybe something like just the top 50? Here’s my proposal:
- I think the best way to do this is to look at every free agent (or to be specific, every player I have a projection for that’s in the Yahoo player universe) and keep track of who the “max” or “best” free agent is at each position. The process then begins with the best FA Catcher being compared first to the incumbent starting Catcher and second to the players at Utility. If the challenging Catcher is better than the incumbent, the move of adding the new Catcher can be executed immediately. On his way out, the freshly-deposed starting Catcher should be checked against the Utility players (or against the players at any other positions he might be eligible at) before being either benched or dropped. However, if the challenging Catcher is only better than a hitter at Utility, the move should not be executed immediately. At that moment it should only be noted as a potential move, because later there might be a free agent first baseman or outfielder who is a superior option to fill the Utility slot. This process should be repeated for each position: next 1B… then 2B… and SS… and 3B… and also CI (where the challenging CI is the best remaining free agent eligible at 1B or 3B)… next MI… then OF (where if one OF is replaced, the next best free agent OF is tested until no changes are made)… and then finally Utility. It might be the case that a large number of “tentative” potential lineups will have to be kept in memory during this process, with each one checked for each new potential move. Practically, I don’t think more than a few moves will be made or even considered in a given day, which should limit this concern, but I wanted to note it as a potential problem.
Step 6: After comparing all the “potential” best lineups, the best one out of those should be the overall optimal lineup. At this point whatever moves are required to create this lineup should be executed, and then, we’re done!
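
As a rough illustration of Steps 1 through 4 and the first part of Step 5, here is a minimal Python sketch. It is untested against real rosters and makes several simplifying assumptions: every name in it (Hitter, ELIGIBLE, PUSH_TARGETS, and the functions) is hypothetical, Util is assumed to accept any hitter, negative values are floored at 0, and the Step 3 fill only looks for a bench hitter who fits the idle slot directly without shuffling other starters. The players and their values are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Hitter:
    name: str
    positions: frozenset   # eligible positions, e.g. frozenset({"1B", "3B"})
    value: float           # projected dollar value for the day
    has_game: bool = True

# Positions allowed in each slot (assumption: Util takes any hitter).
ELIGIBLE = {
    "C": {"C"}, "1B": {"1B"}, "2B": {"2B"}, "3B": {"3B"}, "SS": {"SS"},
    "OF": {"OF"}, "CI": {"1B", "3B"}, "MI": {"2B", "SS"},
    "Util": {"C", "1B", "2B", "3B", "SS", "OF"},
}
# Each flexible slot, paired with the more restrictive slots it can feed.
PUSH_TARGETS = [
    ("CI", ["1B", "3B"]),
    ("MI", ["2B", "SS"]),
    ("Util", ["C", "1B", "2B", "3B", "SS", "OF", "CI", "MI"]),
]

def day_value(h):
    """Value counted for the day: 0 without a game, negatives floored at 0."""
    return max(h.value, 0.0) if h.has_game else 0.0

def lineup_value(lineup):
    """Step 1: total value of a lineup given as a list of (slot, Hitter)."""
    return sum(day_value(h) for _, h in lineup)

def push_up(lineup):
    """Steps 2 and 4: swap better hitters into more restrictive slots."""
    lineup = list(lineup)
    changed = True
    while changed:
        changed = False
        for flex, targets in PUSH_TARGETS:
            for i, (slot_i, h_i) in enumerate(lineup):
                if slot_i != flex:
                    continue
                # Worse-projected occupants of slots this hitter could take.
                worse = [j for j, (slot_j, h_j) in enumerate(lineup)
                         if slot_j in targets
                         and h_i.positions & ELIGIBLE[slot_j]
                         and day_value(h_j) < day_value(h_i)]
                if worse:
                    # The worst of them drops down into the flexible slot.
                    j = min(worse, key=lambda k: day_value(lineup[k][1]))
                    lineup[i], lineup[j] = (slot_i, lineup[j][1]), (lineup[j][0], h_i)
                    changed = True
    return lineup

def fill_idle_slots(lineup, bench):
    """Step 3: replace starters without a game by the best eligible bench bat."""
    lineup, bench = list(lineup), list(bench)
    for i, (slot, h) in enumerate(lineup):
        if h.has_game:
            continue
        options = [b for b in bench if b.has_game and b.positions & ELIGIBLE[slot]]
        if options:
            best = max(options, key=day_value)
            bench.remove(best)
            bench.append(h)
            lineup[i] = (slot, best)
    return lineup, bench

def best_free_agent_by_position(free_agents):
    """Step 5, first part: the top free agent with a game at each position."""
    best = {}
    for fa in free_agents:
        if not fa.has_game:
            continue
        for pos in fa.positions:
            if pos not in best or day_value(fa) > day_value(best[pos]):
                best[pos] = fa
    return best

lineup = [
    ("1B", Hitter("A", frozenset({"1B"}), 5.0)),
    ("3B", Hitter("B", frozenset({"3B"}), 8.0)),
    ("CI", Hitter("C", frozenset({"1B", "3B"}), 9.0)),
    ("Util", Hitter("D", frozenset({"OF"}), 6.0, has_game=False)),
]
bench = [Hitter("E", frozenset({"OF"}), 2.0)]

baseline = lineup_value(lineup)                      # D has no game
lineup, bench = fill_idle_slots(push_up(lineup), bench)
lineup = push_up(lineup)                             # Step 4: push up again
print(baseline, lineup_value(lineup))                # 22.0 24.0
print([(slot, h.name) for slot, h in lineup])
# [('1B', 'C'), ('3B', 'B'), ('CI', 'A'), ('Util', 'E')]
```

The swaps in push_up are always legal because the occupant of a restrictive slot is, by construction, also eligible for the looser slot he is moved down to. The harder part of Step 5, tracking the “tentative” lineups and comparing them in Step 6, is deliberately left out of this sketch.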

Taking a Step Back

I told you things would get complicated. Anyone who tried to skim through that process probably got nothing out of it and that’s fine, but I think the fun’s in thinking about the small details and strategies involved.

With that said, is what I described above the best way of doing this? Probably not. I haven’t tried coding or testing any of this yet. For all I know the process may be much simpler and I’ll come off as a maniacal lunatic for having even thought this up. Or maybe this process will be ambiguous for the case where every player on your team has multi-positional eligibility and it’ll have to be even more nuanced.

This is the part where I invite your feedback if you’ve followed along thus far.

Other Uses

Besides being used just to set an optimal lineup, I believe there are other potential uses here. One idea is to take advantage of the fact that once you have an automatic process for determining the optimal lineup, you can then use it to calculate, for each member of your league, how close their actual lineups are to the optimal. It’d be interesting to test whether this metric has a strong correlation to final standings, which would be a new way of quantifying the importance of in-season management in fantasy leagues.
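
One way this metric could be computed is sketched below in Python. Both function names are hypothetical, “closeness to optimal” is assumed here to mean the ratio of actual to optimal projected value, and the three data points at the end are made-up placeholders used only to show the call, not results from any league.

```python
from math import sqrt

def lineup_efficiency(actual_value, optimal_value):
    """Share of the optimal projected value that the actual lineup captured."""
    return actual_value / optimal_value if optimal_value > 0 else 1.0

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder inputs: season-average efficiency and final roto points per team.
efficiencies = [0.95, 0.90, 0.80]
roto_points = [100, 90, 70]
r = pearson(efficiencies, roto_points)
```

In practice the efficiency would be averaged over every day of the season for each team before being compared with the final standings, and pearson would need a guard for the case where either input has no variation.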

That’s just one idea, I’d like to think we could come up with a few more as well.

Technical Details

To be honest, I’m too exhausted from typing the above out to go into great detail about how the technical aspects work, but I also won’t leave you empty handed. Click here to see the code I’m using on GitHub, though I’ll admit it’s far from where I want it to be at the moment. I’m trying to bring more coding best practices into my repertoire, like testing the program in a controlled virtual environment and keeping older versions of the code available in case I want to revert a change. Part of this is adjusting to processes that take a little time to learn before you can use them effectively. My code is presented in a poorly-structured manner as a result.

The basic functions that my code employs are 1) getting information from Yahoo about teams and players in a structured format and 2) sending instructions back to Yahoo that cause actual changes to my fantasy team.

With these two abilities, the general process is first to pull in information about the league’s teams and players, second to use that information to determine the best moves, and finally to communicate these moves back to Yahoo in a standardized format that the Yahoo API can take and use to alter my team (without me physically doing the work).

For example, here’s a snippet of code that shows a function called Put, which is used to generate the instructions for a roster move to be sent back to Yahoo. It takes five inputs as parameters and uses those to craft the specific instructions for a given pair of players. The first parameter is the web address that will be used to contact Yahoo’s API. The remaining four are the two players involved in the move and their desired new positions (usually the Bench for player 2). The text in the variable xmlString is what is literally sent to Yahoo to describe the move we want it to execute. This is apparently a strange way of doing this, but it works, which is enough.
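
For readers who want a feel for what such an XML string might contain, here is a minimal Python sketch. It is not the Put function itself: the function name, parameter order and player keys are hypothetical, the element names are assumed to match the roster-edit format in Yahoo’s Fantasy Sports API documentation and should be checked against it, and an explicit date parameter is included on the assumption that a daily league needs one. The sketch only builds the string and does not contact Yahoo, so it leaves out the web address parameter.

```python
from xml.sax.saxutils import escape

def build_roster_move_xml(date, player1_key, position1, player2_key, position2):
    """Build an XML body giving two players their new positions on one date."""
    players = "".join(
        "<player>"
        f"<player_key>{escape(key)}</player_key>"
        f"<position>{escape(pos)}</position>"
        "</player>"
        for key, pos in ((player1_key, position1), (player2_key, position2))
    )
    # Assumed layout: a roster element holding the date and the player list.
    return (
        '<?xml version="1.0"?>'
        "<fantasy_content><roster>"
        "<coverage_type>date</coverage_type>"
        f"<date>{escape(date)}</date>"
        f"<players>{players}</players>"
        "</roster></fantasy_content>"
    )

# Placeholder player keys: start player 1 at 1B and bench player 2.
xml_string = build_roster_move_xml("2015-04-06", "000.p.1111", "1B",
                                   "000.p.2222", "BN")
```

Sending the string would then be a matter of issuing an authenticated HTTP PUT to the team’s roster address, which is the part the first parameter of Put is described as handling.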

That’s all for today. Next time we’ll continue to iron out the algorithm for determining the best moves, and also take a look at what the code for this looks like.

If you don’t already, consider following Paul on Twitter @polarizedranger.