I thought a fair amount about what the topic of my second article should be, and while I would have been happy to continue waxing theoretical about what a fantasy baseball bot might look like, I figure people probably want to see some actual code, or at the very least pseudo-code. You’ll settle for some pseudo-code, right? Great.

Order of business No. 1 in this venture is data-scraping. What will we want to scrape? Let’s start with a league homepage:

[Screenshot: ESPN league homepage]

 

For ESPN, I’ll admit, it’s surprisingly minimalist. I guess my AdBlock is working after all. We’ve got the league name (potentially useful), some crappy articles (less useful), congrats to Rudy on finishing first, and ah, the Standings. Let’s expand those:

[Screenshot: expanded league standings page]

What a beautiful page! Basically all the information on this page should be scraped daily, starting with a list of the team names.

Let me be clear here. By list I don’t mean something your wife gives you before heading to the grocery store; in Python, lists are a specific type of container for data storage with their own syntax and properties. As it stands, they are a relatively simple container.

(I don’t plan on going into too much Python syntactical minutiae, but for this first example, I’ll try it out.)

Lists

First off, to define a list, you use square brackets [ ]. For example:

teamNamesList = [] # creates an empty list named teamNamesList

After scraping from the standings page, we’ll hopefully have something that looks like this:

teamNamesList = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Mastersball Carey']

Here are a couple of things you can do with this list. If I wanted to select a specific team, I’d use its index number, which for these 12 teams runs from 0 to 11. For example:

print teamNamesList[0]

would spit out:

Razzball Rudy

and,

print teamNamesList[11]

would result in 'Mastersball Carey' being printed to the terminal, as it’s called.

Another example of something I could do with the teamNames list is create a variable called numTeams and set it equal to the length of teamNamesList. In Python that would look like this:

numTeams = len(teamNamesList)

print numTeams

12

While this may seem like a nice feature, practically speaking, it is probably best to have a human being input the number of teams directly into the program as a parameter. Until leagues start kicking owners out in the middle of the season (not a terrible idea), there’s no need to recalculate the number of teams on a daily basis.
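One way to have it both ways is to keep the human-entered number as the source of truth and use len() only as a sanity check on each scrape. Here is a minimal sketch of that idea. NUM_TEAMS and check_team_count are hypothetical names made up for illustration, not part of any existing bot, and the parenthesized print is used so the snippet runs on Python 2 or 3:

```python
# NUM_TEAMS and check_team_count are hypothetical names used for illustration.
NUM_TEAMS = 12  # entered by a human once, at the start of the season


def check_team_count(team_names, expected=NUM_TEAMS):
    """Return True when the scraped list has the expected number of teams."""
    return len(team_names) == expected


teamNamesList = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
                 'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
                 'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Mastersball Carey']

# Only complain when the scrape disagrees with the configured league size.
if not check_team_count(teamNamesList):
    print('Scrape looks wrong: expected %d teams, got %d' % (NUM_TEAMS, len(teamNamesList)))
```

With the 12 names above the check passes and nothing is printed; a scrape that came back with 11 names would trigger the warning instead of silently changing the league size.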

If you’ve followed me this far, let me bring up a more interesting scenario: what should happen when a team changes its name?

One solution is to ban all teams from ever changing their names, but that’s neither fun nor realistic. Another option is to simply have your teamName list change as well, but that could make any historical variable confusing to interpret, and also mess up the entire program depending on how it’s coded. Basically, we need to find a good way to deal with this scenario.

Here’s where I will introduce another kind of data container: the set.

Sets

The main features of a list are that it is ordered and that its items are referenced by numerical index. (Remember, teamNamesList[0] is 'Razzball Rudy'.)

Sets are different in that they are unordered and their items aren’t referenced by a number at all; instead, you ask whether an item is in the set. The benefits of these two features will be made apparent when dealing with the problem of a team name change.

First, we convert the teamNames List into a teamNames Set:

teamNamesList = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Mastersball Carey']

teamNamesSet = set(teamNamesList)

What converting to a set allows us to do is compare the elements of the teamNamesSet to another set, regardless of the order. This is where the unordered aspect of sets comes in handy.
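To see why the unordered part matters, consider what happens when the same names come back from the scrape in a different order. The sketch below uses a made-up three-team league, and the variable names are only for illustration:

```python
# Two scrapes of the same three teams, returned in a different order.
todayList = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don']
yesterdayList = ['Yahoo Del Don', 'Razzball Rudy', 'Team Albright']

# Lists compare item by item, so a different order counts as a difference.
listsMatch = (todayList == yesterdayList)

# Sets only care about membership, so the order is irrelevant.
setsMatch = (set(todayList) == set(yesterdayList))

print(listsMatch)  # False
print(setsMatch)   # True
```

If the standings page reshuffles the teams overnight, a list comparison would report a change even though no name changed, while the set comparison would not.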

Let’s say we have the set of the team names from yesterday called permTeamSet. permTeamSet will look something like this:

print permTeamSet

>>> set(['Razzball Radio', 'Yahoo Del Don', 'Razzball Rudy', 'Mastersball Carey', 'Razzball Sky Sperling', 'Tehol the Pathetic', 'Club Singman', 'Razzball Jay(Wrong)', 'Team McLeod', 'Razzball Scott', 'RC BradGraphs', 'Team Albright'])

[From now on, what I put after '>>>' will be what Python prints to the terminal]

This, you’ll notice, is exactly the same as teamNamesSet. So, if we were to take the difference of the two sets, we’d get back an empty set.

x = teamNamesSet.difference(permTeamSet)

print x

>>> set([])

Now, however, what if Mr. Carey, the owner of Mastersball Carey, decides he’s unhappy in last place and hands over control of his team to Yahoo’s Andy Behrens? The first thing Mr. Behrens would probably do is change his team name to ‘Yahoo Behrens’. What will our set difference look like then?

teamNamesSet = {'Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Yahoo Behrens'}

x = teamNamesSet.difference(permTeamSet)

print x

>>> set(['Yahoo Behrens'])

Pretty simple. This time the .difference() method identified that teamNamesSet has a new element, ‘Yahoo Behrens’, that is not in permTeamSet.

Taking the difference the other way around leads Python to identify ‘Mastersball Carey’ as the team no longer in the set of team names.

y = permTeamSet.difference(teamNamesSet)

print y

>>> set(['Mastersball Carey'])
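Both directions of the comparison can be wrapped in one small function. This is only a sketch: find_name_changes is a hypothetical helper name, and the three-team sets are a shortened stand-in for the full league:

```python
def find_name_changes(todaySet, yesterdaySet):
    """Return (new names, retired names) between two scrapes."""
    added = todaySet.difference(yesterdaySet)    # in today's scrape only
    removed = yesterdaySet.difference(todaySet)  # in yesterday's scrape only
    return added, removed


yesterdaySet = set(['Team McLeod', 'Club Singman', 'Mastersball Carey'])
todaySet = set(['Team McLeod', 'Club Singman', 'Yahoo Behrens'])

added, removed = find_name_changes(todaySet, yesterdaySet)

# sorted() turns each set into a list so the output reads the same everywhere.
print(sorted(added))    # ['Yahoo Behrens']
print(sorted(removed))  # ['Mastersball Carey']
```

When nothing has changed, both returned sets are empty, which is the "no news" case from earlier.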

Wonderful. So what we want to do is add ‘Yahoo Behrens’ to permTeamSet and remove ‘Mastersball Carey’ from it. This can be accomplished with the following code:

# adds 'Yahoo Behrens'
permTeamSet.update(teamNamesSet)

print permTeamSet

>>> set(['Razzball Scott', 'Razzball Rudy', 'Yahoo Del Don', 'Club Singman', 'Razzball Jay(Wrong)', 'RC BradGraphs', 'Mastersball Carey', 'Razzball Radio', 'Razzball Sky Sperling', 'Tehol the Pathetic', 'Yahoo Behrens', 'Team McLeod', 'Team Albright'])

# deletes Mastersball Carey
permTeamSet.difference_update(y)

print permTeamSet

>>> set(['Razzball Scott', 'Razzball Rudy', 'Yahoo Del Don', 'Club Singman', 'Razzball Jay(Wrong)', 'RC BradGraphs', 'Razzball Radio', 'Razzball Sky Sperling', 'Tehol the Pathetic', 'Yahoo Behrens', 'Team McLeod', 'Team Albright'])
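One detail worth spelling out: the set of retired names has to be computed before the update, while the two sets still differ. The sketch below walks through both steps on a shortened three-team league (the name retired is hypothetical) and then checks that the stored set ends up identical to today's scrape:

```python
permTeamSet = set(['Team McLeod', 'Club Singman', 'Mastersball Carey'])
teamNamesSet = set(['Team McLeod', 'Club Singman', 'Yahoo Behrens'])

# Work out the retired names BEFORE updating, while the sets still differ.
retired = permTeamSet.difference(teamNamesSet)

permTeamSet.update(teamNamesSet)        # adds 'Yahoo Behrens'
permTeamSet.difference_update(retired)  # drops 'Mastersball Carey'

# After both steps the stored set matches today's scrape exactly,
# so set(teamNamesSet) would have produced the same result in one step.
sameAsToday = (permTeamSet == teamNamesSet)
print(sameAsToday)  # True
```

The two-step version is still useful when you want to log what was added and what was removed along the way.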

Sweet. Mastersball Carey is no longer in there and Yahoo Behrens is, so we’ve updated our permTeamSet to reflect the team name change. While that’s great, what we still want is something linked to the team names that can remain constant, even when the team name itself changes. This is where the Dictionary comes into play.

Dictionaries

Basically, Dictionaries map one thing to another, or in more official terms, a key to a value. Here’s an example:

dict1 = {'a': 'duck', '2': 'Razzball', 'chimpanzee': "How's your mother?"}

This dictionary or dict maps a to duck, 2 to Razzball, and chimpanzee to How’s your mother?

This may seem pointless, this mapping of two objects, but it’s exactly what we need to not have the program affected by a name change.

At the moment, a dictionary of the team names in this league would look like this:

teamDict = {'Razzball Rudy': 'Rudy', 'Team Albright': 'Grey', 'Yahoo Del Don': 'DDD',
'RC BradGraphs': 'Brad', 'Razzball Scott': 'Scott', 'Team McLeod': 'McLeod',
'Razzball Radio': 'Capozz', 'Razzball Sky Sperling': 'Sky',
'Razzball Jay(Wrong)': 'Jay', 'Club Singman': 'MEEE', 'Tehol the Pathetic': 'Tehol',
'Mastersball Carey': 'Carey'}

This means that the value ‘Rudy’ can be referenced by the key ‘Razzball Rudy’. The benefit is if ‘Razzball Rudy’ were to change to ‘ESPN Rudy’ for example, the value ‘Rudy’ could still be referenced by its new key ‘ESPN Rudy’. More generally, even if the key changes, the program can still work with a constant underlying value. Make sense? I hope so, at least a little bit.

Continuing with the earlier example, here’s how we add ‘Yahoo Behrens’ as a key to the dictionary teamDict, and remove ‘Mastersball Carey’.

# adds Behrens to dict with value = 'Carey'
oldName = list(y)[0]  # y is a one-item set, so pull out the name itself
for n in permTeamSet:
    if n not in teamDict:
        teamDict[n] = teamDict[oldName]

A quick explanation of what the code does. It says: for the team names in permTeamSet, if any of the names are not keys in teamDict (which at this point is only ‘Yahoo Behrens’), make a new entry in the dictionary with that name as the key and assign its value to be what the value of ‘Mastersball Carey’ was. If you look at the code earlier in the article, you’ll see y is the set containing only ‘Mastersball Carey’; because a set can’t itself be used as a dictionary key, the first line pulls that single name out into oldName. Right now teamDict looks like this:

print teamDict

>>>  {'Razzball Rudy': 'Rudy', 'Yahoo Del Don': 'DDD', 'Club Singman': 'MEEE', 'Razzball Jay(Wrong)': 'Jay', 'RC BradGraphs': 'Brad', 'Razzball Scott': 'Scott', 'Razzball Radio': 'Capozz', 'Razzball Sky Sperling': 'Sky', 'Tehol the Pathetic': 'Tehol', 'Yahoo Behrens': 'Carey', 'Mastersball Carey': 'Carey', 'Team McLeod': 'McLeod', 'Team Albright': 'Grey'}

print len(teamDict)

>>> 13

What’s important to notice is there are now 13 teams in the dictionary because key ‘Yahoo Behrens’ has been added with the value ‘Carey’, which is the same value ‘Mastersball Carey’ had. All that’s left to do is delete the key ‘Mastersball Carey’ and we’re set.

# deletes Mastersball Carey from teamDict
for oldName in y:  # y is the set holding the retired name
    if oldName in teamDict:
        del teamDict[oldName]

Voilà. Printing teamDict now looks like this:

>>> {'Razzball Rudy': 'Rudy', 'Yahoo Del Don': 'DDD', 'Club Singman': 'MEEE', 'Razzball Jay(Wrong)': 'Jay', 'RC BradGraphs': 'Brad', 'Razzball Scott': 'Scott', 'Razzball Radio': 'Capozz', 'Razzball Sky Sperling': 'Sky', 'Tehol the Pathetic': 'Tehol', 'Yahoo Behrens': 'Carey', 'Team McLeod': 'McLeod', 'Team Albright': 'Grey'}

print len(teamDict)

>>> 12

There ya have it. Mastersball Carey has become Yahoo Behrens, but as far as the rest of the program knows, it’s still just ‘Carey’.
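The add-then-delete steps can also be folded into one small helper. What follows is a sketch rather than the bot's actual code: rename_team is a hypothetical name, and it leans on dict.pop(), which removes a key and hands back its value in a single call. The two-team dictionary is a shortened stand-in for teamDict:

```python
def rename_team(teamDict, oldName, newName):
    """Move the value stored under oldName to newName, in place."""
    if oldName not in teamDict:
        raise KeyError('unknown team name: %s' % oldName)
    if newName in teamDict:
        raise ValueError('team name already in use: %s' % newName)
    # pop() deletes the old key and returns its value for the new key.
    teamDict[newName] = teamDict.pop(oldName)


teamDict = {'Tehol the Pathetic': 'Tehol', 'Mastersball Carey': 'Carey'}
rename_team(teamDict, 'Mastersball Carey', 'Yahoo Behrens')

print(teamDict['Yahoo Behrens'])  # Carey
print(len(teamDict))              # 2
```

Because the old key is removed in the same step that the new one is added, the dictionary never passes through the in-between state with an extra team in it.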

Conclusion

Okay, I’ll admit, I’m likely not going to go into the nitty-gritty of the coding details in future articles like I did in this one. As much as I tried to make it as clear as possible, I’m sure anyone who doesn’t know code got lost at some point, and anyone who does got bored at some point. Striking a middle ground seems impossible at this beginning stage.

I hope you at least enjoyed learning something about lists, sets, and dictionaries or are motivated to learn more. Again, I recommend Learn Python the Hard Way for anyone looking to do so.

A couple more closing notes. First, this code isn’t fully robust. It does handle a single name change well, but what if multiple teams change their names? What if one team changes its name multiple times per day? As good as the bot might one day be, it would be worthless if someone could break it, or at least confuse it, by doing something as simple as changing a team name.
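The multiple-rename case is a real gap: if two teams change names on the same day, the set differences contain two new names and two retired names, with nothing to say which old name goes with which new one. The sketch below does not solve that, but it at least refuses to guess. pair_single_rename is a hypothetical helper, and 'Club Singer' is a made-up name for the example:

```python
def pair_single_rename(todaySet, yesterdaySet):
    """Return (oldName, newName) if exactly one team was renamed, else None."""
    added = todaySet - yesterdaySet    # same as todaySet.difference(yesterdaySet)
    removed = yesterdaySet - todaySet
    if len(added) == 1 and len(removed) == 1:
        return next(iter(removed)), next(iter(added))
    return None  # no change, or too ambiguous to pair up automatically


yesterdaySet = set(['Team McLeod', 'Club Singman', 'Mastersball Carey'])

oneRename = set(['Team McLeod', 'Club Singman', 'Yahoo Behrens'])
twoRenames = set(['Team McLeod', 'Club Singer', 'Yahoo Behrens'])

print(pair_single_rename(oneRename, yesterdaySet))   # ('Mastersball Carey', 'Yahoo Behrens')
print(pair_single_rename(twoRenames, yesterdaySet))  # None
```

A None result for a day where the sets differ would be the cue to stop and have a human sort out the mapping rather than let the dictionary get scrambled.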

Also, while this code works, I don’t know if it works efficiently. Maybe turning a list into a set, comparing the set to another set, and then utilizing a couple of for loops to replace the key value in a dictionary is a horribly inefficient way to accomplish this task.

My mantra is “Even ugly code is beautiful if it works.” Right now, what I have works and I’ll worry about optimization for another day.

 

 
  1. IV says:

    Interesting article series. I will be looking forward to these throughout the year. You have inspired me to build something similar for my league and get back into programming.

    What was your reason for using Python?

    • paul says:

      @IV: Glad to hear it. I did not weigh the pros and cons of a bunch of languages and then settle on Python. I chose it because it’s currently the most widely used language, it’s relatively easy to learn, and in talking to people I was told it was capable of doing a task like this.

  2. PurpleStickyNote says:

    The bot scrapes the data daily, but what exactly happens with that data?

    Am I missing something obvious here? I won’t hide the fact that I didn’t read through all the specific code pieces, I was a little lost.

    • paul says:

      @PurpleStickyNote: Not your fault, this was a trial run for me in writing an article centered around coding concepts and explaining what blocks of code do… which is not easy when you’re not a master in this stuff and just figured it out the day before. I have ideas of how to do this better in the future (the next code-based article will likely be a data-scraping primer) and overall I won’t get into the level of detail where you can replicate what I’ve done, but hopefully you come away with a sense of how it was done.

      I’d explain this article in English like this: The goal is to have the program handle a team name change seamlessly. The way it’s done is to compare two lists: The first is a fresh list of the team names scraped from the internet that day and the second is the list of team names from yesterday (what I called the perm list). If you compare them and there are no differences, great. If there’s a difference, it takes the new name, adds it to a Dictionary of the team names and deletes the old one.

      The reason we use the Dictionary is because dictionaries map a key (or identifier) to a value. What we do is change the key in the dictionary to the new team name and assign it the same value as the old team name. The program will work by referencing the value of a team, so even though the key changed, the value is the same and will allow it to work the same as it did before.

      • jake says:

        @paul: look forward to the next edition. You read my mind on explaining what scraping was.

      • PurpleStickyNote says:

        @paul: I appreciate the in-depth response. At a high level, if we ignore each piece of code and assume this works exactly as you intend (which I’m sure it does), what does this bot provide you with? It scrapes the data of league names, etc, and then that goes where? What do you do with the data, how does this help you, and in turn help others who are keeping up with it?

  3. Mike says:

    Are you trying to make a bot for just ESPN? Yahoo allows something that is a little more flexible in that, if the UI changes, the ability to make moves programmatically does not change.

    • paul says:

      @Mike: I actually play Yahoo more and would definitely like to make it Yahoo compatible, but since the fine Razzball folks are letting me publish these articles here, I feel compelled to start building it for ESPN since the whole RCL setup is based there.

      If it’s not too involved to explain, what makes Yahoo more programmatically compatible?

      • Mike says:

        @paul: You don’t have to scrape data. You query Yahoo’s dbs directly. There are exceptions, like if you want to capture the waiver wire stats across all leagues: things like that lend themselves to screen scrapes.

        • paul says:

          @Mike: Interesting, I’ll look into that. Thanks.

  4. Ryan says:

    The dream would be to be able to scrape info from the Live Draft window in ESPN. Could really make some beautiful spreadsheets with it.

    • Tball Hero says:

      @Ryan: I believe that the live draft window is done with flash. Screen scraping is done with HTML. Assuming I’m right, it would be hard/impossible to do the live draft but scraping the results from the standard interface would be straightforward.

  5. weas says:

    I almost think you should avoid code specifics for this series entirely. Post your (hopefully commented) code to github or whatever, but to appeal to the widest audience, I would focus on the thought process behind the design of your algorithms and models. Anyone who knows how to code is not going to be interested in a post talking about sets vs lists, and anyone who doesn’t know how to code isn’t going to be using razzball.com to learn.

    I am far more interested in hearing conceptually (high level pseudocode) how you are thinking about solving some of the specific problems discussed in your first post.

    Personally I would use mocked data for input and testing the real meat of what we’re trying to accomplish here, rather than start by scraping data from ESPN. Abstract away the ESPN-specific pieces and focus on what your internal team/player model should look like. Also, ESPN/Yahoo leagues have Team IDs (look at the URL which takes you to each team’s page), so you shouldn’t need to worry about name changes.

    • Tball Hero says:

      @weas: What weas said.

    • paul says:

      @weas: I agree as well.

  6. Jim Wiser says:

    All this is nice, but too bad your War Room won’t work with my MSN browser. And it’s too much trouble to change browsers!

    • Mantis Toboggan MD says:

      @Jim Wiser: it’s not that hard, you could just do it for war room alone if you wanted.

  7. Jason Morgan says:

    What exactly is the ultimate purpose of the bot? What are you attempting to do with the scraped data?

  8. GhostTownSteve says:

    Love the project Paul. You know what it puts me in mind of? How the American man used to tinker with shit. Used to work on cars. Take apart radios. You’re taking on an engineering project. I’m not sure how you’ll end up with this, but love the spirit. I’ll definitely check in and help when I can. I don’t write code but I’ve spent a lot of time as a guy who writes specs and helps guide products from a feature/function standpoint and I know a shit load about FBB.

    • paul says:

      @GhostTownSteve: GhostTownSteve, Didn’t watch that show, but I like the analogy. Also, I appreciate your support, I’m sure I’ll hit some roadblocks along the way and it very well may be someone in this community that helps me get through it.

      I consider myself lucky to be able to write these articles here and not, say on a personal blog, where each article gets 0 comments. Would be a lonelier experience.

  9. BrotherofVin says:

    The great thing about Yahoo is that they provide an interface into their data – no scraping required.

    • paul says:

      @BrotherofVin: I’ll check that out.

  10. Kevin says:

    +1 to Creating a GitHub Repository.

    Explain and Link to Python (+ version) to download.
    So then with each post the user can just pull down your latest repository and then run whatever program your article’s referencing. It can then be modified locally and readers may make/pull(push) easily.

    I think it would capture more of an audience and you can get all the code clutter out of the article and you can just talk about your process etc.

    • paul says:

      @Kevin: +1

  11. DrEasy says:

    Love this project! I think I missed the part where you actually scrape the web page. How did you do that?

    I agree with others re: putting this stuff on GitHub. Even better: if you post some sort of roadmap there, along maybe with some function headers and unit tests, some of us might help out with some pull requests to get this thing moving faster.

    • paul says:

      @DrEasy: Thank you! The web scraping hasn’t occurred yet, so stay tuned in the next few weeks.
