
I thought a fair amount about what the topic of my second article should be, and while I would have been happy to keep waxing theoretical about what a fantasy baseball bot might look like, I figure people probably want to see some actual code, or at the very least pseudo-code. You'll settle for some pseudo-code, right? Great.

Order of business No. 1 in this venture is data-scraping. What will we want to scrape? Let’s start with a league homepage:

[Screenshot: ESPN league homepage, 2015-02-28]

For ESPN, I'll admit, it's surprisingly minimalist. I guess my AdBlock is working after all. We've got the league name (potentially useful), some crappy articles (less useful), congrats to Rudy on finishing first, and, ah, the Standings. Let's expand those:

[Screenshot: expanded league standings page, 2015-02-28]

What a beautiful page! Basically all the information on this page should be scraped daily, starting with a list of the team names.

Let me be clear here. By list I don't mean something your wife gives you before heading to the grocery store; in Python, a list is a specific type of data container with its own syntax and properties. As containers go, lists are relatively simple.

(I don’t plan on going into too much Python syntactical minutiae, but for this first example, I’ll try it out.)

Lists

First off, to define a list, you use square brackets [ ]. For example:

teamNamesList = [] # creates an empty list named teamNamesList
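A scraper would most likely fill that empty list one name at a time. As a minimal Python 3 sketch of that pattern, here it is with append(); scrapedRows is a hypothetical stand-in for whatever the scraper actually pulls off the standings page, trimmed to three teams for brevity.

```python
# Hypothetical stand-in for names pulled off the standings page
scrapedRows = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don']

teamNamesList = []  # start with an empty list
for name in scrapedRows:
    teamNamesList.append(name)  # append() adds one item to the end of the list

print(teamNamesList)
# ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don']
```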

After scraping from the standings page, we’ll hopefully have something that looks like this:

teamNamesList = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Mastersball Carey']

There are a couple of things you can do with this list. To select a specific team, I'd use its index number, which for these 12 teams runs from 0 to 11. For example:

print(teamNamesList[0])

would spit out:

Razzball Rudy

and

print(teamNamesList[11])

would print 'Mastersball Carey' to the terminal (the text window where Python shows its output).
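To make the 0-to-11 numbering concrete, here's a short, self-contained Python 3 sketch (Python 3 requires parentheses around whatever you print). It also shows two extras beyond what's described above: negative indexes count back from the end, and .index() does the reverse lookup from name to position.

```python
teamNamesList = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
                 'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
                 'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Mastersball Carey']

print(teamNamesList[0])   # Razzball Rudy
print(teamNamesList[11])  # Mastersball Carey
print(teamNamesList[-1])  # also Mastersball Carey: -1 means "last item"

# Reverse lookup: which position is a given team in?
print(teamNamesList.index('Team McLeod'))  # 5

# Asking for a 13th team fails, because valid indexes stop at 11
try:
    teamNamesList[12]
except IndexError:
    print('There is no index 12 in a 12-team list')
```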

Another thing I could do with teamNamesList is create a variable called numTeams and set it equal to the length of the list. In Python that would look like this:

numTeams = len(teamNamesList)

print(numTeams)

12

While this may seem like a nice feature, practically speaking, it is probably best to have a human being input the number of teams directly into the program as a parameter. Until leagues start kicking owners out in the middle of the season (not a terrible idea), there’s no need to recalculate the number of teams on a daily basis.
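One way to follow that advice while still getting some use out of len() is to treat the human-entered team count as the parameter and use the scraped length only as a sanity check. This is a sketch, not the bot's actual code; NUM_TEAMS and checkTeamCount are hypothetical names.

```python
NUM_TEAMS = 12  # hypothetical parameter, typed in by a human once per season

def checkTeamCount(teamNamesList, expected=NUM_TEAMS):
    """Return True if the scrape found the expected number of teams; warn otherwise."""
    found = len(teamNamesList)
    if found != expected:
        print('Warning: expected %d teams but scraped %d' % (expected, found))
        return False
    return True

# A scrape that dropped a team gets flagged instead of silently changing the count
elevenNames = ['Team %d' % i for i in range(1, 12)]  # 11 placeholder names
print(checkTeamCount(elevenNames))
```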

If you’ve followed me this far, let me bring up a more interesting scenario: what should happen when a team changes its name?

One solution is to ban all teams from ever changing their names, but that's neither fun nor realistic. Another option is to simply have teamNamesList change as well, but that could make historical data confusing to interpret, and could also break the entire program, depending on how it's coded. Basically, we need a good way to handle this scenario.

Here’s where I will introduce another kind of data container: the set.

Sets

The main features of a list are that it is ordered and that its items are referenced by numerical index (remember, teamNamesList[0] is 'Razzball Rudy').

Sets are different in that they are unordered and their elements aren't referenced by number at all: you can't index into a set, but you can check whether a name is in it. The benefits of these two features will become apparent when we deal with the problem of a team name change.

First, we convert teamNamesList into a set called teamNamesSet:

teamNamesList = ['Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Mastersball Carey']

teamNamesSet = set(teamNamesList)

What converting to a set allows us to do is compare the elements of the teamNamesSet to another set, regardless of the order. This is where the unordered aspect of sets comes in handy.
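A quick Python 3 sketch of why the unordered part matters, using two teams for brevity: two sets holding the same names are equal no matter what order they were built in, while two lists are only equal if the order matches too. It also shows that you check membership in a set with `in` rather than by index.

```python
listA = ['Razzball Rudy', 'Team Albright']
listB = ['Team Albright', 'Razzball Rudy']

print(listA == listB)            # False: lists care about order
print(set(listA) == set(listB))  # True: sets only care about what's in them

teamNamesSet = set(listA)
print('Team Albright' in teamNamesSet)  # True: membership test, no index needed

try:
    teamNamesSet[0]
except TypeError:
    print("Sets can't be indexed")
```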

Let's say we have the set of team names from yesterday, called permTeamSet. permTeamSet will look something like this:

print(permTeamSet)

>>> {'Razzball Radio', 'Yahoo Del Don', 'Razzball Rudy', 'Mastersball Carey', 'Razzball Sky Sperling', 'Tehol the Pathetic', 'Club Singman', 'Razzball Jay(Wrong)', 'Team McLeod', 'Razzball Scott', 'RC BradGraphs', 'Team Albright'}

[From now on, whatever I put after '>>>' is what Python prints to the terminal. The order in which a set's elements print is arbitrary.]

This, you'll notice, contains exactly the same names as teamNamesSet, just in a different order. So if we take the difference of the two sets, we get back an empty set.

x = teamNamesSet.difference(permTeamSet)

print(x)

>>> set()
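That empty result is handy in its own right: an empty set counts as False in an if statement, so a daily job could skip the name-change handling entirely on days when nothing changed. A minimal sketch, with a hypothetical three-team league standing in for the real one:

```python
permTeamSet = {'Razzball Rudy', 'Team Albright', 'Mastersball Carey'}   # yesterday
teamNamesSet = {'Mastersball Carey', 'Razzball Rudy', 'Team Albright'}  # today, different order

x = teamNamesSet.difference(permTeamSet)
print(x)  # set()

if not x:  # an empty set is "falsy"
    print('No team names changed today')
else:
    print('New team names:', x)
```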

Now, however, what if Mr. Carey, the owner of Mastersball Carey, decides he's unhappy in last place and hands control of his team over to Yahoo's Andy Behrens? The first thing Mr. Behrens would probably do is change the team name to 'Yahoo Behrens'. What will our set difference look like then?

teamNamesSet = {'Razzball Rudy', 'Team Albright', 'Yahoo Del Don', 'RC BradGraphs',
'Razzball Scott', 'Team McLeod', 'Razzball Radio', 'Razzball Sky Sperling',
'Razzball Jay(Wrong)', 'Club Singman', 'Tehol the Pathetic', 'Yahoo Behrens'}

x = teamNamesSet.difference(permTeamSet)

print(x)

>>> {'Yahoo Behrens'}

Pretty simple: this time the .difference() method identified that teamNamesSet has a new element, 'Yahoo Behrens', that is not in permTeamSet.

Taking the difference the other way around leads Python to identify 'Mastersball Carey' as the team no longer in the set of team names.

y = permTeamSet.difference(teamNamesSet)

print(y)

>>> {'Mastersball Carey'}
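Since the bot will need both differences every day, it may be tidier to wrap them in one function. A sketch under that assumption; findNameChanges is a hypothetical helper, and the `-` operator in the comments is Python's shorthand for .difference() on sets.

```python
def findNameChanges(todayNames, yesterdayNames):
    """Return (names that are new today, names that disappeared since yesterday)."""
    added = todayNames.difference(yesterdayNames)    # same as todayNames - yesterdayNames
    removed = yesterdayNames.difference(todayNames)  # same as yesterdayNames - todayNames
    return added, removed

# Three-team example for brevity
permTeamSet = {'Razzball Rudy', 'Team Albright', 'Mastersball Carey'}
teamNamesSet = {'Razzball Rudy', 'Team Albright', 'Yahoo Behrens'}

x, y = findNameChanges(teamNamesSet, permTeamSet)
print(x)  # {'Yahoo Behrens'}
print(y)  # {'Mastersball Carey'}
```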

Wonderful. So what we want to do is add 'Yahoo Behrens' to permTeamSet and remove 'Mastersball Carey' from it. This can be accomplished with the following code:

# adds 'Yahoo Behrens'
permTeamSet.update(teamNamesSet)

print(permTeamSet)

>>> {'Razzball Scott', 'Razzball Rudy', 'Yahoo Del Don', 'Club Singman', 'Razzball Jay(Wrong)', 'RC BradGraphs', 'Mastersball Carey', 'Razzball Radio', 'Razzball Sky Sperling', 'Tehol the Pathetic', 'Yahoo Behrens', 'Team McLeod', 'Team Albright'}

# deletes Mastersball Carey
permTeamSet.difference_update(y)

print(permTeamSet)

>>> {'Razzball Scott', 'Razzball Rudy', 'Yahoo Del Don', 'Club Singman', 'Razzball Jay(Wrong)', 'RC BradGraphs', 'Razzball Radio', 'Razzball Sky Sperling', 'Tehol the Pathetic', 'Yahoo Behrens', 'Team McLeod', 'Team Albright'}
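Worth noticing: after these two steps, permTeamSet holds exactly the same names as today's set. A quick check, sketched with a hypothetical three-team league; here .update() is handed x, the set of new names, which adds the same single name. Both methods modify the set in place and return None.

```python
permTeamSet = {'Razzball Rudy', 'Team Albright', 'Mastersball Carey'}  # yesterday
teamNamesSet = {'Razzball Rudy', 'Team Albright', 'Yahoo Behrens'}     # today

x = teamNamesSet.difference(permTeamSet)  # {'Yahoo Behrens'}
y = permTeamSet.difference(teamNamesSet)  # {'Mastersball Carey'}

permTeamSet.update(x)             # add the new name(s), in place
permTeamSet.difference_update(y)  # remove the old name(s), in place

print(permTeamSet == teamNamesSet)  # True
```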

Sweet: Mastersball Carey is no longer in there and Yahoo Behrens is, so we've updated permTeamSet to reflect the team name change. While that's great, what we still want is something linked to each team name that stays constant, even when the team name itself changes. This is where the dictionary comes into play.

Dictionaries

Basically, dictionaries map one thing to another, or in more official terms, a key to a value. Here's an example:

dict1 = {'a': 'duck', '2': 'Razzball', 'chimpanzee': "How's your mother?"}

This dictionary, or dict, maps 'a' to 'duck', '2' to 'Razzball', and 'chimpanzee' to "How's your mother?"

This mapping of one object to another may seem pointless, but it's exactly what we need to keep the program from being affected by a name change.
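Looking a value up uses the same square brackets as list indexing, but with the key instead of a position. A short Python 3 sketch using dict1; note that the key '2' is the string '2', not the number 2, and that .get() returns a fallback instead of raising a KeyError when a key is missing.

```python
dict1 = {'a': 'duck', '2': 'Razzball', 'chimpanzee': "How's your mother?"}

print(dict1['a'])           # duck
print(dict1['chimpanzee'])  # How's your mother?
print(dict1['2'])           # Razzball
print(2 in dict1)           # False: the key is the string '2', not the integer 2

# .get() returns a fallback instead of raising KeyError for a missing key
print(dict1.get('zebra', 'no such key'))  # no such key
```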

At the moment, a dictionary of the team names in this league would look like this:

teamDict = {'Razzball Rudy': 'Rudy', 'Team Albright': 'Grey', 'Yahoo Del Don': 'DDD',
'RC BradGraphs': 'Brad', 'Razzball Scott': 'Scott', 'Team McLeod': 'McLeod',
'Razzball Radio': 'Capozz', 'Razzball Sky Sperling': 'Sky',
'Razzball Jay(Wrong)': 'Jay', 'Club Singman': 'MEEE', 'Tehol the Pathetic': 'Tehol',
'Mastersball Carey': 'Carey'}

This means that the value 'Rudy' can be referenced by the key 'Razzball Rudy'. The benefit is that if 'Razzball Rudy' were to change to, say, 'ESPN Rudy', we could move the value 'Rudy' under the new key 'ESPN Rudy' and keep referencing it. More generally, even if the key changes, the program can still work with a constant underlying value. Make sense? I hope so, at least a little bit.
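To make the payoff concrete, here's a sketch assuming the rest of the bot stores its data under the stable value rather than under the team name. ownerHistory and historyForTeam are hypothetical, and the stored strings are placeholders, not real data.

```python
teamDict = {'Razzball Rudy': 'Rudy', 'Team Albright': 'Grey'}

# Hypothetical store keyed by the stable value ('Rudy'), not the team name
ownerHistory = {'Rudy': ['day 1 stats', 'day 2 stats'], 'Grey': ['day 1 stats']}

def historyForTeam(teamName):
    """Look up a team's history via its stable value."""
    return ownerHistory[teamDict[teamName]]

print(historyForTeam('Razzball Rudy'))  # ['day 1 stats', 'day 2 stats']

# Rudy renames his team: move the value under the new key, drop the old key
teamDict['ESPN Rudy'] = teamDict['Razzball Rudy']
del teamDict['Razzball Rudy']

print(historyForTeam('ESPN Rudy'))  # ['day 1 stats', 'day 2 stats'], untouched
```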

Continuing with the earlier example, here’s how we add ‘Yahoo Behrens’ as a key to the dictionary teamDict, and remove ‘Mastersball Carey’.

# y is a set, so pull out its single element, 'Mastersball Carey'
oldName = list(y)[0]

# adds Behrens to dict with value = 'Carey'
for n in permTeamSet:
    if n not in teamDict:
        teamDict[n] = teamDict[oldName]

A quick explanation of what the code does. It says: for each team name in permTeamSet, if the name is not already a key in teamDict (at this point, only 'Yahoo Behrens' isn't), make a new entry in the dictionary with that name as the key and give it the value 'Mastersball Carey' had. If you look at the code earlier in the article, you'll see that y is the set containing 'Mastersball Carey'. Right now teamDict looks like this:

print(teamDict)

>>> {'Razzball Rudy': 'Rudy', 'Team Albright': 'Grey', 'Yahoo Del Don': 'DDD', 'RC BradGraphs': 'Brad', 'Razzball Scott': 'Scott', 'Team McLeod': 'McLeod', 'Razzball Radio': 'Capozz', 'Razzball Sky Sperling': 'Sky', 'Razzball Jay(Wrong)': 'Jay', 'Club Singman': 'MEEE', 'Tehol the Pathetic': 'Tehol', 'Mastersball Carey': 'Carey', 'Yahoo Behrens': 'Carey'}

print(len(teamDict))

>>> 13

What’s important to notice is there are now 13 teams in the dictionary because key ‘Yahoo Behrens’ has been added with the value ‘Carey’, which is the same value ‘Mastersball Carey’ had. All that’s left to do is delete the key ‘Mastersball Carey’ and we’re set.

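# deletes Mastersball Carey from teamDict
# (y is a set, and sets can't be dictionary keys, so loop over its elements)
for oldName in y:
    if oldName in teamDict:
        del teamDict[oldName]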

Voilà. Printing teamDict now gives:

>>> {'Razzball Rudy': 'Rudy', 'Team Albright': 'Grey', 'Yahoo Del Don': 'DDD', 'RC BradGraphs': 'Brad', 'Razzball Scott': 'Scott', 'Team McLeod': 'McLeod', 'Razzball Radio': 'Capozz', 'Razzball Sky Sperling': 'Sky', 'Razzball Jay(Wrong)': 'Jay', 'Club Singman': 'MEEE', 'Tehol the Pathetic': 'Tehol', 'Yahoo Behrens': 'Carey'}

print(len(teamDict))

>>> 12

There ya have it. Mastersball Carey has become Yahoo Behrens, but as far as the rest of the program knows, it’s still just ‘Carey’.
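As an aside, a membership check followed by del can be collapsed into a single call: dict.pop(key, default) removes the key if it's there and returns its value, or returns the default (here None) without raising an error if it isn't. A short sketch with a trimmed-down teamDict:

```python
teamDict = {'Mastersball Carey': 'Carey', 'Yahoo Behrens': 'Carey', 'Team McLeod': 'McLeod'}

removed = teamDict.pop('Mastersball Carey', None)  # removes the key, returns 'Carey'
print(removed)  # Carey

again = teamDict.pop('Mastersball Carey', None)    # key already gone: no error, returns None
print(again)    # None

print(len(teamDict))  # 2
```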

Conclusion

Okay, I'll admit, I'm likely not going to go into the nitty-gritty of the coding details in future articles like I did in this one. As much as I tried to make it clear, I'm sure anyone who doesn't know code got lost at some point, and anyone who does got bored. Striking a middle ground seems impossible at this early stage.

I hope you at least enjoyed learning something about lists, sets, and dictionaries or are motivated to learn more. Again, I recommend Learn Python the Hard Way for anyone looking to do so.

A couple more closing notes. First, this code isn't fully robust. It handles a single name change well, but what if multiple teams change their names? What if one team changes its name multiple times per day? As good as the bot might one day be, it would be worthless if someone could break it, or at least confuse it, by doing something as simple as changing a team name.
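One cautious way to keep a rename from confusing the bot, sketched under the assumption that a human will resolve anything ambiguous: only apply the swap automatically when exactly one name disappeared and exactly one appeared, and otherwise stop and report. applyNameChange is a hypothetical helper, not the bot's code. Sets alone can't tell you which new name belongs to which old one when several teams rename on the same day, which is why this version refuses to guess.

```python
def applyNameChange(teamDict, permTeamSet, teamNamesSet):
    """Swap a renamed team's key in teamDict and sync permTeamSet.

    Returns True if a single rename was applied (or nothing changed);
    returns False, changing nothing, if the change is ambiguous.
    """
    added = teamNamesSet.difference(permTeamSet)
    removed = permTeamSet.difference(teamNamesSet)

    if not added and not removed:
        return True  # no name changes today

    if len(added) != 1 or len(removed) != 1:
        print('Ambiguous change, needs a human: added=%s removed=%s' % (added, removed))
        return False

    newName = list(added)[0]
    oldName = list(removed)[0]
    teamDict[newName] = teamDict.pop(oldName)  # same value, new key
    permTeamSet.update(added)
    permTeamSet.difference_update(removed)
    return True


teamDict = {'Razzball Rudy': 'Rudy', 'Mastersball Carey': 'Carey'}
permTeamSet = {'Razzball Rudy', 'Mastersball Carey'}
todaySet = {'Razzball Rudy', 'Yahoo Behrens'}

print(applyNameChange(teamDict, permTeamSet, todaySet))  # True
print(teamDict)  # {'Razzball Rudy': 'Rudy', 'Yahoo Behrens': 'Carey'}
```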

Also, while this code works, I don't know if it works efficiently. Maybe turning a list into a set, comparing the set to another set, and then using a couple of for loops to swap out a key in a dictionary is a horribly inefficient way to accomplish this task.

My mantra is “Even ugly code is beautiful if it works.” Right now, what I have works and I’ll worry about optimization for another day.