Advanced Examples
=================

College Football Ranking
------------------------

In this example we will use a LambdaArena and the CFBScrapy library to build a rating system for college football and see how it performs.

To start with, we need historical game data to seed our ratings. Luckily, there is a nice library/API for that:

.. code-block:: python

   import CFBScrapy as cfb
   from elote import LambdaArena


   # pull API data for the 2000 through 2017 seasons (training) and 2018/2019 (testing)
   train_df = cfb.get_game_info(year=2000)
   for year in range(1, 18):
       train_df = train_df.append(cfb.get_game_info(year=2000 + year))
   test_df = cfb.get_game_info(year=2018).append(cfb.get_game_info(year=2019))

   # keep only the columns we need and sort by date
   train_df = train_df.reindex(columns=['start_date', 'home_team', 'away_team', 'home_points', 'away_points'])
   test_df = test_df.reindex(columns=['start_date', 'home_team', 'away_team', 'home_points', 'away_points'])
   train_df = train_df.sort_values(by='start_date')
   test_df = test_df.sort_values(by='start_date')


   # then form matchup objects (home team listed first); the data is already
   # sorted by date, so the matchups happen in true date order
   train_matchups = list()
   for idx, row in train_df.iterrows():
       train_matchups.append((
           row.home_team,
           row.away_team,
           {"home_points": row.home_points, "away_points": row.away_points}
       ))

   test_matchups = list()
   for idx, row in test_df.iterrows():
       test_matchups.append((
           row.home_team,
           row.away_team,
           {"home_points": row.home_points, "away_points": row.away_points}
       ))

Next we need to make a lambda to execute the matchups with. Since we have the scores available in the attributes of our matchup dataset, we can simply check the score to see if the first competitor (the home team) won or lost:

.. code-block:: python

   # we already know the winner, so the lambda here is trivial
   def func(a, b, attributes=None):
       if attributes.get('home_points', 0.0) > attributes.get('away_points', 0.0):
           return True
       else:
           return False

To start with we will use an Elo competitor with a ``_k_factor`` of 400. We will train the ratings with a tournament
on the first couple of decades of data:

.. code-block:: python

   # we use the default EloCompetitor, but adjust the k_factor to 400 before running the tournament
   arena = LambdaArena(func)
   arena.set_competitor_class_var('_k_factor', 400)
   arena.tournament(train_matchups)

Once we’ve developed some ratings, let’s look at how they performed on the training set and use that to select some potential thresholds:

.. code-block:: python

   # do a threshold search and clear the history for validation
   _, thresholds = arena.history.random_search(trials=10_000)
   tp, fp, tn, fn, do_nothing = arena.history.confusion_matrix(*thresholds)
   print('\n\nTrain Set: thresholds=%s' % (str(thresholds), ))
   print('wins: %s' % (tp + tn, ))
   print('losses: %s' % (fp + fn, ))
   print('do_nothing: %s' % (do_nothing, ))
   print('win pct: %s%%' % (100 * ((tp + tn) / (tp + tn + fp + fn + do_nothing))))
   arena.clear_history()

This will return:

.. code-block::

   Train Set: thresholds=[0.6350196774347375, 0.9364243175248251]
   wins: 267
   losses: 236
   do_nothing: 171
   win pct: 39.61424332344214%
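
The "do_nothing" count comes from the two thresholds: when the predicted win probability for a matchup falls between them, no pick is made, so the bout counts as neither a win nor a loss (though it still appears in the win percentage denominator). Conceptually, each recorded bout is classified along these lines; this is a simplified sketch for illustration, not elote's internal implementation:

.. code-block:: python

   # simplified, illustrative sketch of how two thresholds partition predictions;
   # the function name and signature here are not part of elote's API
   def classify(predicted_prob, first_competitor_won, lower, upper):
       if lower < predicted_prob < upper:
           return 'do_nothing'  # too close to call, so no pick is made
       predicted_win = predicted_prob >= upper
       return 'win' if predicted_win == first_competitor_won else 'loss'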

And while we are here, let’s also print out what the rankings would have been at the start of the 2018 season:

.. code-block:: python

   # then we print out the top 25 as of the end of our training dataset
   print('\n\nTop 25 as of start of validation:')
   rankings = sorted(arena.leaderboard(), reverse=True, key=lambda x: x.get('rating'))[:25]
   for idx, item in enumerate(rankings):
       print('\t%d) %s' % (idx + 1, item.get('competitor')))

Which will print:

.. code-block::

   Top 25 as of start of validation:
   1) Miami
   2) Oklahoma
   3) Florida State
   4) Oregon State
   5) Texas
   6) Georgia Tech
   7) Washington
   8) Virginia Tech
   9) Kansas State
   10) Notre Dame
   11) Cincinnati
   12) TCU
   13) Michigan
   14) Arkansas
   15) Toledo
   16) Air Force
   17) Tennessee
   18) Auburn
   19) Florida
   20) Boise State
   21) Louisville
   22) Middle Tennessee
   23) North Carolina
   24) Pittsburgh
   25) Oregon

Now let’s do some hold-out validation by using these ratings on the 2018 and 2019 seasons. The ratings will of course still update as the games are evaluated:

.. code-block:: python

   # now validation
   print('\n\nStarting Validation Step...')
   arena.tournament(test_matchups)
   report = arena.history.report_results()

We can then look at the results from just this set (notice that we ran ``clear_history()`` above to wipe the train set results out of our history tracker):

.. code-block:: python

   tp, fp, tn, fn, do_nothing = arena.history.confusion_matrix(0.4, 0.6)
   print('\n\nTest Set: using 0.4/0.6 thresholds')
   print('wins: %s' % (tp + tn, ))
   print('losses: %s' % (fp + fn, ))
   print('do_nothing: %s' % (do_nothing, ))
   print('win pct: %s%%' % (100 * ((tp + tn) / (tp + tn + fp + fn + do_nothing))))

   tp, fp, tn, fn, do_nothing = arena.history.confusion_matrix(*thresholds)
   print('\n\nTest Set: using learned thresholds: %s' % (str(thresholds), ))
   print('wins: %s' % (tp + tn, ))
   print('losses: %s' % (fp + fn, ))
   print('do_nothing: %s' % (do_nothing, ))
   print('win pct: %s%%' % (100 * ((tp + tn) / (tp + tn + fp + fn + do_nothing))))

Which will print out:

.. code-block::

   Test Set: using 0.4/0.6 thresholds
   wins: 1045
   losses: 456
   do_nothing: 193
   win pct: 61.68831168831169%

   Test Set: using learned thresholds: [0.6350196774347375, 0.9364243175248251]
   wins: 804
   losses: 483
   do_nothing: 407
   win pct: 47.4616292798111%

Not awesome. This is probably related to the ``_k_factor``, which tunes how quickly ratings respond to new matchups. Let’s try doubling it to 800 and rerunning the whole experiment, as sketched below.
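
A minimal sketch of that rerun, reusing the same pipeline as above with only the ``_k_factor`` changed:

.. code-block:: python

   # rebuild the arena with a more aggressive k_factor and repeat the experiment
   arena = LambdaArena(func)
   arena.set_competitor_class_var('_k_factor', 800)
   arena.tournament(train_matchups)

   # re-learn the thresholds on the training history, then reset for validation
   _, thresholds = arena.history.random_search(trials=10_000)
   arena.clear_history()

   # validate on the 2018/2019 seasons and print the same summaries as before,
   # for both the 0.4/0.6 and the newly learned thresholds
   arena.tournament(test_matchups)

With the higher ``_k_factor``, the final output is: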

.. code-block::

   Test Set: using 0.4/0.6 thresholds
   wins: 1095
   losses: 503
   do_nothing: 96
   win pct: 64.63990554899645%


   Test Set: using learned thresholds: [0.5277889558418678, 0.6981558136040092]
   wins: 1093
   losses: 526
   do_nothing: 75
   win pct: 64.52184179456907%

Before we get too excited about this, let’s take a look at the post-game win probabilities provided by the same API we are getting data from:

.. code-block::

   Test Set: using probabilities from dataset as baseline
   wins: 1481
   losses: 117
   do_nothing: 96
   win pct: 87.42621015348288%
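
A rough sketch of how such a baseline could be scored, assuming the un-trimmed game data carries a post-game home win probability (the ``home_post_win_prob`` column name below is an assumption, so check the columns in your own pull) and applying the 0.4/0.6 thresholds for illustration:

.. code-block:: python

   import pandas as pd

   # re-pull the test seasons without trimming columns so the probability field
   # is available; 'home_post_win_prob' is an assumed column name
   raw_test_df = cfb.get_game_info(year=2018).append(cfb.get_game_info(year=2019))

   wins = losses = do_nothing = 0
   for _, row in raw_test_df.iterrows():
       p = row['home_post_win_prob']
       home_won = row['home_points'] > row['away_points']
       if pd.isna(p) or 0.4 < p < 0.6:
           do_nothing += 1              # too close to call, no pick made
       elif (p >= 0.6) == home_won:
           wins += 1                    # the probability picked the right side
       else:
           losses += 1

   print('win pct: %s%%' % (100.0 * wins / (wins + losses + do_nothing)))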

So we’re not exactly going to Vegas.