Bracketology

Our office, like many offices, is having a pool for the NCAA tournament. Well, technically it is all for fun, but there’s a “donation” if you want to be eligible for a “prize.” I entered a traditional bracket chosen by random inferences such as “I really want Minnesota to win at least a game,” “Of course I believe Wisconsin will beat Kentucky,” and other Big Ten-slanted favoritism (Ohio State excepted). I also did something a little more scientifically random: I wrote two randomized bracket simulators, each using a different method to choose the winner of every game.
In the first method, I looked only at the seeds and used no other outside information. If an x seed plays a y seed, the x seed wins the matchup with probability y/(x + y). Thus, in a 1 seed vs. 16 seed game, the 1 seed should win 16 out of 17 times. Five* different runs of this algorithm predicted the champion to be Kansas (twice), Wisconsin (!), Duke, and Vanderbilt (whoops).
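To make the seed-only rule concrete, here is a minimal Python sketch of it. This is not the original Perl code; the function names are hypothetical, and it simulates just one 16-team region using the standard first-round seed pairings (1 vs 16, 8 vs 9, and so on), with each game decided by the y/(x + y) rule above.

```python
import random
from collections import Counter

# Standard first-round pairings within one 16-team region, in bracket order,
# so that adjacent winners meet in the following round.
FIRST_ROUND = [(1, 16), (8, 9), (5, 12), (4, 13), (6, 11), (3, 14), (7, 10), (2, 15)]


def seed_win_prob(x, y):
    """Probability that seed x beats seed y under the seed-only model: y / (x + y)."""
    return y / (x + y)


def play(x, y, rng):
    """Simulate one game and return the winning seed."""
    return x if rng.random() < seed_win_prob(x, y) else y


def simulate_region(rng):
    """Play one region round by round and return the seed that wins it."""
    teams = [seed for pair in FIRST_ROUND for seed in pair]
    while len(teams) > 1:
        # Pair off adjacent survivors: 1/16 winner vs 8/9 winner, etc.
        teams = [play(teams[i], teams[i + 1], rng) for i in range(0, len(teams), 2)]
    return teams[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    trials = 10_000
    counts = Counter(simulate_region(rng) for _ in range(trials))
    for seed, n in sorted(counts.items()):
        print(f"seed {seed:2d} wins the region {n / trials:.3f} of the time")
```

Running many trials instead of five shows how often each seed comes out of a region under this model, which is the kind of check the footnote below admits was skipped.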
In the second method, I used an article from ESPN to generate the probability of winning any given matchup, based on the historical results between those seeds. (If a Final Four or championship matchup had never occurred in the Final Four or championship before, I used the regional-round record for those seeds instead. If a matchup had never occurred at all, I defaulted to the lower seed always winning.) Thus, in this simulation, a 16 seed never beats a 1 seed, because it’s never happened before. Five different runs of this algorithm chose Kentucky (twice), Kansas, Duke, and Ohio State as champions.
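The fallback rules in the second method can be sketched as a lookup with two levels of defaults. This is a hypothetical Python sketch, not the original Perl: the table layout, the stage names "regional" and "final", and the function names are all assumptions. It also assumes "lower seed" means the lower-numbered (better) seed, and, since the post doesn't cover two teams with the same seed meeting, it treats that case as a coin flip. No real historical numbers are included; the only table entry used below is the one fact stated above, that a 1 seed has never lost to a 16 seed.

```python
import random


def historical_win_prob(x, y, stage, table):
    """Probability that seed x beats seed y, looked up from historical records.

    Hypothetical table layout: table[(stage, better, worse)] is the fraction of
    past games the better (numerically smaller) seed won, where stage is
    "regional" or "final" (Final Four and championship).
    """
    if x == y:
        return 0.5  # assumption: same-seed games are a coin flip
    better, worse = min(x, y), max(x, y)
    p_better = table.get((stage, better, worse))
    if p_better is None and stage == "final":
        # Pairing never seen in the Final Four/championship: use the regional record.
        p_better = table.get(("regional", better, worse))
    if p_better is None:
        # Pairing never seen at all: the lower-numbered seed always wins.
        p_better = 1.0
    return p_better if x == better else 1.0 - p_better


def play(x, y, stage, table, rng):
    """Simulate one game and return the winning seed."""
    return x if rng.random() < historical_win_prob(x, y, stage, table) else y


if __name__ == "__main__":
    table = {("regional", 1, 16): 1.0}  # a 1 seed has never lost to a 16 seed
    rng = random.Random(42)
    print(play(16, 1, "regional", table, rng))  # always 1
```

Because `random.random()` returns values in [0, 1), a probability of 1.0 always picks the favorite and 0.0 never does, which is how a historical record of "never happened" becomes a guaranteed result.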
After day one, the brackets generated by the first method have correctly picked an average of 11 games, while the brackets generated by the second method have correctly picked an average of 9.6 games. In addition, method one produced the best-performing bracket overall, with 13 correct games (tied for first in the whole office pool with my co-worker who follows basketball closely). That bracket picked Kansas to win it all (also my actual pick, and Barack Obama’s), so it may be the one to beat.
For complete scientific openness, I’m posting the code for each method. Of course, they’re written in Perl, so if you can read and understand them, it’ll be a miracle.

* Yes, if I were being truly scientific, I would have run it more times. But it’s really tedious to enter brackets on the CBS website we’re using at work.