The Data Analytics of March Madness

Predictive analytics can be used for a wide variety of applications, including matching the right offer to the right customer at the right time using data from customer transaction history, customer needs and preferences, as well as customer lifecycle status.

Last week, many data scientists and other college basketball enthusiasts found themselves making extensive use of statistics and analytics tools for altogether different purposes: to gain an edge on fellow March Madness bettors in their efforts to make correct picks in their NCAA tournament pools.

Even with the help of analytics, making the right picks isn’t easy: there are an astounding 9.2 quintillion possible outcomes for a 64-team NCAA tournament bracket. With the field expanded to 68 teams in 2011, the odds against picking every winner have lengthened to 147.57 quintillion to 1.
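The arithmetic behind those figures is easy to check. The short Python sketch below assumes that every game has exactly two possible outcomes and that a complete bracket covers every game, including the opening-round games added with the 68-team field. The function name bracket_count is only illustrative.

```python
def bracket_count(teams: int) -> int:
    """Number of distinct ways to fill out a single-elimination bracket."""
    # Every game eliminates exactly one team, so crowning a champion
    # from `teams` entrants takes `teams - 1` games.
    games = teams - 1
    # Each game has two possible winners, so the outcomes multiply.
    return 2 ** games


for teams in (64, 68):
    total = bracket_count(teams)
    print(f"{teams} teams: {total:,} brackets "
          f"(about {total / 1e18:.2f} quintillion)")
```

A 64-team field means 63 games, or 2^63 brackets (roughly 9.2 quintillion), while 68 teams means 67 games, or 2^67 (roughly 147.57 quintillion). The count treats every outcome as equally likely, which real games are not, so it describes the size of the search space rather than the chance that an informed picker gets everything right.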

One approach to picking the winners involves the use of analytics tools to help determine the statistical likelihood of certain tournament seeds beating other types of seeds (e.g. #13 versus #4). GigaOM and BusinessWeek blogger Derrick Harris notes that he makes his picks using a tool called BracketOdds created by University of Illinois computer science professor Sheldon Jacobson.

BracketOdds reports the probability of any combination of seeds reaching a given round of the tournament. For the Final Four, the most likely seed combination this year is 1,1,2,3, and the odds against that combination occurring are only 16.08 to 1. As Harris notes in his blog, the odds against all four top-seeded teams making the Final Four are 48.7 to 1, so a Final Four that includes a #2 seed and a #3 seed is about three times as likely.
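To compare the two seed combinations directly, it helps to turn the quoted odds into probabilities. The sketch below assumes both figures are "odds against" of the form X to 1, so the probability is 1 / (X + 1). The helper name is hypothetical and is not part of BracketOdds.

```python
def odds_against_to_probability(odds_against: float) -> float:
    """Convert odds against of 'X to 1' into a probability."""
    # X to 1 against means one favorable outcome for every X unfavorable ones.
    return 1.0 / (odds_against + 1.0)


# Odds quoted in the article for two Final Four seed combinations.
p_1123 = odds_against_to_probability(16.08)  # seeds 1,1,2,3
p_1111 = odds_against_to_probability(48.7)   # all four #1 seeds

print(f"P(1,1,2,3) = {p_1123:.4f}")
print(f"P(1,1,1,1) = {p_1111:.4f}")
print(f"ratio      = {p_1123 / p_1111:.2f}")
```

Under that reading, the 1,1,2,3 combination comes out a little under three times as likely as four #1 seeds, in line with the rough "three times" comparison above.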

Of course, sentimentality and bias often factor into the NCAA picks made by data scientists and other gamers. One analyst I spoke to this week says he weighs a range of variables in his team selections, such as a comparison of team records, the strength of each team’s regular season schedule versus its opponent’s, game location and its proximity to each school’s campus and fan base, and the statistical likelihood of “sleeper” teams (e.g. overlooked 11th seeds) advancing in specific rounds of the tournament.

But when I pressed him on which team he picked to win, he confessed his partiality for the University of Missouri. A #2 seed in the West region, Mizzou is known for its guard tandem. And it’s his alma mater.

In the end, Norfolk State became just the fifth #15 seed in tournament history to knock out a #2 seed when it beat Mizzou, and Lehigh followed shortly afterward with an upset win over #2 Duke. Statistically unlikely to occur? Absolutely.

The odds against a #15 seed beating a #2 seed are 25 to 1. Finding this type of information isn’t always easy, and decision makers in business sometimes encounter similar challenges. Indeed, it’s critical for business leaders to be able to get at the information they need when they need it.

So how did Norfolk State pull off its historic upset? In part, by scoring a highly efficient 1.34 points per possession.

Cinderella teams have pulled off upsets in previous tournaments, but they’ve rarely advanced into the final rounds. In fact, in the history of the tournament, only a small number of teams seeded lower than #8 have actually made it to the final rounds. Just two #14 seeds have reached the Sweet Sixteen (Cleveland State in 1986 and Chattanooga in 1997).

Meanwhile, a #12 seed has made it to the Elite Eight just once (Missouri in 2002), while a #11 seed has reached the Final Four just three times (LSU in 1986, George Mason in 2006 and Virginia Commonwealth in 2011). And in perhaps the greatest upset in championship game history, the #8-seeded Villanova Wildcats stunned #1 seed Georgetown 66-64 on April 1, 1985.

Improbable, but it can happen.