Data Scientists in Tune with Your Playlist

Did you know that the music business is also a big data business?

Well, it is, according to this article in Forbes magazine. It seems record companies are no different from any other company that relies on big data to gain insight into its business and make the right decisions.

Let me interrupt this post for a minute with a little aside. For the record, I am a terrible procrastinator. I live by the words of my late grandfather: Don’t do today what you can put off until tomorrow. Seriously.

I share this information for a reason. I begin writing this post Wednesday morning at about 9:45 a.m. I have probably two, maybe three sentences written when Mr. Procrastinator taps me on the shoulder and urges me to do something – anything – that would take me away from my writing.

I minimize my Word doc and head straight for the “Interweb.” Feeling a little guilty, I decide to make good use of my “away from writing time” (read: putting off my work).

So I head over to Spotfire’s Twitter page to check out what my colleagues are up to. And lo and behold, this is what I find: Data Scientists Figured Out What Songs You’ll Like (In Just 24 Hours)

One of my colleagues has tweeted the very article I’m writing about (or was writing about) at just about the same time. What are the odds? (Maybe we can use some advanced analytics to figure it out.)

I take this as a sign on two fronts: first, I’m on the right track as far as the subject is concerned; and second, I should get back to writing about it.

Here goes.

It seems when it comes to determining what bands to sign, and how to promote them, record companies rely on a ton of marketing data.

And that’s what leads EMI to produce its One Million Interview Dataset – the richest and largest music dataset ever, capturing the interests, attitudes, behaviors, familiarity, and appreciation of music as expressed by music fans worldwide.

Each interview – currently there are about 800,000 – takes around 20 minutes. The dataset connects data about people – who they are, where they live, how they engage with music in their daily lives – to their opinions about EMI’s artists, according to the Forbes article.

EMI has been conducting these online interviews of consumers since early in 2009.

“These interviews have helped to drive EMI’s deep understanding of the music industry and has deeply touched how we’ve helped our artists around the world and at all levels over that time,” according to the company.

By determining what kinds of music people in various markets want to buy, EMI execs can better understand how to help their artists be successful in these markets. As Knapp, the Forbes article’s author, points out, gathering the data may not be easy, but it can offer up real insight into how music affects people. The problem is sorting through all that data – a daunting task at best, he says.

But recently, that’s just what some data scientists did. In fact, 138 competing teams took part in a 24-hour “Music Data Science Hackathon,” sponsored, in part, by EMI. The hackathon challenge: “Can you predict if a listener will love a new song?”

What the data scientists want to determine is the rating that a person would give to a particular song. The hackathon focuses on “understanding what it is about people and artists that predicts how much people are going to like a particular track,” according to officials of Kaggle, a platform for predictive modeling competitions.

The goal of the hackathon is to develop an algorithm that can predict a listener’s level of appreciation for songs and artists, based on the listener’s demographics, word associations, and the past interviews stored in the EMI Million Interview Dataset.
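
For the curious, here is a rough idea of what "predicting a rating" can look like in code. This is only a minimal sketch of a common baseline (an overall average, adjusted for how generous each listener is and how well liked each track is). It is not the method any hackathon team used, and it ignores demographics and word associations entirely. The 0–100 rating scale, the function names, and the toy IDs and scores are all assumptions made up for illustration.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit a simple bias model from (user_id, track_id, score) tuples.

    Scores are assumed to lie on a 0-100 scale (an assumption for this sketch).
    """
    global_mean = sum(score for _, _, score in ratings) / len(ratings)

    # Collect each rating's offset from the overall average, per user and per track.
    by_user = defaultdict(list)
    by_track = defaultdict(list)
    for user, track, score in ratings:
        by_user[user].append(score - global_mean)
        by_track[track].append(score - global_mean)

    user_bias = {u: sum(v) / len(v) for u, v in by_user.items()}
    track_bias = {t: sum(v) / len(v) for t, v in by_track.items()}
    return global_mean, user_bias, track_bias


def predict(model, user, track):
    """Predict a score; unseen users or tracks fall back to a bias of zero."""
    global_mean, user_bias, track_bias = model
    raw = global_mean + user_bias.get(user, 0.0) + track_bias.get(track, 0.0)
    return min(100.0, max(0.0, raw))  # keep the prediction on the assumed scale


# Hypothetical toy data, invented purely for illustration.
toy_ratings = [
    ("u1", "t1", 80),
    ("u1", "t2", 60),
    ("u2", "t1", 40),
    ("u2", "t2", 20),
]
model = fit_baseline(toy_ratings)
new_listener_guess = predict(model, "someone_new", "t1")
```

A real entry would go much further, but a baseline like this gives a useful yardstick: any fancier model has to beat it to be worth the trouble.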

Interestingly, the researchers soon learn that traditional marketing approaches don’t work. And things like age and socioeconomic data are typically not very good predictors of which songs people will like, Knapp explains. Rather, people’s general interests and attitudes are much better drivers of predictions, he notes.

However, the data scientists do learn that, contrary to popular opinion, older, retired people are less discriminating and more open in their musical tastes than younger people. Who knew?

The winner of the Music Data Science Hackathon is Shanda Innovations, a tech incubator of the Shanda Corp. of China. The team applies machine learning techniques to the data to develop its algorithm. However, one of the biggest obstacles to the challenge isn’t the math but rather the people.

In an interview about the winning algorithm, the Shanda team says, “We were very surprised to find that the variation of the track scores given by different people was a lot more than we expected. For instance, User ID 41072 scored 100 to track 156 whereas User ID 41286 gave merely 4 to the same track! It was very interesting to find that people were so different in music preference and we believed that was why so many different types of music existed.”
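
To put a number on that disagreement, here is a small sketch that measures how far apart the ratings for a track are. It uses only the two scores quoted above; the function name and the choice of range and standard deviation as the measures of spread are my own assumptions, not something the Shanda team describes.

```python
from statistics import mean, pstdev

# The two ratings quoted by the Shanda team, as (user_id, track_id, score).
quoted_ratings = [(41072, 156, 100), (41286, 156, 4)]


def spread_by_track(ratings):
    """Group scores by track and summarize how much listeners disagree."""
    by_track = {}
    for _user, track, score in ratings:
        by_track.setdefault(track, []).append(score)

    return {
        track: {
            "mean": mean(scores),
            "range": max(scores) - min(scores),
            "stdev": pstdev(scores),  # population standard deviation
        }
        for track, scores in by_track.items()
    }


stats = spread_by_track(quoted_ratings)
```

For track 156, those two listeners alone are 96 points apart, which is about as far as two people can disagree.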

Now if only data scientists could use big data and analytics to find a cure for procrastination.