Desert of My Real Life

{August 2, 2009}   Recognizing Patterns for NetFlix

My area of research when I was in computer science was artificial intelligence.  AI is a broad field with many subfields, each of which has many applications.  Within AI, I was particularly interested in pattern recognition via machine learning techniques.  When I left computer science, I turned my research attention to the topic of this blog and began to focus more and more on the impact of technology on society and on media technology issues.  So I was quite interested this morning when my favorite National Public Radio show, On the Media, broadcast a story that connects these two research interests.

Pattern recognition sounds like an esoteric subfield of AI.  But in today’s computer-focused society, there are many useful applications of pattern recognition.  For example, I worked on two problems in microbiology while I was a graduate student.  My master’s work involved looking for patterns in strands of DNA of an organism called Onchocerca volvulus, which causes river blindness.  We were trying to find out whether we could determine the evolutionary history and path of the organism to help with understanding the epidemiology of the disease.  For my PhD, I worked on the famous “protein folding problem”, trying to predict the three-dimensional structure of a strand of protein by looking at just the sequence of amino acids that make up the protein.  The theory is that if we can predict the 3-D structure, we can predict the function of the protein as well, and the implications of that are far-reaching.  As I said, there are many practical applications of pattern recognition by computers.

On today’s edition of On the Media, there was a story that reminded me that pattern recognition by computers is everywhere in our society.  The story was about a contest run by NetFlix, the DVD rental site.  NetFlix allows subscribers to rate movies on a star system, where one star means “hated it” and five stars means “loved it”.  Based on the ratings that a particular subscriber has given a set of movies, NetFlix attempts to recommend other movies that the subscriber will enjoy.  NetFlix’s business model depends on these recommendations, since a large percentage of their movie rentals come from subscribers following them.  Without the recommendations, subscribers would likely run out of movies that they know they want to see and would eventually give up their subscriptions.  But predicting what movies a person will like is a very difficult problem.
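To make the idea of predicting from star ratings a little more concrete, here is a minimal Python sketch of one textbook approach, user-based collaborative filtering: find subscribers whose past ratings resemble yours, then average their ratings for a movie you haven’t seen, weighting each person by how similar they are to you.  This is not how Cinematch works (I don’t know its internals); the ratings table, user names, movie titles, and the `similarity` and `predict` helpers are all hypothetical, invented for illustration.

```python
from math import sqrt

# Hypothetical 1-5 star ratings, invented for illustration (not Netflix data).
ratings = {
    "alice": {"Movie A": 5, "Movie B": 3, "Movie C": 4},
    "bob":   {"Movie A": 4, "Movie B": 2, "Movie C": 5, "Movie D": 4},
    "carol": {"Movie A": 1, "Movie B": 5, "Movie D": 2},
}

def similarity(u, v):
    """Cosine similarity between two users over the movies both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    norm_u = sqrt(sum(ratings[u][m] ** 2 for m in common))
    norm_v = sqrt(sum(ratings[v][m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`.

    Returns None if nobody else has rated the movie.
    """
    num = den = 0.0
    for other in ratings:
        if other == user or movie not in ratings[other]:
            continue
        s = similarity(user, other)
        num += s * ratings[other][movie]
        den += abs(s)
    return num / den if den else None

print(round(predict("alice", "Movie D"), 2))  # prints 3.18
```

The prediction for alice leans toward bob’s four stars because bob’s ratings track hers more closely than carol’s do.  One weakness shows up even in this toy: cosine similarity on raw 1-to-5 ratings is always positive, so carol, whose tastes run opposite to alice’s, still pulls the prediction toward her rating.  Practical systems often subtract each subscriber’s average rating before comparing.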

NetFlix does a pretty good job with their movie recommendation system, Cinematch, but if they can make better predictions, they’re likely to hang on to more subscribers.  So they created a contest, offering a million dollars to anyone who could develop an algorithm whose predictions were 10% more accurate than Cinematch’s.  Apparently, a number of groups were quickly able to develop algorithms that were 5% more accurate than Cinematch.  Even getting to 8% more accuracy didn’t take that long.  But a number of intriguing issues made reaching the 10% mark difficult.  One of the most interesting is known as the “Napoleon Dynamite problem.”  Napoleon Dynamite is a quirky, independent movie that came out in 2004.  It turns out to be quite difficult to predict accurately whether a particular subscriber will like or dislike this movie.  In fact, two people whose likes and dislikes are otherwise quite similar can disagree drastically about Napoleon Dynamite.  So getting to the 10% mark will probably require a solution to the “Napoleon Dynamite problem.”
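“10% better” was a measured quantity.  As I understand the Netflix Prize rules, predictions were scored by root mean squared error (RMSE) against ratings subscribers had actually given, and a winning entry had to cut Cinematch’s RMSE by 10%.  The Python sketch below shows that arithmetic; the ratings and both sets of predictions are made-up numbers, and the `rmse` and `improvement` helpers are my own illustrative names, not contest code.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))

def improvement(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical data: four true ratings and two sets of predictions.
actual = [4, 1, 5, 3]
baseline = [3.5, 2.0, 4.0, 3.0]   # stands in for an existing system
contender = [3.6, 1.8, 4.2, 3.0]  # stands in for a new entry

base_err = rmse(baseline, actual)
new_err = rmse(contender, actual)
gain = improvement(base_err, new_err)
print(f"baseline RMSE {base_err:.3f}, contender RMSE {new_err:.3f}, "
      f"improvement {gain:.1f}%")
print("meets 10% target:", gain >= 10.0)
```

The improvement is relative, so a 10% gain means the new RMSE can be no more than 90% of the baseline’s.  Because RMSE squares each error, it punishes a prediction that is off by three stars far more than three predictions that are each off by one, which is plausibly one reason a polarizing movie like Napoleon Dynamite mattered so much.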

The contest closed a couple of days ago, although no winner has been announced yet.  NetFlix says that they received 44,014 entries from 5,169 teams in 186 countries.  One of the requirements of the contest is that the winners must disclose their techniques to the world.  Although getting more accurate movie recommendations is not a life-or-death problem, the techniques that solve it are likely to provide insight into how to accomplish other pattern recognition tasks.  And that’s good news for all of us.
