Architecture, Data and Intelligence: Netflix


Saturday, June 03, 2023

Netflix and Algorithms

Following my previous posts on Netflix, I have been reading a detailed analysis in Ed Finn's book, What Algorithms Want (2017).

Finn's answer to my question Does Big Data Drive Netflix Content? is no, at least not directly. Although Netflix had used data to commission new content as well as to recommend existing content (Finn's example was House of Cards), it had apparently left the content itself to the producers, and then used data and algorithms to promote it.

After making the initial decision to invest in House of Cards, Netflix was using algorithms to micromanage distribution, not production. Finn p99

Obviously something written in 2017 doesn't say anything about what Netflix has been doing more recently, but Finn seems to have been looking at the same examples as the other pundits I referenced in my previous post.

Finn also makes some interesting points about the transition from the original Cinematch algorithm to what he calls Algorithm 2.0.

The 1.0 model gave way to a more nuanced, ambiguity-laden analytical environment, a more reflexive attempt to algorithmically comprehend Netflix as a culture machine. ... Netflix is no longer constructing a model of abstract relationships between movies based on ratings, but a model of live user behavior in their various apps. Finn p90-91

The coding system (Netflix's practice of hand-tagging films and TV shows with descriptive attributes) relies on a large but hidden human workforce, hidden to reinforce the illusion of pure algorithmic recommendations (p96) and perfect personalization (p107). As Finn sees it, Algorithm 1.0 had a lot of data but no meaning, and was not able to go from data to desire (p93). Algorithm 2.0 has vastly more data, thanks to this coding system, but even the model of user behaviour still relies on abstraction. So where exactly is the data decoded and meaning reinserted (p96)?

As Netflix executives acknowledge, so-called ghosts can emerge (p95), revealing a fundamental incompleteness (lack) in symbolic agency (p96).

Ed Finn, What Algorithms Want: Imagination in the Age of Computing (MIT Press, 2017)

Alexis C. Madrigal, How Netflix Reverse-Engineered Hollywood (Atlantic, 2 January 2014)

Previous posts: Rhyme or Reason - The Logic of Netflix (June 2017), Does Big Data Drive Netflix Content? (January 2021)

Friday, January 01, 2021

Does Big Data Drive Netflix Content?

One thing that contributes to the success of Netflix is its recommendation engine, originally based on an algorithm called Cinematch. I discussed this in my earlier post Rhyme or Reason (June 2017).

But that's not the only way Netflix uses data. According to several pundits (Bikker, Dans, Delger, FrameYourTV, Selerity), Netflix also uses big data to create content. However, it's not always clear to what extent these assertions are based on inside information rather than just intelligent speculation.

According to Enrique Dans:

The latest Netflix series is not being made because a producer had a divine inspiration or a moment of lucidity, but because a data model says it will work.

Craig Delger's example looks pretty tame: analysing the overlap between the audiences for existing content in order to position new content.

The data collected by Netflix indicated there was a strong interest for a remake of the BBC miniseries House of Cards. These viewers also enjoyed movies by Kevin Spacey, and those directed by David Fincher. Netflix determined that the overlap of these three areas would make House of Cards a successful entry into original programming.

This is the kind of thing risk-averse producers have always done, and although data analytics might enable Netflix to do this a bit more efficiently, it doesn’t seem to represent a massive technological innovation. Thomas Davenport and Jeanne Harris discuss some more advanced uses of data in the second edition of their book Competing on Analytics.

Netflix ... has used analytics to predict whether a TV show will be a hit with audiences. ... It has used attribute analysis ... to predict whether customers would like a series, and has identified as many as seventy thousand attributes of movies and TV shows, some of which it drew on for the decision whether to create it.
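Davenport and Harris don't describe how the attribute analysis works. As a rough illustration only, here is a content-based sketch, assuming each title carries a set of descriptive tags (the titles, tags and scoring rule are all hypothetical, not Netflix's): a user's taste profile counts the tags of the titles they liked, and a proposed series is scored by how much of that profile its tags match.

```python
from collections import Counter

# Hypothetical titles and attribute tags, invented for illustration.
titles = {
    "Title A": {"political", "dark", "cerebral"},
    "Title B": {"political", "ensemble", "witty"},
    "Title C": {"romantic", "light", "witty"},
}

def profile(liked: list[str]) -> Counter:
    """Count how often each attribute appears among the titles a user liked."""
    counts: Counter = Counter()
    for title in liked:
        counts.update(titles[title])
    return counts

def score(candidate_attrs: set[str], user_profile: Counter) -> float:
    """Fraction of the user's attribute counts matched by the candidate (0 to 1)."""
    total = sum(user_profile.values())
    if total == 0:
        return 0.0
    return sum(user_profile[a] for a in candidate_attrs) / total

# Score a proposed (hypothetical) series for a user who liked Titles A and B.
user = profile(["Title A", "Title B"])
print(score({"political", "dark", "thriller"}, user))  # 0.5
```

The point of the sketch is only that "would this customer like the series?" can be asked from the attributes alone, before the series exists; with tens of thousands of attributes rather than a handful, the tags would presumably need weighting.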

One of the advantages of a content delivery platform is that you can track the consumption of your content. Amazon has used the Kindle to monitor how many chapters people actually read, at what times of day, and where and when they get bored. Games platforms (Nintendo, PlayStation, Xbox) can track how far people get with their games, where they get stuck, and where they might need some TLC or DLC. Similarly, Netflix knows where you pause or give up, and which scenes you rewind to watch again. Netflix can also experiment with alternative trailers for the same content.
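One simple way to turn "where you pause or give up" into a signal is a retention curve: the share of viewers still watching at each point in an episode. A minimal sketch, assuming hypothetical per-viewer stop positions; the data and function names are invented, and this is not a description of Netflix's actual telemetry.

```python
# Hypothetical playback logs: the minute at which each viewer stopped a
# 50-minute episode (50 = watched to the end). Values are invented.
stop_minutes = [50, 50, 12, 50, 13, 11, 50, 30, 50, 12]

def retention(stops: list[int], length: int, step: int) -> list[tuple[int, float]]:
    """Share of viewers still watching at checkpoints 0, step, 2*step, ..., length."""
    n = len(stops)
    return [(t, sum(1 for s in stops if s >= t) / n) for t in range(0, length + 1, step)]

def steepest_drop(curve: list[tuple[int, float]]) -> tuple[int, int, float]:
    """Return (start, end, share lost) for the interval that loses the most viewers."""
    drops = [(t0, t1, r0 - r1) for (t0, r0), (t1, r1) in zip(curve, curve[1:])]
    return max(drops, key=lambda d: d[2])

curve = retention(stop_minutes, length=50, step=10)
print(curve)
print(steepest_drop(curve))  # the 10-20 minute stretch is where these viewers give up
```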

In theory, this kind of information can then be used not just by Netflix to decide where to invest, but also by content producers to produce more engaging content. But it's difficult to find clear evidence of how much influence this actually has on content creation.

How much other (big) data does Netflix actually collect about its consumers? Craig Delger assumes it operates much like most other data-hungry companies.

Netflix user account data provides verified personal information (sex, age, location), as well as preferences (viewing history, bookmarks, Facebook likes).

However, in a 2019 interview (reported by @dadehayes), Ted Sarandos denied this.

We don’t collect your data. I don’t know how old you are when you join Netflix. I don’t know if you’re black or white. We know your credit card, but that’s just for payment and all that stuff is anonymized.

Sarandos, who is Chief Content Officer at Netflix, also downplayed the role that data (big or otherwise) played in driving content.

Picking content and working with the creative community is a very human function. The data doesn’t help you on anything in that process. It does help you size the investment. … Sometimes we’re wrong on both ends of that, even with this great data. I really think it’s 70, 80% art and 20, 30% science.

But perhaps that's what you'd expect him to say, given that Netflix has always tried to attract content producers with the promise of complete creative freedom (see the report by Roberto Baldwin). Amazon Studios has made similar claims.

While there may be conflicting narratives about the difference data makes to content creation, there are some observations that seem relevant if inconclusive.

Firstly, the long tail argument. The original business model for Amazon and Netflix was based on having a vast catalogue: because the cost of adding something to the catalogue was trivial, it didn't matter that most of the entries were of practically no interest to anyone. Even if the tail doesn't actually contribute as much revenue as the early proponents of the long tail theory suggested, it helps to mitigate uncertainty and risk, given that nobody knows in advance which titles are going to be hits.

But this effect is countered by the trend towards vertical integration. Amazon and Netflix have gone from distribution to producing their own content, while Disney has moved into streaming. This supports (but doesn't prove) the hypothesis that there may be some data synergies as well as commercial synergies.

And finally, there is an apparent preference for conventional, non-disruptive content, as noted by Alex Shephard, which is pretty much what we would expect from a data-driven approach.

Netflix is content to replicate television as we know it—and the results are deliberately less than spectacular.

Update (June 2023)

I have been reading a detailed analysis in Ed Finn's book, What Algorithms Want (2017).

Finn's answer to my question about data-driven content is no, at least not directly. Although Netflix had used data to commission new content as well as to recommend existing content (Finn's example was House of Cards), it had apparently left the content itself to the producers, and then used data and algorithms to promote it.

After making the initial decision to invest in House of Cards, Netflix was using algorithms to micromanage distribution, not production. Finn p99

Obviously that doesn't say anything about what Netflix has been doing more recently, but Finn seems to have been looking at the same examples as the other pundits I referenced above.

Roberto Baldwin, With House of Cards, Netflix Bets on Creative Freedom (Wired, 1 February 2013)

Yannick Bikker, How Netflix Uses Big Data to Build Mountains of Money (7 July 2020)

Enrique Dans, How Analytics Has Given Netflix The Edge Over Hollywood (Forbes, 27 May 2018), Netflix: Big Data And Playing A Long Game Is Proving A Winning Strategy (Forbes, 15 January 2020)

Thomas Davenport and Jeanne Harris, Competing on Analytics (second edition, 2017); see extract at https://www.huffpost.com/entry/how-netflix-uses-analytics-to-thrive_b_5a297879e4b053b5525db82b

Ed Finn, What Algorithms Want: Imagination in the Age of Computing (MIT Press, 2017)

FrameYourTV, How Netflix uses Big Data to Drive Success via Inside BigData (20 January 2018)

Daniel G. Goldstein and Dominique C. Goldstein, Profiting from the Long Tail (Harvard Business Review, June 2006)

Dade Hayes, Netflix’s Ted Sarandos Weighs In On Streaming Wars, Agency Production, Big Tech Breakups, M+A Outlook (Deadline, 22 June 2019)

Alexis C. Madrigal, How Netflix Reverse-Engineered Hollywood (Atlantic, 2 January 2014)

Selerity, How Netflix used big data and analytics to generate billions (5 April 2019)

Alex Shephard, What Netflix’s Obama Deal Says About the Future of Streaming (New Republic, 23 May 2018)

Related posts: Competing on Analytics (May 2010), Rhyme or Reason - the Logic of Netflix (June 2017)

Thursday, June 29, 2017

Rhyme or Reason - The Logic of Netflix

@GuyLongworth, who teaches philosophy at Warwick, is puzzled that the Netflix recommendation algorithm links Annie Hall with Son of Saul.

Having seen both, I can only think that this must have to do with rhyme.
— Guy Longworth (@GuyLongworth) June 29, 2017

Philosopher Guy's appeal to rhyme rather than reason seems to be based on the view that the two films have nothing else in common. But this is rather contradicted by the fact that he has actually seen both. Netflix has correctly surmised that people like Guy might possibly be interested in both films.

The first thing to understand about recommendation algorithms is that they are not solely (if at all) based on the intrinsic similarity of two products, but on what we might call relational similarity. If I tell you that people who like pizza also like ice-cream, that is primarily a statement about the "people who like". You might try to explain this statement by observing that pizza and ice-cream both have a high fat content, but then so do lots of other foods.

And when someone has just eaten a pizza, it is perhaps more likely that they will go on to eat ice-cream next, rather than eating another pizza straightaway.

Would it be virtue signalling of me to reveal that I resisted the lure of the second pizza?
— Guy Longworth (@GuyLongworth) June 22, 2017
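To make relational similarity concrete: a minimal sketch, assuming nothing more than a hypothetical table of who likes what. It computes a textbook item-to-item cosine similarity between the sets of people who like each item. This is a standard collaborative-filtering measure, not a description of Cinematch or of anything Netflix has published.

```python
from math import sqrt

# Hypothetical data: which (invented) people like which item.
likes = {
    "pizza": {"ann", "bob", "cat", "dan"},
    "ice-cream": {"ann", "bob", "cat", "eve"},
    "salad": {"eve", "fay"},
}

def relational_similarity(a: str, b: str) -> float:
    """Cosine similarity between the sets of people who like items a and b.

    Only the who-likes-what table is used; no property of the items
    themselves (fat content or otherwise) enters the calculation.
    """
    fans_a, fans_b = likes[a], likes[b]
    if not fans_a or not fans_b:
        return 0.0
    return len(fans_a & fans_b) / sqrt(len(fans_a) * len(fans_b))

print(relational_similarity("pizza", "ice-cream"))  # 0.75
print(relational_similarity("pizza", "salad"))      # 0.0
```

Pizza and ice-cream come out as similar purely because the same people like both; the measure would say the same about Annie Hall and Son of Saul if enough people like Guy had watched both.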

The second thing to understand is that recommendation algorithms work by trial and error. Netflix wants to know if Guy will accept its suggestion to re-watch Annie Hall, and this feedback will add to its knowledge of Guy as well as its knowledge of relational similarity between films.

Trial and error works better if you have a diverse range of trials. If you watch a couple of films in a particular genre, and then Netflix only ever shows you suggestions within that genre, it will never discover that you might be interested in a completely different genre as well. And you will never discover the full range of Netflix offerings, which could result in your abandoning Netflix altogether.
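The tension between showing people more of what they already watch and trying something different is what the machine-learning literature calls exploration versus exploitation. One textbook way of handling it is epsilon-greedy: mostly recommend the best-scoring option, but with a small probability epsilon pick something at random. The post doesn't claim Netflix uses this particular rule; the genres and scores below are hypothetical.

```python
import random

def recommend(scores: dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy choice: with probability epsilon pick a random candidate
    (explore); otherwise pick the highest-scoring one (exploit)."""
    if not scores:
        raise ValueError("no candidates to recommend")
    if rng.random() < epsilon:
        return rng.choice(sorted(scores))  # explore, possibly outside the usual genre
    return max(scores, key=scores.get)     # exploit the current best guess

# Hypothetical predicted-interest scores for a viewer who has only watched crime drama.
scores = {"crime drama": 0.9, "documentary": 0.3, "stand-up comedy": 0.2}
rng = random.Random(42)
picks = [recommend(scores, epsilon=0.1, rng=rng) for _ in range(1000)]
print({genre: picks.count(genre) for genre in scores})
```

Each exploratory suggestion the viewer accepts or ignores is itself a data point, so the system keeps learning about genres it would otherwise never test; the cost is that a small, tunable share of suggestions is deliberately not its best guess.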

Diversity of suggestion adds to the richness of the experimental data that are generated. How many members of the "people like Guy" category respond positively to suggestion A, and how many to suggestion B? Todd Yellin, Netflix VP of Product, told journalists in March that "we are addicted to the methodology of A/B testing".
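In its simplest form, comparing how many "people like Guy" respond to suggestion A versus suggestion B is a comparison of two response rates. A sketch using a standard pooled two-proportion z-test; the counts are invented, and the sources cited here don't describe Netflix's actual statistical methodology.

```python
from math import erfc, sqrt

def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Compare two response rates with a pooled two-proportion z-test.

    Returns (z, two-sided p-value) using the normal approximation.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both groups all-yes or all-no: no evidence of a difference
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|)) for the standard normal
    return z, p_value

# Hypothetical test: 1,000 similar members shown suggestion A, 1,000 shown B;
# 120 played A and 150 played B.
z, p = two_proportion_z_test(hits_a=120, n_a=1000, hits_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts, z comes out at about 1.96 and p at about 0.05, right on the conventional threshold: a three-point difference in response rate needs samples of at least this size before it means much.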

What is genre anyway? In the past, genres (in book publishing, music, film, video games) were defined by the industry or by experts. In 2013, Netflix employed over 40 people hand-tagging TV shows and movies. But a data-driven approach allows genres to emerge organically from the patterns of consumption. Netflix (and Amazon and the rest) will be much more interested in data-defined genres than in industry-defined genres.
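A toy version of genres emerging from consumption: link two titles whenever enough viewers have watched both, and treat each connected cluster as a "genre". This is a deliberately crude sketch (union-find over a co-viewing threshold) with invented viewing histories and a hypothetical `min_shared` parameter. Real systems would more likely use matrix factorization or community detection, and nothing here reflects how Netflix actually defines its categories.

```python
from itertools import combinations

# Hypothetical viewing histories (user -> titles watched), invented for illustration.
histories = {
    "u1": {"Annie Hall", "Manhattan", "Hannah and Her Sisters"},
    "u2": {"Annie Hall", "Manhattan"},
    "u3": {"Son of Saul", "Shoah"},
    "u4": {"Son of Saul", "Shoah", "Annie Hall"},
    "u5": {"Manhattan", "Hannah and Her Sisters"},
}

def emergent_genres(histories: dict[str, set[str]], min_shared: int = 2) -> list[set[str]]:
    """Group titles linked, directly or transitively, by at least `min_shared`
    viewers in common. The groups come only from consumption data."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Count how many viewers watched each pair of titles.
    co_views: dict[tuple[str, str], int] = {}
    for titles in histories.values():
        for t in titles:
            find(t)  # register every title, even ones that end up unlinked
        for a, b in combinations(sorted(titles), 2):
            co_views[(a, b)] = co_views.get((a, b), 0) + 1

    # Merge the clusters of any pair watched together often enough.
    for (a, b), count in co_views.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for t in parent:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=lambda g: sorted(g))

print(emergent_genres(histories))
```

With the default threshold, the Woody Allen films and the Holocaust films form separate clusters even though one viewer watched both Annie Hall and Son of Saul; lower `min_shared` to 1 and that single viewer is enough to merge them into one "genre".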

In her rant against the Netflix algorithm, @mehreenkasana makes two apparently contrary complaints. On the one hand, Netflix offers her content that is nothing like anything she has ever watched. She dismisses one suggestion with the words "I’ve never watched a show in a remotely similar vein." On the other hand, she doesn't see how Netflix can offer her challenging experiences. "Intensely curated experiences, whether you’re looking to explore movies or to meet people to date, remove one of the most critical aspects of a rich experience: risk, as in going out of your comfort zone."

But as @larakiara explains, "personalization is key to ensuring users keep coming back. But there's also the problem of over-personalization, so Netflix has to introduce variants."

Thus we can see Netflix as an embodiment of at least three of @kevin2kelly's Nine Laws of God.

Control from the bottom up
Maximize the fringes
Honor your errors

"A trick will only work for a while, until everyone else is doing it." (Remember Blockbuster.)

Mehreen Kasana, Netflix’s recommendation algorithm sucks (The Outline, 24 March 2017)

Kevin Kelly, Nine Laws of God. Chapter 24 of Out of Control (1994)

Lara O'Reilly, Netflix lifted the lid on how the algorithm that recommends you titles to watch actually works (Business Insider, 26 February 2016)

Janko Roettgers, Netflix Replacing Star Ratings With Thumbs Ups and Thumbs Downs (Variety, 16 March 2017), How Netflix tests Netflix: The story behind the service’s new two-thumbs-up feature (Protocol, 11 April 2022)

Tom Vanderbilt, The Science Behind the Netflix Algorithms That Decide What You’ll Watch Next (Wired, 7 August 2013)

Wikipedia: A/B Testing

Related posts: Competing on Analytics (May 2010), Emergent Similarity (February 2012), The Nature of Platforms (July 2017), Towards the Data-Driven Business (August 2019), Does Big Data Drive Netflix Content? (January 2021)
