Exploring Netflix Viewership Trends Using ChatGPT Insights

In a notable shift, Netflix has broken with its usual practice of keeping viewership data mostly private. The streaming giant recently released a public dataset listing every title that was watched for more than 50,000 hours from January to June 2023.

According to Netflix's blog post announcing the report, titled “What We Watched: A Netflix Engagement Report,” the dataset covers more than 18,000 titles, representing 99% of all viewing on Netflix and nearly 100 billion hours viewed. Netflix plans to publish an updated report twice a year.

Netflix measures "viewership hours" rather than the number of viewers or households, as some individuals may rewatch titles multiple times.

While Netflix highlighted some findings, I opted to dive deeper into the data by downloading the report as an Excel spreadsheet from Netflix's blog. I used OpenAI's ChatGPT (with GPT-4 on a personal ChatGPT Plus subscription) to analyze the data.

In short, ChatGPT provided a concise and clear analysis of the dataset, though it faced challenges, especially when generating charts. My initial request was simply for a data analysis, and ChatGPT responded effectively, summarizing the contents accurately.

ChatGPT also outlined “key insights,” notably that the “Release Date” column has a large number of missing values (13,359), which could hamper time-based analyses.
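A missing-value count like this one is easy to verify independently. The snippet below is a minimal sketch in plain Python, not the method ChatGPT used. The sample rows are invented for illustration, and apart from "Release Date", which the report names, the column names are assumptions.

```python
# Hypothetical sample rows: titles, dates, and hours are invented for illustration.
# "Release Date" is named in the article; "Title" and "Hours Viewed" are assumed names.
rows = [
    {"Title": "Example Title A", "Release Date": "2023-01-12", "Hours Viewed": 600_000},
    {"Title": "Example Title B", "Release Date": None, "Hours Viewed": 400_000},
    {"Title": "Example Title C", "Release Date": "2023-03-10", "Hours Viewed": 250_000},
    {"Title": "Example Title D", "Release Date": "", "Hours Viewed": 900_000},
]


def count_missing(records, column):
    """Count records whose value for `column` is None or an empty string."""
    return sum(1 for record in records if record.get(column) in (None, ""))


missing = count_missing(rows, "Release Date")
share = missing / len(rows)
print(f"{missing} of {len(rows)} rows ({share:.0%}) have no release date")
```

Run against the full spreadsheet, the same check should reproduce the 13,359 figure if ChatGPT's count is correct.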

Interestingly, although the first section of key insights was labeled “The Top 10 Most-Watched Titles (Jan-Jun 2023),” ChatGPT did not actually list the titles, so I had to ask for them separately.

I also asked for the least-viewed titles, the title at the median, the average hours viewed, and the title closest to that average, all of which ChatGPT provided satisfactorily.
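These summary figures can be cross-checked with a few lines of code. The following is a rough sketch using Python's standard statistics module. The rows are made-up placeholders rather than real Netflix data, and the "Hours Viewed" column name is an assumption.

```python
import statistics

# Hypothetical sample rows: invented titles and hours, for illustration only.
# "Hours Viewed" is an assumed column name.
rows = [
    {"Title": "Example Title A", "Hours Viewed": 100_000},
    {"Title": "Example Title B", "Hours Viewed": 200_000},
    {"Title": "Example Title C", "Hours Viewed": 300_000},
    {"Title": "Example Title D", "Hours Viewed": 400_000},
    {"Title": "Example Title E", "Hours Viewed": 1_500_000},
]

hours = [row["Hours Viewed"] for row in rows]

# Least-viewed title: the row with the smallest hours value.
least_viewed = min(rows, key=lambda row: row["Hours Viewed"])

# Median and mean (average) hours across all titles.
median_hours = statistics.median(hours)
mean_hours = statistics.mean(hours)

# Title whose hours are closest to the mean.
closest_to_mean = min(rows, key=lambda row: abs(row["Hours Viewed"] - mean_hours))

print("Least viewed:", least_viewed["Title"])
print("Median hours:", median_hours)
print("Mean hours:", mean_hours)
print("Closest to mean:", closest_to_mean["Title"])
```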

However, when I asked for a line plot of monthly viewership hours, ChatGPT struggled. The dataset didn't break down viewership by month; it only gave each title's total viewing hours over the six-month period. The initial plot was nearly illegible, showing dates going back to 2010, which corresponded to the earliest release dates in the dataset.

After prompting for corrections, I received a more readable, but still misleading, plot. The chart showed total hours viewed for titles grouped by the month they were released, not the hours actually viewed in each month. For instance, the hours for a title released in January covered its viewing across the whole January-June period, yet all of them were plotted under January.

ChatGPT failed to clarify this distinction on its own; without explicit direction, it labeled the chart inaccurately. Numerous iterations were required before I obtained a properly labeled and useful chart.
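The difference between the two quantities is easier to see in code. The sketch below is one assumed way to build the release-month totals, not a reconstruction of what ChatGPT ran. The rows, the ISO date format, and the "Hours Viewed" column name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: invented titles, dates, and hours.
# Dates are assumed to be ISO strings (YYYY-MM-DD); "Hours Viewed" is an assumed name.
rows = [
    {"Title": "Example Title A", "Release Date": "2023-01-05", "Hours Viewed": 600_000},
    {"Title": "Example Title B", "Release Date": "2023-01-20", "Hours Viewed": 400_000},
    {"Title": "Example Title C", "Release Date": "2023-03-10", "Hours Viewed": 250_000},
    {"Title": "Example Title D", "Release Date": None, "Hours Viewed": 900_000},
    {"Title": "Example Title E", "Release Date": "2010-04-01", "Hours Viewed": 50_000},
]


def hours_by_release_month(records, start="2023-01", end="2023-06"):
    """Sum each title's six-month hours under the month the title was released.

    Rows with no release date, or a release month outside start..end, are skipped.
    The result is NOT hours watched per month: the data has no monthly breakdown.
    """
    totals = defaultdict(int)
    for record in records:
        release = record["Release Date"]
        if not release:
            continue
        month = release[:7]  # "YYYY-MM"
        if start <= month <= end:
            totals[month] += record["Hours Viewed"]
    return dict(sorted(totals.items()))


totals = hours_by_release_month(rows)
# An accurate label says what the numbers are, not what was originally asked for.
chart_title = "Total Jan-Jun 2023 hours viewed, grouped by title release month"
print(chart_title)
for month, total in totals.items():
    print(month, total)
```

Here both January releases contribute their full six-month totals to the January bucket, which is why a title such as "Monthly viewership hours" would misdescribe the result.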

While ChatGPT serves as a helpful analysis tool for casual users, there remains significant room for improvement in its reliability and accuracy as a data analyst.
