Netflix Content Analysis

Jun 2024

Platform Link
Tableau Netflix Analysis on Tableau
GitHub Netflix Analysis on GitHub

Conducted an in-depth analysis of Netflix’s content library and user ratings to identify patterns and trends in viewer preferences and what makes a title successful. Key objectives included determining the most popular genres and target age groups across continents, analyzing the performance of original content versus licensed content, and identifying the characteristics of highly-rated titles. The analysis also investigated the distribution of content by release year and the impact of star ratings and vote counts on viewership.

Utilized Python pandas library for data cleaning, preprocessing, and transforming the data, ensuring the data was structured and free of inconsistencies. Employed exploratory data analysis techniques to uncover insights and trends within the dataset. Created visualizations using Plotly, Seaborn, and interactive dashboards using Tableau, enabling dynamic data exploration and presentation of findings. Scraped Wikipedia using Python to identify Netflix Originals, allowing for a boolean classification of titles in the dataset.

NetflixDashboard

Netflix

Netflix

Key insights derived from the analysis include:

Skills: Python · Seaborn · Tableau · Data Analytics · SQLite · Web Scraping

Summary of Steps (Python)

Data Loading and Initial Inspection

Data Cleaning

Data Transformation

Vote Statistics Analysis

Additional Dataset Handling

Data Export

Wordcloud on Descriptions

Exploring Correlations

Example SQL Queries

TOP COUNTRIES BY PRODUCTION COUNT

SELECT production_countries, 
count(*) As tv_shows_count
FROM titles
GROUP By production_countries
ORDER BY tv_shows_count DESC;

COUNT WHERE PRODUCTION IS ONLY IN GB

SELECT COUNT(DISTINCT titles.id) AS tv_shows_count, production_countries 
FROM titles
LEFT JOIN credits ON titles.id = credits.id
WHERE titles.production_countries = "['GB']"
GROUP By production_countries;

COUNT WHERE PRODUCTION LIST INCLUDES GB

SELECT COUNT(DISTINCT titles.id) AS tv_shows_count_GB
FROM titles
LEFT JOIN credits ON titles.id = credits.id
WHERE titles.production_countries LIKE '%GB%';

DISTRIBUTION OF TITLES BY RELEASE YEAR

SELECT release_year, COUNT(DISTINCT titles.id) AS count 
FROM titles
LEFT JOIN credits ON titles.id = credits.id 
GROUP BY release_year
ORDER BY release_year;

MOST APPEARING ACTOR

WITH distinct_actors AS (
    SELECT DISTINCT titles.id, credits.name
    FROM titles
    LEFT JOIN credits ON titles.id = credits.id
    WHERE credits.role = 'ACTOR'
)
SELECT name, COUNT(*) AS appearance_count
FROM distinct_actors
GROUP BY name
ORDER BY appearance_count DESC
LIMIT 10;

AVG IMDB SCORE FOR MOVIES AND SHOWS

WITH distinct_titles AS (
    SELECT DISTINCT titles.id, titles.type, titles.imdb_score
    FROM titles
    LEFT JOIN credits ON titles.id = credits.id
)
SELECT type, AVG(imdb_score) AS avg_imdb_score
FROM distinct_titles
GROUP BY type;