Cranes at CVR Q22
NLP Analysis on IMDb API Movie Metadata
Problem Statement:
You are given an IMDb ID as input. Use the IMDb236 API to retrieve the movie's metadata, then apply Natural Language Processing (NLP) techniques to analyze the text fields in that metadata.
Part A – API Setup
- Prompt the user to enter an IMDb ID (e.g., tt1877830).
- Make a GET request to:
https://imdb236.p.rapidapi.com/imdb/{imdb_id}
- Include your X-RapidAPI-Key and X-RapidAPI-Host in the request headers.
- Store the full JSON response and proceed only if the key fields used in Part B (description, interests, genres, cast, writers, directors) exist.
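The Part A steps could be sketched as follows, using only the Python standard library. This is a minimal sketch, not a reference solution: the helper names (build_request, fetch_metadata, missing_fields), the "tt" plus digits check on the ID, and the assumption that the Part B field names appear as top-level keys of the JSON response are all illustrative and should be checked against a real response.

```python
import json
import re
import urllib.request

API_HOST = "imdb236.p.rapidapi.com"
# Assumed top-level field names, taken from the Part B headings.
REQUIRED_FIELDS = ("description", "interests", "genres", "cast", "writers", "directors")


def build_request(imdb_id, api_key):
    """Build (but do not send) the GET request for one IMDb ID."""
    if not re.fullmatch(r"tt\d+", imdb_id):  # assumed ID shape: "tt" + digits
        raise ValueError(f"not an IMDb ID: {imdb_id!r}")
    headers = {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": API_HOST}
    return urllib.request.Request(
        f"https://{API_HOST}/imdb/{imdb_id}", headers=headers, method="GET"
    )


def fetch_metadata(imdb_id, api_key):
    """Send the request and return the full JSON response as a dict."""
    with urllib.request.urlopen(build_request(imdb_id, api_key), timeout=10) as resp:
        return json.load(resp)


def missing_fields(data, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the response."""
    return [name for name in required if not data.get(name)]


def main():
    """Interactive entry point: prompt, fetch, and validate before Part B."""
    imdb_id = input("Enter an IMDb ID (e.g., tt1877830): ").strip()
    api_key = input("Enter your RapidAPI key: ").strip()
    movie = fetch_metadata(imdb_id, api_key)
    missing = missing_fields(movie)
    if missing:
        raise SystemExit(f"Cannot continue, missing fields: {missing}")
    print("All key fields present; continue with Part B.")
```

Calling main() runs the interactive flow; build_request and missing_fields can be exercised without network access.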
Part B – NLP Tasks on Various Text Fields
- description:
  - Clean the description using regex (retain only alphabetic characters).
  - Convert to lowercase, tokenize, and remove stopwords.
  - Lemmatize the words.
  - Find and display the top 5 most frequent words.
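One possible shape for the description pipeline is shown below. To stay dependency-free, it swaps in a small hand-written stopword set and a crude plural-stripping function (naive_lemma) in place of a real lemmatizer; both are assumptions for illustration only. In a full solution a library such as NLTK (its stopword corpus and WordNetLemmatizer) would normally take their place.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; a real solution would use a fuller list.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "his", "her", "their", "with", "for", "by", "as", "at",
    "from", "that", "this", "it", "its", "he", "she", "they", "who",
}


def naive_lemma(word):
    """Crude stand-in for lemmatization: only strips simple plural endings."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us")) and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text):
    """Clean, lowercase, tokenize, drop stopwords, then lemmatize."""
    cleaned = re.sub(r"[^A-Za-z]+", " ", text)  # keep alphabetic characters only
    tokens = cleaned.lower().split()
    tokens = [t for t in tokens if t not in STOPWORDS and len(t) > 1]
    return [naive_lemma(t) for t in tokens]


def top_words(text, n=5):
    """Return the n most frequent processed words with their counts."""
    return Counter(preprocess(text)).most_common(n)
```

For example, top_words(movie["description"]) would give the five most frequent terms, assuming movie holds the JSON response from Part A.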
- interests (list of keywords):
  - Join into a single string and treat it as a document.
  - Use TF (term frequency) to find dominant thematic words.
  - Compute the overlap between description and interests (set intersection of their words).
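The interests steps could look like the sketch below. It assumes interests is a list of strings and defines term frequency as a word's count divided by the total number of words in the joined document; the function names are illustrative.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(tokens):
    """Map each token to count / total number of tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


def interests_tf(interests):
    """Join the keyword list into one document and compute its TF table."""
    document = " ".join(interests)
    return term_frequency(tokenize(document))


def overlap(description, interests):
    """Words that occur in both the description and the interests."""
    return set(tokenize(description)) & set(tokenize(" ".join(interests)))
```

Sorting the TF table by value in descending order surfaces the dominant thematic words.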
- genres (e.g., Action, Crime, Drama):
  - Convert to lowercase and check whether each genre term appears in the description.
  - Count matches and list the genres that are not mentioned (e.g., a movie listed as “Drama” whose description never contains “drama”).
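A minimal sketch of the genre check, assuming genres is a list of strings. It uses a whole-word regex search so that a genre only matches as a complete word or phrase; the function name and the shape of the returned dictionary are illustrative choices.

```python
import re


def genre_mentions(genres, description):
    """Split genres into those mentioned in the description and those not."""
    text = description.lower()
    matched, missing = [], []
    for genre in genres:
        # \b anchors make "drama" match as a whole word, not inside "dramatic".
        pattern = r"\b" + re.escape(genre.lower()) + r"\b"
        (matched if re.search(pattern, text) else missing).append(genre)
    return {"match_count": len(matched), "matched": matched, "missing": missing}
```

Whether inflected forms such as "dramatic" should count as a mention is a design decision; this sketch treats them as misses.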
- cast fullName and characters:
  - Extract all character names and actor names.
  - Use named-entity pattern matching (NER-style) to classify each name as a real person or a fictional character (if feasible).
  - Count how many character names are mentioned in the description.
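The cast steps might be approached as below. The sketch assumes each cast entry is a dictionary with a "fullName" string and a "characters" list, which should be verified against the actual response. The two regex patterns are hypothetical heuristics: surface patterns alone cannot reliably separate real from fictional names (a character such as "Bruce Wayne" looks exactly like a person), so the field a name came from remains the more trustworthy signal.

```python
import re

# Hypothetical heuristics: a leading article or title suggests a character name.
ROLE_PATTERN = re.compile(
    r"^(the|dr|mr|mrs|ms|captain|officer|agent|detective|king|queen|lord)\b",
    re.IGNORECASE,
)
# Two or more capitalized words looks like an ordinary personal name.
PERSON_PATTERN = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z'\-]+)+$")


def extract_names(cast):
    """Return (actor_names, character_names) from the cast list."""
    actors = [member["fullName"] for member in cast if member.get("fullName")]
    characters = [c for member in cast for c in member.get("characters", [])]
    return actors, characters


def classify_name(name):
    """Label a name by surface pattern only; 'unknown' when neither fits."""
    if ROLE_PATTERN.match(name):
        return "fictional-style"
    if PERSON_PATTERN.match(name):
        return "person-style"
    return "unknown"


def characters_in_description(characters, description):
    """Character names that occur as whole phrases in the description."""
    text = description.lower()
    return [
        name for name in characters
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", text)
    ]
```

The length of the list returned by characters_in_description gives the requested count.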
- writers and directors:
  - Use lemmatization to group similar writer/director role words (e.g., write, writer, written).
  - Generate a word cloud or frequency bar plot of creator names (first names and surnames).
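A rough sketch for the writers and directors steps follows. It assumes creator names are available as a list of plain strings. Role words are grouped through a small hand-written lookup table (ROLE_LEMMAS) rather than a real lemmatizer, because a dictionary-based lemmatizer may leave the noun "writer" unchanged instead of mapping it to "write". The bar chart is rendered as text to avoid extra dependencies; the same counts could be passed to a plotting or word-cloud library instead.

```python
from collections import Counter

# Hand-written stand-in for a lemmatizer, limited to the role words of interest.
ROLE_LEMMAS = {
    "write": "write", "writer": "write", "writers": "write",
    "written": "write", "writing": "write", "wrote": "write",
    "direct": "direct", "director": "direct", "directors": "direct",
    "directed": "direct", "directing": "direct",
}


def group_roles(role_words):
    """Group role words under a shared base form."""
    groups = {}
    for word in role_words:
        lemma = ROLE_LEMMAS.get(word.lower(), word.lower())
        groups.setdefault(lemma, []).append(word)
    return groups


def name_part_counts(names):
    """Count every first name and surname across the creator names."""
    counts = Counter()
    for name in names:
        counts.update(name.split())
    return counts


def text_bar_chart(counts):
    """Render the counts as text bars, most frequent first."""
    lines = []
    for part, count in counts.most_common():
        lines.append(f"{part:<15} {'#' * count} {count}")
    return "\n".join(lines)
```

For example, print(text_bar_chart(name_part_counts(names))) lists each name part with one "#" per occurrence.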