This project explores data from IMDb, one of the world's biggest online movie databases, launched back in 1990. The database contains more than 3.9 million movie and TV-series titles and 7.5 million personalities. Throughout this paper we try to uncover patterns, tendencies and relationships between different variables in the movie landscape, such as budget, revenue, duration, genre and release year. Although IMDb has millions of titles for us to process, we do not have the time or resources to go through all of that data. We have also gathered movie scripts for text analysis, to look for patterns in the text. Some of the questions we want to answer are: How does duration affect a movie's success and gross? Which kinds of movies have the biggest gross? Is there a connection between gross and budget? Is there a connection between the amount and type of feeling in a movie and its success?
The movie network below was created to see whether centrality is related to rating. The five nodes with the highest centrality show that this is not the case. High centrality does not reflect rating; instead it tells us that a movie such as 'The Dark Knight Rises' has many actors who also appear in many other films. The movies in the top 5 therefore have either very famous and very productive actors or simply a large number of actors.
The network is constructed with the movies as nodes and shared actors as connections: if three of the same actors appear in two different movies, there is a connection between these two movies.
To get a better overview, we keep only the movies in the middle of the cluster and remove every movie that has no connection to other movies.
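The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the cast lists are invented, the function names are hypothetical, and the threshold is read as "at least three shared actors", which is an assumption about the rule stated above.

```python
from itertools import combinations

# Hypothetical cast lists; the real data would come from the movie dataset.
casts = {
    "Movie A": {"a1", "a2", "a3", "a4"},
    "Movie B": {"a1", "a2", "a3", "a9"},
    "Movie C": {"a2", "a3", "a4", "a7"},
    "Movie D": {"a5", "a6"},
}

def build_movie_network(casts, min_shared=3):
    """Return an adjacency dict linking movies that share >= min_shared actors."""
    graph = {movie: set() for movie in casts}
    for m1, m2 in combinations(casts, 2):
        # Assumption: "3 of the same actors" means at least three shared actors.
        if len(casts[m1] & casts[m2]) >= min_shared:
            graph[m1].add(m2)
            graph[m2].add(m1)
    return graph

def drop_isolated(graph):
    """Remove movies that have no connection to any other movie."""
    return {movie: nbrs for movie, nbrs in graph.items() if nbrs}

network = drop_isolated(build_movie_network(casts))
```

In this toy data, Movie D shares no actors with the others and is removed, while Movie B and Movie C share only two actors and are therefore not linked directly.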
The betweenness centrality of the top 5 nodes tells us how often a node acts as a bridge between other nodes, that is, how many of the shortest paths between pairs of nodes pass through it. The idea was to see whether a node with high centrality also has a high rating. The eigenvector centrality measures a node's influence within the network, taking into account how well connected its neighbours are; this list can be expected to share one or more movies with the betweenness list.
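To make the two measures concrete, here is a hedged, standard-library sketch of both. It assumes an unweighted, undirected network stored as an adjacency dict; the example network and variable names are hypothetical. In practice a library such as NetworkX, with its `betweenness_centrality` and `eigenvector_centrality` functions, would normally be used, and nothing here claims to reproduce the numbers in the tables.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph, normalized to [0, 1]."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in graph}
        n_paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate path dependencies, starting from the farthest nodes.
        dep = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += n_paths[v] / n_paths[w] * (1 + dep[w])
            if w != source:
                score[w] += dep[w]
    n = len(graph)
    # Every pair is visited from both ends, so (n-1)(n-2) normalizes the sum.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}

def eigenvector_centrality(graph, iterations=100, tol=1e-9):
    """Power iteration; the result is scaled to unit Euclidean norm."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Adding x[v] itself (a shift by the identity) avoids oscillation.
        new = {v: x[v] + sum(x[w] for w in graph[v]) for v in graph}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    return x

# Hypothetical star-shaped network: Movie B links the three other movies.
network = {
    "Movie A": {"Movie B"},
    "Movie B": {"Movie A", "Movie C", "Movie D"},
    "Movie C": {"Movie B"},
    "Movie D": {"Movie B"},
}
between = betweenness_centrality(network)
eigen = eigenvector_centrality(network)
```

In this example Movie B lies on every shortest path between the other movies, so it has the highest score under both measures. In larger networks the two rankings can differ.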
Movie | Betweenness Centrality |
---|---|
The Dark Knight Rises | 0.10054201121 |
The Fighter | 0.0455517310483 |
Maleficent | 0.0402527902084 |
You Don't Mess with the Zohan | 0.0381964305564 |
The Departed | 0.0325438477142 |

Movie | Eigenvector Centrality |
---|---|
Grown Ups 2 | 0.27735964737 |
You Don't Mess with the Zohan | 0.267864857114 |
Bedtime Stories | 0.256616483492 |
The Ridiculous 6 | 0.25587153537 |
Pixels | 0.249395787647 |
The movies in these lists have famous actors, or simply a substantial number of actors, which explains their centrality in the network. As the lists show, a movie's rating does not appear to determine its centrality or how important the node is in the network.
The network below was created to see whether actors tend to appear only in movies of the same genre. Two actors are connected if they have appeared in a movie together.
The communities in this network were identified with the Louvain community detection algorithm, which finds communities in a graph; here a community is a group of actors who are closely connected with each other. The algorithm found 16 communities in our network. The idea was to see whether actors tend to make movies only within a specific genre.
The communities suggest that this could be the case: in many communities the actors appear almost exclusively in movies of the same genre, and when they do appear in other genres, these make up only a tiny portion.
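One way to check this kind of claim numerically is to compute, for each community, how large a share of its movies falls in the most common genre. The sketch below assumes the community assignment has already been computed (for example by a Louvain implementation); the actor names, genre lists and function name are all hypothetical.

```python
from collections import Counter

# Hypothetical inputs: a community id per actor (e.g. from Louvain)
# and the genres of the movies each actor appeared in.
community_of = {"actor1": 0, "actor2": 0, "actor3": 1, "actor4": 1}
genres_of = {
    "actor1": ["Comedy", "Comedy", "Drama"],
    "actor2": ["Comedy"],
    "actor3": ["Action", "Action"],
    "actor4": ["Action", "Thriller"],
}

def dominant_genre_share(community_of, genres_of):
    """For each community, return (most common genre, its share of all genre labels)."""
    counts = {}
    for actor, community in community_of.items():
        counts.setdefault(community, Counter()).update(genres_of.get(actor, []))
    result = {}
    for community, counter in counts.items():
        if not counter:  # skip communities without any genre information
            continue
        genre, hits = counter.most_common(1)[0]
        result[community] = (genre, hits / sum(counter.values()))
    return result

shares = dominant_genre_share(community_of, genres_of)
```

A share close to 1.0 would support the observation that a community sticks to one genre, while a low share would indicate a mixed community.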
This graph shows that there is a connection between revenue and budget. Most of the movies have a budget between 0 and 80 million and a revenue of at most about 500 million. Some movies stand out, for example one with a revenue of about 800 million, roughly 10 times its budget. The highest revenue in this graph is thus at most about 10 times the budget.
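The strength of such a connection can be summarized with a correlation coefficient and a revenue-to-budget ratio per movie. The following is only a sketch under stated assumptions: the figures are invented, and Pearson correlation is one possible choice of measure, not necessarily the one behind the graph.

```python
from math import sqrt

# Hypothetical (budget, revenue) pairs in millions of dollars.
movies = [(10, 30), (40, 120), (80, 200), (20, 45), (60, 190)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

budgets = [budget for budget, _ in movies]
revenues = [revenue for _, revenue in movies]

r = pearson(budgets, revenues)                           # near 1 means a strong linear link
ratios = [revenue / budget for budget, revenue in movies]  # revenue as a multiple of budget
```

The largest value in `ratios` corresponds to the kind of statement made above, that revenue reaches at most a certain multiple of the budget.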
This graph shows the connection between the vote average and the budget. The movies have a vote average between 3 and 8. Movies with a budget between 0 and 50 million have vote averages between 4 and 8. There is a tendency for movies with a high budget to receive higher votes.
Here, the graph shows the vote average against the revenue in millions. Movies with higher votes tend to have higher revenue. Movies with a revenue of at least 500 million have a vote average of at least 5. The movie with the highest revenue (2700 million) has a vote average of 7.
This graph shows the vote average against the duration of the movies in minutes. Most movies longer than 150 minutes have a higher vote average, whereas some movies shorter than 150 minutes have a vote average of 4 or less. Most movies run for around 100 minutes, and at that duration the vote averages are spread almost evenly. The shift towards a high vote average occurs once movies reach 150 minutes.
This graph shows duration in minutes against revenue in millions of dollars. Most of the movies have a revenue between 0 and 400 million dollars and a duration between 80 and 150 minutes.
This graph shows the average revenue of movies by release month. Movies released in May and June have the highest revenue, which may be related to spring break and to many movies being released at this time, when families look for something to do in their free time. Revenue is also high in the late months, November and December, which may be related to seasonal releases such as Christmas movies.
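A monthly average like this can be computed by grouping movies on the month of their release date. The sketch below uses invented records and a hypothetical function name, and assumes a plain arithmetic mean per month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (release date, revenue in millions) records.
releases = [
    (date(2012, 5, 4), 600.0),
    (date(2014, 5, 30), 200.0),
    (date(2013, 12, 13), 300.0),
    (date(2010, 2, 12), 50.0),
]

def mean_revenue_by_month(releases):
    """Return {month number: mean revenue} for every month that has releases."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [revenue sum, movie count]
    for released, revenue in releases:
        totals[released.month][0] += revenue
        totals[released.month][1] += 1
    return {month: total / count for month, (total, count) in sorted(totals.items())}

monthly = mean_revenue_by_month(releases)
```

Months with very few releases produce unstable averages, so it can be worth reporting the number of movies per month alongside the mean.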
This graph shows the average revenue, as a percentage of the budget, for different movie genres. Animation and horror movies have the highest average revenue, at about 650% of the budget. Action movies have the lowest, with an average revenue of 250% of the budget. Action movies are also more expensive to produce than, for example, horror movies.
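As a sketch of how a per-genre percentage could be derived, the code below averages revenue as a percentage of budget over the movies in each genre. The rows are invented, and averaging the per-movie percentages (rather than dividing total revenue by total budget) is an assumption about how the graph was produced.

```python
from collections import defaultdict

# Hypothetical (genre, budget, revenue) rows in millions of dollars.
rows = [
    ("Horror", 5, 30),
    ("Horror", 10, 40),
    ("Action", 100, 300),
    ("Action", 200, 400),
]

def mean_return_percent_by_genre(rows):
    """Average of revenue / budget * 100 per genre, skipping rows without a budget."""
    percents = defaultdict(list)
    for genre, budget, revenue in rows:
        if budget > 0:  # a missing or zero budget would make the ratio meaningless
            percents[genre].append(100 * revenue / budget)
    return {genre: sum(vals) / len(vals) for genre, vals in percents.items()}

returns = mean_return_percent_by_genre(rows)
```

With this definition a cheap genre can show a high percentage even when its absolute revenue is small, which is consistent with the remark that horror movies are cheaper to produce than action movies.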
This graph shows how many movies from each year make it onto the IMDb Top 250 list. Over the last decade, between 3 and 7 movies per year have made the list, so if seven movies from a given year are already on it, the best chance of making the list is to wait until the next year.
The idea is to calculate the sentiment values of some top-rated and some bottom-rated movies, to see whether a movie's rating is related to the sentiment of its script. We also calculate the sentiment value of movies released over the years, which shows whether older movies have a higher sentiment value than movies released today.
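A common way to obtain an average sentiment value for a script is to look up each word in a scored word list and take the mean. The sketch below illustrates that idea only; the word scores, their scale and the function name are hypothetical, and the lexicon actually used in this project is not specified here.

```python
import re

# Hypothetical word scores; a real analysis would load a full sentiment lexicon.
word_scores = {"love": 8.4, "happy": 8.3, "rain": 5.0, "war": 1.8, "dead": 2.0}

def average_sentiment(text, word_scores):
    """Mean score of the words in text that appear in the lexicon, or None if none do."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scored = [word_scores[token] for token in tokens if token in word_scores]
    return sum(scored) / len(scored) if scored else None

score = average_sentiment("Happy in the rain, happy in love!", word_scores)
```

Because every matched word counts equally, long scripts tend to average out towards the middle of the scale, which may be one reason why averages for different movies lie close together.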
This graph shows the top 5 movies from left to right, while the bottom 5, the movies with the lowest scores, are read from right to left. The bottom 5 movies have higher sentiment than the top 5, which suggests that a high sentiment value does not by itself influence a movie's success.
This graph gives an overview of the sentiment of movies from 1920 to 2020. The sentiment peaks in the period around 2010. Otherwise it has stayed fairly flat through most years, between 5.4 and 5.5.
Movie | Sentiment (Average) | Year |
---|---|---|
Singin' in the Rain | 5.7 | 1952 |
WALL E | 5.7 | 2008 |
Hachi: A Dog's Tale | 5.7 | 2009 |
Singin' in the Rain | 6.0 | 2010 |
Mad Max: Fury Road | 7.0 | 2015 |
The table above lists the five movies with the highest average sentiment. Mad Max: Fury Road ranks first with a score of 7.0, followed by Singin' in the Rain with a score of 6.0.
Movie | Sentiment (Average) | Year |
---|---|---|
Rashomon | 5.2 | 1950 |
The Thing | 5.2 | 1982 |
Platoon | 5.3 | 1986 |
A Separation | 5.3 | 2011 |
Amadeus | 5.3 | 1984 |
The table above lists the five movies with the lowest average sentiment, starting with Rashomon, which has a score of 5.2.
This text analysis tells us which words are the most common in specific genres. In drama movies, for example, words like “thank”, “say”, “room” and “hold” are the most common. In action movies, words like “number”, “team”, “one” and “look time” are common.
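Counting the most frequent words per genre can be sketched as follows. The texts, the stop-word list and the function name are hypothetical, and the sketch does no lemmatization, so it is a simplified illustration rather than the pipeline used for the results above.

```python
import re
from collections import Counter

# Hypothetical (genre, text) pairs and a deliberately tiny stop-word list.
documents = [
    ("Drama", "Thank you. Say it again, and hold the door of the room."),
    ("Drama", "Hold on, I say thank you."),
    ("Action", "The team has one number. Look, the team is here."),
]
STOP_WORDS = {"the", "a", "of", "and", "it", "is", "i", "you", "on", "has"}

def top_words_by_genre(documents, n=3):
    """Return the n most frequent non-stop-words for each genre."""
    counts = {}
    for genre, text in documents:
        words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
        counts.setdefault(genre, Counter()).update(words)
    return {genre: [w for w, _ in counter.most_common(n)] for genre, counter in counts.items()}

top = top_words_by_genre(documents)
```

The choice of stop words strongly affects the result: without filtering, words such as "the" and "you" would dominate every genre.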
The words used in the movie overviews are very similar in every genre. The most common words found here are find, one, life, world, year and family, all of which are neutral or positive words.