This is an exhaustive analytical report that aims to give a clear insight into the video game industry, from its earliest days to its peak. The dataset is from Kaggle.
This report answers questions about the video game industry, including:
- Which platform has the most games available?
- Which company has the most profitable video game?
- What share of the market is held by companies in different regions?
The Data
The dataset has 16,598 rows and 11 columns. The names of all the columns and their meanings are given below:
Attribute | Meaning |
---|---|
Rank | Rank of the video game |
Name | Name of the video game |
Platform | Platform for which the game was developed |
Year | Year of release |
Genre | Genre (type) of the game |
Publisher | Publishing/developing company |
NA_Sales | Total sales in North America (in millions) |
EU_Sales | Total sales in Europe (in millions) |
JP_Sales | Total sales in Japan (in millions) |
Other_Sales | Total sales in all other countries (in millions) |
Global_Sales | Total worldwide sales (in millions) |
The CSV file contains missing values, so we have to clean the data and also tidy it.
Data wrangling is the collective term for data cleaning and data tidying. In this process we carry out the steps below. Let's get hands-on.
First, we convert the categorical character columns into factors so that we can easily apply statistical modelling functions; factors are also handy in plotting libraries such as ggplot2.
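A minimal base-R sketch of this conversion, assuming the data frame is called `df`. The three-row data frame is hypothetical toy data, and keeping `Name` as a character column is an assumption (almost every title is distinct, so it gains little as a factor).

```r
# Hypothetical toy stand-in for the Kaggle data frame `df`
df <- data.frame(
  Name      = c("Game A", "Game B", "Game C"),
  Platform  = c("DS", "PS2", "DS"),
  Genre     = c("Action", "Sports", "Action"),
  Publisher = c("Pub X", "Pub Y", "Pub X"),
  stringsAsFactors = FALSE
)

# Every character column except Name is treated as categorical
cat_cols <- setdiff(names(df)[sapply(df, is.character)], "Name")
df[cat_cols] <- lapply(df[cat_cols], factor)

str(df)  # Platform, Genre and Publisher are now factors
```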
Now the categorical data are interpreted as such by R. Looking at the data, we see that the string ‘N/A’ is used to represent missing values; if we do not change it, R will not recognize it as a missing value and our results will be error-prone.
## Rank Name Platform Year
## Min. : 1 Length:16598 DS :2163 2009 :1431
## 1st Qu.: 4151 Class :character PS2 :2161 2008 :1428
## Median : 8300 Mode :character PS3 :1329 2010 :1259
## Mean : 8301 Wii :1325 2007 :1202
## 3rd Qu.:12450 X360 :1265 2011 :1139
## Max. :16600 PSP :1213 (Other):9868
## (Other):7142 NA's : 271
## Genre Publisher
## Action :3316 Electronic Arts : 1351
## Sports :2346 Activision : 975
## Misc :1739 Namco Bandai Games : 932
## Role-Playing:1488 Ubisoft : 921
## Shooter :1310 Konami Digital Entertainment: 832
## Adventure :1286 (Other) :11529
## (Other) :5113 NA's : 58
## NA_Sales EU_Sales JP_Sales Other_Sales
## Min. : 0.0000 Min. : 0.0000 Min. : 0.00000 Min. : 0.00000
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: 0.00000
## Median : 0.0800 Median : 0.0200 Median : 0.00000 Median : 0.01000
## Mean : 0.2647 Mean : 0.1467 Mean : 0.07778 Mean : 0.04806
## 3rd Qu.: 0.2400 3rd Qu.: 0.1100 3rd Qu.: 0.04000 3rd Qu.: 0.04000
## Max. :41.4900 Max. :29.0200 Max. :10.22000 Max. :10.57000
##
## Global_Sales
## Min. : 0.0100
## 1st Qu.: 0.0600
## Median : 0.1700
## Mean : 0.5374
## 3rd Qu.: 0.4700
## Max. :82.7400
##
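The ‘N/A’ recoding mentioned before the summary can be sketched in base R as follows. The two-column data frame is hypothetical; on the real data the same replacement would be applied to `df`.

```r
# Hypothetical raw rows in which the string "N/A" stands for a missing value
raw <- data.frame(
  Year      = c("2009", "N/A", "2011"),
  Publisher = c("Nintendo", "Ubisoft", "N/A"),
  stringsAsFactors = FALSE
)

# Replace the "N/A" placeholder with a real NA in every column
raw[raw == "N/A"] <- NA

colSums(is.na(raw))  # number of missing values per column
```

Alternatively, the placeholder can be handled when the file is read, with `read.csv(path, na.strings = "N/A")`, where `path` is a placeholder for the location of the Kaggle CSV.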
Now we check the consistency of the data: whether the values inside each column are homogeneous, and whether they are feasible.
Taking the mean of the differences between the actual sales (computed by summing the sales from all regions) and the Global_Sales attribute, we get:
## [1] 0.0002765393
So the Global_Sales attribute does not always equal the sum of the regional sales, which means it contains some errors. Because the sales figures are in millions, even a small average difference means that some rows were entered incorrectly. Let's replace Global_Sales with the sum of JP_Sales, NA_Sales, EU_Sales and Other_Sales.
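A minimal sketch of the consistency check and the fix, on hypothetical rows (values in millions). It also prints the mean absolute difference, which is not part of the original analysis: a signed mean can hide positive and negative errors that cancel out.

```r
# Hypothetical sales rows; the first row's Global_Sales is off by 0.01
sales <- data.frame(
  NA_Sales     = c(1.00, 0.50, 0.10),
  EU_Sales     = c(0.50, 0.20, 0.05),
  JP_Sales     = c(0.20, 0.00, 0.01),
  Other_Sales  = c(0.10, 0.05, 0.01),
  Global_Sales = c(1.81, 0.75, 0.17)
)

regional    <- c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")
actual_sale <- rowSums(sales[regional])  # sum of the four regional columns

diffs <- actual_sale - sales$Global_Sales
mean(diffs)       # signed mean: errors of opposite sign can cancel
mean(abs(diffs))  # mean absolute difference: typical size of the error

# Replace Global_Sales with the recomputed total
sales$Global_Sales <- actual_sale
```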
The long tail in the graph shows that only very few games have total sales greater than 75 million. These are most probably the most popular games; if not, they may be outliers. We also have to check the data for duplicates.
## # A tibble: 2,775 x 2
## Name count
## <chr> <int>
## 1 Need for Speed: Most Wanted 12
## 2 FIFA 14 9
## 3 LEGO Marvel Super Heroes 9
## 4 Madden NFL 07 9
## 5 Ratatouille 9
## 6 Angry Birds Star Wars 8
## 7 Cars 8
## 8 FIFA 15 8
## 9 FIFA Soccer 13 8
## 10 Lego Batman 3: Beyond Gotham 8
## # ... with 2,765 more rows
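So here we can see that 2,775 video game titles appear more than once. Most of these are the same title released on several platforms rather than true duplicates; a title released on many platforms was probably expected to sell well, which would explain the multiple releases.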
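The title count above, and a stricter check for exact duplicates, can be sketched in base R. The rows are hypothetical (the repeated "Cars" row is invented to show an exact duplicate), and treating Name, Platform and Year together as the identity of a release is an assumption.

```r
# Hypothetical rows: the same title released on several platforms
games <- data.frame(
  Name     = c("FIFA 14", "FIFA 14", "FIFA 14", "Tetris", "Cars", "Cars"),
  Platform = c("PS3", "X360", "PS4", "GB", "PS2", "PS2"),
  Year     = c(2013, 2013, 2013, 1989, 2006, 2006),
  stringsAsFactors = FALSE
)

# How many rows share each title?
title_counts <- table(games$Name)
repeated <- sort(title_counts[title_counts > 1], decreasing = TRUE)
repeated

# Exact duplicates: same title, platform and year appearing more than once
games[duplicated(games[c("Name", "Platform", "Year")]), ]
```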
In the next section we analyse the trends, look for correlations, and answer various interesting questions.
Here we look at how the data are spread and at their central tendencies, to get a direct insight into the data.
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 9 46 36 17 14 14 21 16 15 17 16 41 43 60 121
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 219 263 289 379 338 349 482 829 775 744 936 1008 1201 1428 1431
## 2010 2011 2012 2013 2014 2015 2016 2017 2020
## 1257 1136 655 546 580 614 342 3 1
The histogram shows an abrupt decline in the number of video games released from 2012 onwards (1,136 games in 2011 versus 655 in 2012). This may also indicate that there were fewer jobs for video game developers around 2014, although release counts alone cannot confirm it. The counts fall sharply after 2016 (3 games in 2017 and 1 in 2020), which indicates that data collection after 2016 is incomplete and the data are inconsistent, so we limit our study to games released up to 2016.
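Because `Year` was converted to a factor, limiting the study to 2016 needs some care: `as.numeric()` on a factor returns the level codes rather than the years. A minimal sketch on hypothetical rows; dropping rows with a missing year is an assumption about how the study was limited.

```r
# Hypothetical rows; Year is a factor, as after the conversion step
df <- data.frame(
  Name = c("Game A", "Game B", "Game C", "Game D"),
  Year = factor(c("2009", "2016", "2017", NA)),
  stringsAsFactors = FALSE
)

# Convert through character to get the actual years;
# as.numeric(df$Year) alone would give the level codes 1, 2, 3
year_num <- as.numeric(as.character(df$Year))

# Keep rows released up to 2016, dropping missing years
df_2016 <- df[!is.na(year_num) & year_num <= 2016, ]
df_2016$Name
```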
This graph shows which genres contain the most games. Action games are at the top, followed by Sports; interestingly, miscellaneous (Misc) games rank third. Let's look at the distribution of companies developing games in each genre.
From this we can clearly say that most of the sales come from North America. From a marketing point of view, however, raw sales are not a great metric: Japan has a smaller population, so if we incorporate population into the metric, the picture may be different.
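The regional split can be computed by summing each regional column and dividing by the grand total. This sketch uses hypothetical figures; a population-adjusted version would need population data, which is not part of this dataset.

```r
# Hypothetical regional sales (in millions) for a few games
sales <- data.frame(
  NA_Sales    = c(4.0, 1.0, 0.5),
  EU_Sales    = c(2.0, 1.0, 0.5),
  JP_Sales    = c(1.0, 0.5, 1.0),
  Other_Sales = c(0.5, 0.5, 0.0)
)

region_totals <- colSums(sales)  # total sales per region
region_share  <- round(100 * region_totals / sum(region_totals), 1)
region_share                     # percentage share of each region
```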
Name | total_sale |
---|---|
Wii Sports | 82.74 |
Grand Theft Auto V | 55.92 |
Super Mario Bros. | 45.31 |
Tetris | 35.84 |
Mario Kart Wii | 35.82 |
Wii Sports Resort | 33.00 |
Pokemon Red/Pokemon Blue | 31.37 |
Call of Duty: Modern Warfare 3 | 30.83 |
New Super Mario Bros. | 30.01 |
Call of Duty: Black Ops II | 29.72 |
So, up to 2016, these games have the highest global sales. Wii Sports, a Nintendo game, is at the top (a Google search also confirms this, which is quite interesting), followed by Grand Theft Auto V in second place and Super Mario Bros. in third.
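A sketch of how such a top-10 list can be built, assuming `total_sale` is `Global_Sales` summed over all rows that share a title (for example, one row per platform). The rows are hypothetical.

```r
# Hypothetical rows: one title can appear on several platforms
games <- data.frame(
  Name         = c("Game A", "Game B", "Game A", "Game C", "Game B"),
  Global_Sales = c(10.0, 4.0, 5.0, 12.0, 2.0),
  stringsAsFactors = FALSE
)

# Sum sales per title, then keep the 10 largest totals
total_sale <- aggregate(Global_Sales ~ Name, data = games, FUN = sum)
names(total_sale)[2] <- "total_sale"
top <- head(total_sale[order(-total_sale$total_sale), ], 10)
top
```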
Genre | total_revenue |
---|---|
Action | 1722.84 |
Sports | 1309.24 |
Shooter | 1026.20 |
Role-Playing | 923.83 |
Platform | 829.13 |
Misc | 789.87 |
Racing | 726.76 |
Fighting | 444.05 |
Simulation | 389.98 |
Puzzle | 242.21 |
Adventure | 234.59 |
Strategy | 173.27 |
Here we can see that the Misc genre ranks 3rd by number of games but only 6th by revenue, far behind genres such as Action, Sports and Shooter.
Now let's find which genres generate the most revenue relative to the number of video games developed in them.
The metric, ease_metric, is the total revenue generated by a genre divided by the total number of video games in that genre.
Genre | total_revenue | count | ease_metric |
---|---|---|---|
Platform | 829.13 | 875 | 0.9475771 |
Shooter | 1026.20 | 1282 | 0.8004680 |
Role-Playing | 923.83 | 1470 | 0.6284558 |
Racing | 726.76 | 1225 | 0.5932735 |
Sports | 1309.24 | 2304 | 0.5682465 |
Fighting | 444.05 | 836 | 0.5311603 |
Action | 1722.84 | 3251 | 0.5299416 |
Misc | 789.87 | 1686 | 0.4684875 |
Simulation | 389.98 | 848 | 0.4598821 |
Puzzle | 242.21 | 570 | 0.4249298 |
Strategy | 173.27 | 670 | 0.2586119 |
Adventure | 234.59 | 1274 | 0.1841366 |
Some interesting facts: Platform and Shooter games generate high average revenue per game, while Action games drop out of the top 5 (to 7th place), which shows that their large total revenue is spread over a very large number of games (3,251).
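The ease_metric described above can be computed with `tapply()`, grouping sales by genre once for the sum and once for the count. The rows below are hypothetical toy data, not the Kaggle figures.

```r
# Hypothetical rows (sales in millions)
games <- data.frame(
  Genre        = c("Action", "Action", "Action", "Platform", "Shooter", "Shooter"),
  Global_Sales = c(0.5, 0.2, 0.8, 2.0, 1.5, 0.5),
  stringsAsFactors = FALSE
)

total_revenue <- tapply(games$Global_Sales, games$Genre, sum)     # revenue per genre
count         <- tapply(games$Global_Sales, games$Genre, length)  # games per genre

genre_stats <- data.frame(
  Genre         = names(total_revenue),
  total_revenue = as.vector(total_revenue),
  count         = as.vector(count),
  stringsAsFactors = FALSE
)
genre_stats$ease_metric <- genre_stats$total_revenue / genre_stats$count

ranked <- genre_stats[order(-genre_stats$ease_metric), ]
ranked
```

As a check against the table, the Platform row gives 829.13 / 875 ≈ 0.9475771.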
Let's find which console offers the most video games. Here, ease_metric is the average revenue generated by a game released for a given console, i.e. the console's total revenue divided by its number of games.
Platform | total_revenue | count | ease_metric |
---|---|---|---|
PS2 | 1233.46 | 2127 | 0.5799060 |
X360 | 969.60 | 1234 | 0.7857374 |
PS3 | 949.35 | 1304 | 0.7280291 |
Wii | 909.81 | 1290 | 0.7052791 |
DS | 818.91 | 2131 | 0.3842844 |
PS | 727.39 | 1189 | 0.6117662 |
GBA | 305.62 | 786 | 0.3888295 |
PSP | 291.71 | 1197 | 0.2437009 |
PS4 | 278.10 | 336 | 0.8276786 |
PC | 254.70 | 938 | 0.2715352 |
GB | 254.42 | 97 | 2.6228866 |
XB | 252.09 | 803 | 0.3139352 |
NES | 251.07 | 98 | 2.5619388 |
3DS | 246.27 | 499 | 0.4935271 |
N64 | 218.21 | 316 | 0.6905380 |
SNES | 200.05 | 239 | 0.8370293 |
GC | 197.14 | 542 | 0.3637269 |
XOne | 141.06 | 213 | 0.6622535 |
2600 | 86.57 | 116 | 0.7462931 |
WiiU | 81.86 | 143 | 0.5724476 |
PSV | 61.60 | 410 | 0.1502439 |
SAT | 33.59 | 173 | 0.1941618 |
GEN | 28.36 | 27 | 1.0503704 |
DC | 15.97 | 52 | 0.3071154 |
The table above indicates that Sony's PlayStation range of consoles has generated the most revenue, so to increase a video game's sales, one should make it compatible with the top revenue-generating consoles that also have a large number of games.
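To check the claim about the PlayStation range, the platform totals from the table can be grouped into console families and summed. The family grouping (PlayStation, Xbox, Nintendo, Other) is an assumption made for this illustration; the Sega and Atari consoles and the PC fall into Other.

```r
# total_revenue per platform (in millions), copied from the table above
platform <- data.frame(
  Platform = c("PS2", "X360", "PS3", "Wii", "DS", "PS", "GBA", "PSP", "PS4", "PC",
               "GB", "XB", "NES", "3DS", "N64", "SNES", "GC", "XOne", "2600", "WiiU",
               "PSV", "SAT", "GEN", "DC"),
  total_revenue = c(1233.46, 969.60, 949.35, 909.81, 818.91, 727.39, 305.62, 291.71,
                    278.10, 254.70, 254.42, 252.09, 251.07, 246.27, 218.21, 200.05,
                    197.14, 141.06, 86.57, 81.86, 61.60, 33.59, 28.36, 15.97),
  stringsAsFactors = FALSE
)

# Hypothetical console families used for this illustration
ps       <- c("PS", "PS2", "PS3", "PS4", "PSP", "PSV")
xbox     <- c("XB", "X360", "XOne")
nintendo <- c("NES", "SNES", "N64", "GC", "Wii", "WiiU", "GB", "GBA", "DS", "3DS")

family <- ifelse(platform$Platform %in% ps, "PlayStation",
          ifelse(platform$Platform %in% xbox, "Xbox",
          ifelse(platform$Platform %in% nintendo, "Nintendo", "Other")))

family_revenue <- sort(tapply(platform$total_revenue, family, sum), decreasing = TRUE)
family_revenue
```

With these figures the PlayStation family totals about 3541.61, only slightly ahead of the Nintendo consoles at about 3483.36, followed by Xbox (1362.75) and Other (419.19).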
The plot gives a clear picture of the revenue generated by each console.
However, using revenue alone as a metric for compatibility may be misleading: a console's revenue may have been driven by one specific kind of game while only a small number of games are available for it. For example, GB has an ease_metric of about 2.62 from only 97 games. Compatibility priority should therefore be chosen by taking both factors into account: revenue and the number of games for each console.
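One simple way to take both factors into account is to rank only the consoles that have at least a minimum number of games. The rows are copied from the platform table; the threshold of 200 games is hypothetical and would need to be chosen to suit the analysis.

```r
# A few rows copied from the platform table
platform <- data.frame(
  Platform    = c("PS2", "X360", "GB", "NES", "PS4", "PSV"),
  count       = c(2127, 1234, 97, 98, 336, 410),
  ease_metric = c(0.5799060, 0.7857374, 2.6228866, 2.5619388, 0.8276786, 0.1502439),
  stringsAsFactors = FALSE
)

min_games <- 200  # hypothetical minimum number of games per console

# Drop consoles with too few games, then rank the rest by average revenue per game
eligible <- platform[platform$count >= min_games, ]
ranking  <- eligible[order(-eligible$ease_metric), "Platform"]
ranking
```

With this filter, GB and NES (fewer than 100 games each) no longer dominate the ranking despite their high averages.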