
0.1 What's It All About?

This is an exhaustive analytical report that aims to give clear insight into the video game industry, from its earliest days to its peak. The data set is from Kaggle.

This report answers a number of questions about the video game industry; they are addressed in the sections below.

0.1.1 Introducing Our Data Source

The data is in CSV (comma-separated values) format with 16,598 rows and 11 columns. The names of all the columns and their meanings are stated below:
Attribute      Meaning
Rank           Rank of the video game
Name           Name of the video game
Platform       Platform for which it was developed
Year           Year of release
Genre          Type of game
Publisher      Publisher/developing company
NA_Sales       Total sales in North America (in millions)
EU_Sales       Total sales in Europe (in millions)
JP_Sales       Total sales in Japan (in millions)
Other_Sales    Sales in the rest of the world (in millions)
Global_Sales   Total worldwide sales (in millions)

The CSV contains missing values, so the data has to be cleaned and tidied before analysis.

0.1.2 Data Wrangling

Data wrangling is the collective term for data cleaning and data tidying. In this process we:

  • Check the data for consistency and duplicates
  • Check for missing data
  • Check for outliers
  • Find a strong reason before removing outliers
  • Fill in the missing values
  • Replace corrupted data with correct values
  • Perform feature engineering, i.e. create new features

Let's get hands-on:

First, we convert the categorical character columns (Platform, Year, Genre and Publisher) into factors, so that we can easily apply statistical modelling functions; factors are also handy in plotting libraries such as ggplot2.

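Now R treats the categorical columns as categorical data. Looking at the data, we see that the string 'N/A' is used to represent missing values. If we did not convert it, R would not recognize it as a missing value (NA) and our results would be error-prone. After the conversion, the summary below reports 271 missing Year values and 58 missing Publisher values.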

##       Rank           Name              Platform         Year     
##  Min.   :    1   Length:16598       DS     :2163   2009   :1431  
##  1st Qu.: 4151   Class :character   PS2    :2161   2008   :1428  
##  Median : 8300   Mode  :character   PS3    :1329   2010   :1259  
##  Mean   : 8301                      Wii    :1325   2007   :1202  
##  3rd Qu.:12450                      X360   :1265   2011   :1139  
##  Max.   :16600                      PSP    :1213   (Other):9868  
##                                     (Other):7142   NA's   : 271  
##           Genre                             Publisher    
##  Action      :3316   Electronic Arts             : 1351  
##  Sports      :2346   Activision                  :  975  
##  Misc        :1739   Namco Bandai Games          :  932  
##  Role-Playing:1488   Ubisoft                     :  921  
##  Shooter     :1310   Konami Digital Entertainment:  832  
##  Adventure   :1286   (Other)                     :11529  
##  (Other)     :5113   NA's                        :   58  
##     NA_Sales          EU_Sales          JP_Sales         Other_Sales      
##  Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.00000   Min.   : 0.00000  
##  1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.: 0.00000  
##  Median : 0.0800   Median : 0.0200   Median : 0.00000   Median : 0.01000  
##  Mean   : 0.2647   Mean   : 0.1467   Mean   : 0.07778   Mean   : 0.04806  
##  3rd Qu.: 0.2400   3rd Qu.: 0.1100   3rd Qu.: 0.04000   3rd Qu.: 0.04000  
##  Max.   :41.4900   Max.   :29.0200   Max.   :10.22000   Max.   :10.57000  
##                                                                           
##   Global_Sales    
##  Min.   : 0.0100  
##  1st Qu.: 0.0600  
##  Median : 0.1700  
##  Mean   : 0.5374  
##  3rd Qu.: 0.4700  
##  Max.   :82.7400  
## 
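The two cleaning steps described above can also be sketched outside R. The block below is a minimal Python sketch using only the standard library; the three-row SAMPLE text and the helper names load and missing_counts are hypothetical, made up for illustration. It treats the literal string 'N/A' as a missing value and counts missing values per column, similar to the NA's lines in the summary. (In R itself, the na.strings = "N/A" argument of read.csv has the same effect at import time.)

```python
import csv
import io

# Hypothetical three-row excerpt with the same columns as the Kaggle file;
# the values are made up for illustration.
SAMPLE = """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Game A,Wii,2006,Sports,Nintendo,1.00,0.50,0.25,0.10,1.85
2,Game B,PS2,N/A,Action,N/A,0.40,0.30,0.00,0.05,0.75
3,Game C,DS,2009,Misc,Ubisoft,0.20,0.10,0.05,0.02,0.37
"""

SALES_COLUMNS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]


def load(text):
    """Parse CSV text into dicts, mapping the literal 'N/A' to None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # 'N/A' is only a placeholder string; turn it into a real missing value.
        row = {key: (None if value == "N/A" else value) for key, value in raw.items()}
        # Sales columns are numeric; Year stays None where it was 'N/A'.
        for col in SALES_COLUMNS:
            row[col] = float(row[col])
        if row["Year"] is not None:
            row["Year"] = int(row["Year"])
        rows.append(row)
    return rows


def missing_counts(rows):
    """Number of missing values per column (like the NA's line of summary())."""
    return {col: sum(row[col] is None for row in rows) for col in rows[0]}


rows = load(SAMPLE)
print(len(rows), len(rows[0]))  # 3 11  (rows and columns, like dim(df))
print(missing_counts(rows))
```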

Next we check the consistency of the data: whether the values inside each column are homogeneous and whether they are plausible.

Taking the mean of the differences between the actual sales (actualSale, computed by summing the sales from all regions) and the Global_Sales attribute, we get:

## [1] 0.0002765393

So the Global_Sales attribute does not always equal the sum of the regional sales, i.e. it contains some errors. Since the sales figures are in millions, even this small mean difference adds up: over roughly 16,600 games it amounts to about 4.6 million in total, so a noticeable amount of data was entered incorrectly. Let's replace the value of Global_Sales with the sum of the North America, Europe, Japan and Other sales.
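A minimal Python sketch of this consistency check and fix, assuming each record is a dict keyed by the column names from the table above; the two rows are hypothetical. actual_sale mirrors the report's actualSale. Rounding the recomputed total to two decimals is an assumption made here because the source values carry two decimals; the report does not say whether it rounds.

```python
REGION_COLUMNS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Hypothetical rows: Game B's Global_Sales (0.74) does not match the
# sum of its regional sales (0.75).
rows = [
    {"Name": "Game A", "NA_Sales": 1.00, "EU_Sales": 0.50, "JP_Sales": 0.25,
     "Other_Sales": 0.10, "Global_Sales": 1.85},
    {"Name": "Game B", "NA_Sales": 0.40, "EU_Sales": 0.30, "JP_Sales": 0.00,
     "Other_Sales": 0.05, "Global_Sales": 0.74},
]


def actual_sale(row):
    """Sum of the regional sales for one game."""
    return sum(row[col] for col in REGION_COLUMNS)


def mean_difference(rows):
    """Mean of (actualSale - Global_Sales) over all rows; 0 means consistent."""
    return sum(actual_sale(r) - r["Global_Sales"] for r in rows) / len(rows)


def fix_global_sales(rows):
    """Overwrite Global_Sales with the regional sum (rounding to 2 decimals is an assumption)."""
    for r in rows:
        r["Global_Sales"] = round(actual_sale(r), 2)
    return rows


print(round(mean_difference(rows), 4))  # 0.005 before the fix
fix_global_sales(rows)
print(round(mean_difference(rows), 4))  # 0.0 after the fix
```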

The long tail in the graph shows that only very few games have a total revenue greater than 75. Most probably these are the most popular games; if not, they may be outliers. We also have to check the data for duplicates.

## # A tibble: 2,775 x 2
##    Name                         count
##    <chr>                        <int>
##  1 Need for Speed: Most Wanted     12
##  2 FIFA 14                          9
##  3 LEGO Marvel Super Heroes         9
##  4 Madden NFL 07                    9
##  5 Ratatouille                      9
##  6 Angry Birds Star Wars            8
##  7 Cars                             8
##  8 FIFA 15                          8
##  9 FIFA Soccer 13                   8
## 10 Lego Batman 3: Beyond Gotham     8
## # ... with 2,765 more rows

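So there are 2,775 game titles that appear more than once; Need for Speed: Most Wanted, for example, appears 12 times. Since each row describes one release, a repeated title most likely means that the same game was released on several platforms. Presumably these games earned high revenue, which is why they received multiple releases.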
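The duplicate check can be sketched in plain Python with collections.Counter. This is an illustrative sketch on hypothetical (Name, Platform) pairs, not the report's own R code: repeated_names lists titles that occur more than once (like the tibble above), while exact_duplicates looks for rows where both the name and the platform repeat, which would be genuine duplicate entries rather than multi-platform releases.

```python
from collections import Counter

# Hypothetical (Name, Platform) pairs; in the data set each row is one release.
releases = [
    ("Cars", "PS2"), ("Cars", "GBA"), ("Cars", "DS"),
    ("FIFA 14", "PS3"), ("FIFA 14", "X360"),
    ("Tetris", "GB"),
]


def repeated_names(releases):
    """(name, count) for titles that appear more than once, most frequent first."""
    counts = Counter(name for name, _platform in releases)
    return [(name, n) for name, n in counts.most_common() if n > 1]


def exact_duplicates(releases):
    """(name, platform) pairs that appear more than once: genuine duplicate rows."""
    counts = Counter(releases)
    return [pair for pair, n in counts.items() if n > 1]


print(repeated_names(releases))    # [('Cars', 3), ('FIFA 14', 2)]
print(exact_duplicates(releases))  # []
```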

In the next sections we analyse the trends, look for correlations and answer various curious questions.

0.1.3 Univariate

Here we look at how each variable is distributed and at its central tendency, to get direct insight into the data.

0.1.3.1 Yearly Trend in Video Game Development

## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 
##    9   46   36   17   14   14   21   16   15   17   16   41   43   60  121 
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 
##  219  263  289  379  338  349  482  829  775  744  936 1008 1201 1428 1431 
## 2010 2011 2012 2013 2014 2015 2016 2017 2020 
## 1257 1136  655  546  580  614  342    3    1

The histogram shows an abrupt decline in the number of games released from 2012 onwards (from 1,136 games in 2011 to 655 in 2012). This may also be evidence that there were far fewer jobs for video game developers around 2014. The counts fall abruptly again after 2016 (only 3 games in 2017 and 1 in 2020), which indicates a problem in data gathering after 2016: the data is inconsistent there. We will therefore limit our study to 2016 and earlier.
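Counting releases per year and applying the 2016 cut-off can be sketched as below. The function names and the sample years are hypothetical; missing years (None, i.e. the former 'N/A' entries) are skipped. pct_change is applied to the 2011 and 2012 counts from the yearly table above to quantify the drop.

```python
from collections import Counter


def releases_per_year(years, last_year=2016):
    """Games per release year, ignoring missing years and years after last_year."""
    counts = Counter(y for y in years if y is not None and y <= last_year)
    return dict(sorted(counts.items()))


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old


# Hypothetical sample of release years, including a missing one and post-2016 years.
years = [2011, 2011, 2012, None, 2016, 2017, 2020]
print(releases_per_year(years))         # {2011: 2, 2012: 1, 2016: 1}

# Counts for 2011 and 2012 taken from the yearly table in this report.
print(round(pct_change(1136, 655), 1))  # -42.3
```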

0.1.3.2 Number of Games Developed per Genre

This graph shows which genres contain the most games. Action games are at the top, followed by Sports; interestingly, miscellaneous (Misc) games rank third.


0.1.3.3 Game Genre Distribution Across Countries

Let's see the distribution of companies developing games in each genre.

0.1.3.4 Country-wise Sales Analysis

From here we can clearly say that most of the sales come from North America. From a marketing point of view, however, total sales are not a great metric: Japan has a much smaller population, so if we incorporated population into the metric, the picture might be different.
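One way to compare regions is each region's share of total sales. The sketch below (all helper names hypothetical) computes the shares; per_capita shows the population adjustment suggested above, but the population figures must be supplied by the caller and are not part of the data set. Because a region's total equals its mean times the number of games, the column means from the summary earlier give the same shares as the totals: roughly 49% North America, 27% Europe, 14% Japan and 9% other regions.

```python
REGION_COLUMNS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]


def regional_totals(rows):
    """Total sales per region across all games."""
    return {col: sum(row[col] for row in rows) for col in REGION_COLUMNS}


def regional_shares(rows):
    """Fraction of all sales that each region accounts for."""
    totals = regional_totals(rows)
    grand_total = sum(totals.values())
    return {col: total / grand_total for col, total in totals.items()}


def per_capita(totals, population):
    """Sales per person; `population` maps region column -> population (caller-supplied)."""
    return {col: totals[col] / population[col] for col in totals}


# A single row holding the column means from the summary() output above.
means = [{"NA_Sales": 0.2647, "EU_Sales": 0.1467, "JP_Sales": 0.07778, "Other_Sales": 0.04806}]
for col, share in regional_shares(means).items():
    print(f"{col}: {share:.0%}")
```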

0.2 Getting Some Insights

0.2.1 Top 10 Revenue generating Games

The ten games with the highest total global sales, summed over all the platforms each game was released on:
Name                              total_sale
Wii Sports                        82.74
Grand Theft Auto V                55.92
Super Mario Bros.                 45.31
Tetris                            35.84
Mario Kart Wii                    35.82
Wii Sports Resort                 33.00
Pokemon Red/Pokemon Blue          31.37
Call of Duty: Modern Warfare 3    30.83
New Super Mario Bros.             30.01
Call of Duty: Black Ops II        29.72

So up to 2016, these games have the highest global revenue. Wii Sports, a Nintendo game, is at the top, which a Google search also confirms (quite interesting, actually); Grand Theft Auto V is second, followed by Super Mario Bros.
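A minimal sketch of how such a top-10 list can be produced: sum Global_Sales per game name (so releases on different platforms are combined), then sort in descending order. The helper name top_games and the (name, sales) records are hypothetical.

```python
from collections import defaultdict


def top_games(records, n=10):
    """Top-n (name, total_sale) pairs, summing Global_Sales over all platforms."""
    totals = defaultdict(float)
    for name, global_sales in records:
        totals[name] += global_sales
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical (Name, Global_Sales) records: "Game X" was released on two platforms.
records = [("Game X", 5.0), ("Game Y", 3.0), ("Game X", 1.5), ("Game Z", 4.0)]
print(top_games(records, n=2))  # [('Game X', 6.5), ('Game Z', 4.0)]
```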

0.2.2 Top 5 Revenue Generating Genres

For a game developer, finding the sweet spot is important for making revenue in such a competitive market. Let's first find which genre generates the maximum revenue; after that we will find which genre has the least competition, measured as the total revenue of the genre divided by the total number of games in that genre.
Genre           total_revenue
Action          1722.84
Sports          1309.24
Shooter         1026.20
Role-Playing    923.83
Platform        829.13
Misc            789.87
Racing          726.76
Fighting        444.05
Simulation      389.98
Puzzle          242.21
Adventure       234.59
Strategy        173.27

Here we can see that although the Misc genre ranks third by number of games, it ranks only sixth by revenue (789.87), far behind genres such as Action and Sports.

0.2.3 Sweet Spot!!

Now let's find which genre earns the most revenue relative to the number of games developed in it.

The metric, ease_metric, is the total revenue generated by a genre divided by the total number of video games in that genre, i.e. the average revenue per game.
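Written as a formula, for a genre $g$ containing $n_g$ games:

$$
\text{ease\_metric}_g = \frac{\text{total\_revenue}_g}{n_g} = \frac{1}{n_g} \sum_{i \in g} \text{Global\_Sales}_i
$$

For example, for the Platform genre: $829.13 / 875 \approx 0.9476$.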

Genre           total_revenue  count  ease_metric
Platform        829.13         875    0.9475771
Shooter         1026.20        1282   0.8004680
Role-Playing    923.83         1470   0.6284558
Racing          726.76         1225   0.5932735
Sports          1309.24        2304   0.5682465
Fighting        444.05         836    0.5311603
Action          1722.84        3251   0.5299416
Misc            789.87         1686   0.4684875
Simulation      389.98         848    0.4598821
Puzzle          242.21         570    0.4249298
Strategy        173.27         670    0.2586119
Adventure       234.59         1274   0.1841366
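The genre table above, and the console table that follows, can both be sketched with one grouping function. This is a hypothetical Python helper (ease_table, with made-up sample rows), not the report's R code: for each value of the grouping column it sums Global_Sales, counts the games and divides the two. By default it sorts by ease_metric, as in the genre table; passing sort_by_ease=False sorts by total revenue, as in the console table.

```python
from collections import defaultdict


def ease_table(rows, key, sort_by_ease=True):
    """Per group: (group, total_revenue, count, ease_metric = total_revenue / count)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["Global_Sales"]
        counts[row[key]] += 1
    table = [(group, totals[group], counts[group], totals[group] / counts[group])
             for group in totals]
    sort_index = 3 if sort_by_ease else 1  # column 3 = ease_metric, 1 = total_revenue
    return sorted(table, key=lambda t: t[sort_index], reverse=True)


# Hypothetical games; the same function works with key="Platform".
rows = [
    {"Genre": "Action", "Platform": "PS2", "Global_Sales": 0.5},
    {"Genre": "Action", "Platform": "DS", "Global_Sales": 0.25},
    {"Genre": "Action", "Platform": "PS2", "Global_Sales": 0.75},
    {"Genre": "Platform", "Platform": "DS", "Global_Sales": 2.0},
    {"Genre": "Platform", "Platform": "PS2", "Global_Sales": 1.0},
]
for genre, total, count, ease in ease_table(rows, "Genre"):
    print(genre, total, count, round(ease, 4))
```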

Some interesting facts: Platform and Shooter games generate the highest average revenue per game, while Action games drop out of the top 5 (to seventh place). This shows that the Action genre is crowded: it has by far the most games (3,251).

0.2.4 Facts on Consoles!!

Let's find which consoles generate the most revenue and how many games each one offers. Here ease_metric is the average revenue generated by a game released for that console.

Platform  total_revenue  count  ease_metric
PS2       1233.46        2127   0.5799060
X360      969.60         1234   0.7857374
PS3       949.35         1304   0.7280291
Wii       909.81         1290   0.7052791
DS        818.91         2131   0.3842844
PS        727.39         1189   0.6117662
GBA       305.62         786    0.3888295
PSP       291.71         1197   0.2437009
PS4       278.10         336    0.8276786
PC        254.70         938    0.2715352
GB        254.42         97     2.6228866
XB        252.09         803    0.3139352
NES       251.07         98     2.5619388
3DS       246.27         499    0.4935271
N64       218.21         316    0.6905380
SNES      200.05         239    0.8370293
GC        197.14         542    0.3637269
XOne      141.06         213    0.6622535
2600      86.57          116    0.7462931
WiiU      81.86          143    0.5724476
PSV       61.60          410    0.1502439
SAT       33.59          173    0.1941618
GEN       28.36          27     1.0503704
DC        15.97          52     0.3071154

The table above clearly indicates that Sony's PlayStation range of consoles has generated the most revenue, with the PS2 alone leading at 1233.46. So, to increase a video game's sales, one should make it compatible with the top revenue-generating consoles that also have a large number of games.

The plot gives clear insight into the revenue generated by each console.

However, revenue alone can be a distorted metric for compatibility: a console's revenue may have been boosted by one specific kind of game even though only a few games were released for it. The GB, for example, has only 97 games but an ease_metric of 2.62. Compatibility priority should therefore be chosen by taking both factors into account: the revenue and the number of games for each console.