Data Warehouse Project on Video Games

Ashish Aswal
x16107985
MSc in Data Analytics
National College of Ireland

Data warehouse project on video games

Contents

Approach
Introduction
Implementation and Architecture
Design Process
Extraction, Transform and Load
Data Sources
Technologies Used
  Programming languages
  Database Management
  Add-On Software
Reporting
  Tableau
Case studies
  Case study 1
  Case study 2
  Case study 3
Statistics
Constraints
Conclusions
Future Works
References
Appendix A – Code Snippets

17/12/16


Approach

My interest in playing video games was the starting point for this project. As an ardent video game fan, I decided to build a data warehouse from the data available about video games and to apply business intelligence to it. Because the data related to video games is vast, I restricted my analysis to reviews, sales, sentiments and the relationships between these factors. I started by looking for papers specific to video games and found some interesting ones. Granic, Lobel and Engels discuss the positive effects of playing video games, such as improved cognitive skills, resilience in the face of failure, mood management and increased prosocial behaviors (Granic, Lobel, & Engels, 2014).

Introduction

The gaming market has grown continuously since the 1980s as computers and digital devices have become part of everyday life. The rapid increase in the number of video game companies has led to a boom in players as well. Video games play a much bigger role in our lives now than when they first emerged, yet we still know little about how such a powerful medium could influence our thoughts. Games have been developed for various purposes, including entertainment, information, education, marketing and advertising. The industry continues to grow, with more advanced forms of games being developed all over the world.

The use of big data in gaming industry

The world of gaming is big, growing rapidly and taking full advantage of big data technologies. The gaming industry uses big data to drive customer engagement, increase advertising revenue and optimize the gaming experience.

Implementation and Architecture

This section of the report describes how the project was implemented and the architecture I used. A data warehouse is a specifically prepared repository of data created to support decision making (Wixom & Watson, 2001).


Fig. 1 Data warehouse Model

For any data warehouse, the first step is to gather the data to be inserted into it. The data sources can be anything from an operational system to simple flat files. Once we have the data sources, we proceed with ETL (Extraction, Transform and Load): extracting data from the source, cleaning and transforming it so that it is fit for insertion, and finally loading it into the data warehouse. OLAP data can be stored in a relational data warehouse using either a star schema or a snowflake schema. The fact table holds the measures, and the dimension tables provide the context by which those measures are analyzed. These measures and dimensions can then be used for BI applications and reporting. For my project I collected data from three different sources: a gaming website, an open dataset website and Twitter. The collected data was then filtered before being inserted into the data warehouse. Once the data was cleaned and transformed, I loaded it into the database using SSIS. After that, I moved to SSAS to build and deploy the cube over the warehouse data.


For the project I decided to go with the Kimball approach and a star schema, as both suited my requirements and were easier to implement within the given time frame.

KIMBALL’S APPROACH: Successful implementation of any data warehousing project depends on the integration of a number of tasks and subtasks. The key points in Kimball’s approach are:

1-Select the business process: For the business process in my project I analyzed video games released between 1980 and 2015. Since the most important factors affecting the success of a game are its reviews and overall sales, I focused the project on these two factors.

2-Declare the grain: The grain of the project is the sales of a video game. When gathering data, I noticed that sales was the only factor consistent across all the data sources, which prompted me to choose sales as the grain.

3-Identify the dimensions: Kimball states that the dimensions are what make a data warehouse project a success, as they are the heart of the DW. The dimensions used in the project are Game title, Genre, Publisher, Platform, Year, Sales and Sentiments.

4-Identify the facts: In my project the fact table is named Fact and contains the sentiment scores, sales, critic score and the foreign keys to all the dimensions.

Fig. 2 The Star Schema


Star schema: It is the simplest and most widely used approach for developing a data warehouse. The star schema derives its name from the resemblance of its physical model to a star, with a fact table in the middle surrounded by one or more dimension tables. The star schema stores business processes in the fact table as facts. The fact table contains all the measures. Columns of the fact table may be declared as FOREIGN KEYs, which connect to the dimension tables where the actual descriptive data is stored. Fact tables are defined as one of three types:

• Transaction fact tables
• Snapshot fact tables
• Accumulating snapshot tables

Dimension tables are the ones connected to the fact table. A dimension table holds relatively little data compared to the fact table, but each of its records can have many attributes. Each dimension is assigned a primary key that identifies its records. In the project, sales, critic score and sentiment score are the prime measures of the fact table. The dimension tables are Sentiment (the sentiment score of video games), Sales (the sales values), Year (the years in which the games were released), Publisher (names of popular game publishers by sales), Platform (the platforms for which games are made), Game title (the title of the game and its critic score) and Genre (the genre of the game).
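As an illustration only, the shape of such a star schema can be sketched in Python with the standard sqlite3 module. The project itself was built in SQL Server, so every table and column name below is an assumption, and the sketch uses a simplified subset of five dimensions with sales, critic score and sentiment score kept as measures in the fact table.

```python
import sqlite3

# Hypothetical dimension names; the project's real schema is shown only in Fig. 2.
DIMENSIONS = ["game_title", "genre", "publisher", "platform", "year"]


def create_star_schema(conn):
    """Create one table per dimension plus a central fact table."""
    for dim in DIMENSIONS:
        # Each dimension has a primary key and one descriptive attribute.
        conn.execute(
            f"CREATE TABLE dim_{dim} ("
            f"{dim}_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
    # The fact table carries one foreign key per dimension plus the measures.
    fk_columns = ", ".join(
        f"{dim}_id INTEGER NOT NULL REFERENCES dim_{dim}({dim}_id)"
        for dim in DIMENSIONS
    )
    conn.execute(
        "CREATE TABLE fact ("
        "fact_id INTEGER PRIMARY KEY, "
        f"{fk_columns}, "
        "sales REAL, critic_score REAL, sentiment_score REAL)"
    )


conn = sqlite3.connect(":memory:")
create_star_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The fact table is the only table that references the others, which is what gives the model its star shape.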


Design Process

Fig. 3 Kimball's Lifecycle

This section deals with the design aspect of the project. As I decided to go with Kimball’s approach, we must first discuss the elements of Kimball’s data warehouse lifecycle. Some elements are:

• Project Planning: This refers to scoping the data warehouse project. It covers a single iteration of Kimball’s approach.
• BI requirement definition: This refers to deciding the business rules and selecting the grain of the project. For the BI part there are three tracks: the technology track, the data track and the BI application track.
• Dimensional Modelling: This refers to selecting the facts and dimensions that meet the reporting needs of the project.
• ETL design and Development: This refers to designing and developing the ETL system, which remains one of the most vexing challenges confronting a DW project.
• Deployment: This refers to creating the data source views and dimensions, followed by the actual deployment of the cube in the data warehouse.
• Maintenance: This refers to the activities and tasks used to maintain the DW once it is operational.
• Growth: Once the project is functional, the system is bound to expand and evolve to deliver more value to the business. This refers to the rules and guidelines that need to be followed to maintain the DW project as it grows.

For this project, we begin by setting up the database and its tables using SQL Server Management Studio.

Open SQL Server Management Studio and click Connect in the dialogue box. After that, right-click on the Databases folder and create a new database. I named the database “gaming”. Now right-click on Tables and create a new table. Repeat this step until all of the dimension tables and the fact table are part of the database.

Fig. 4 Description of database and tables

When this step is complete, we proceed to SQL Server Integration Services (SSIS). To begin an SSIS project, switch to Visual Studio and start a new project of type Integration Services Project. I named the project “gaming_DataWarehouse”. Once in the design pane, connect each data source to an OLE DB destination. For my project, the sources were flat files, which I imported into the database.


Fig. 5 Work Flow Diagram

The workflow begins with a C# script that displays the message “About to start SSIS process”. After clicking OK, the tables are first truncated using the truncate table script.

Fig. 6 Data flow diagram (Pre-start)

Then the dimensions are loaded again from the flat file sources into the OLE DB destinations. The green ticks show that the tables were populated successfully. After this, the fact table is loaded.
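The truncate-then-reload order described above can be sketched outside SSIS as well. The following is a minimal Python illustration using sqlite3, not the SSIS package itself: the two-table schema and the sample rows are made up, and DELETE stands in for TRUNCATE TABLE, which SQLite does not provide.

```python
import sqlite3


def reload_warehouse(conn, dimension_rows, fact_rows):
    """Empty the tables, then load the dimension before the fact table."""
    # Clear the fact table first so no fact row points at a deleted dimension row.
    conn.execute("DELETE FROM fact")  # stand-in for TRUNCATE TABLE
    conn.execute("DELETE FROM dim_genre")
    # Dimensions must exist before facts can reference them.
    conn.executemany(
        "INSERT INTO dim_genre (genre_id, name) VALUES (?, ?)", dimension_rows
    )
    conn.executemany("INSERT INTO fact (genre_id, sales) VALUES (?, ?)", fact_rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the fact-to-dimension link
conn.execute("CREATE TABLE dim_genre (genre_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE fact (genre_id INTEGER REFERENCES dim_genre(genre_id), sales REAL)"
)

# Made-up sample rows, for illustration only.
genres = [(1, "Action"), (2, "Sports")]
facts = [(1, 2.5), (2, 1.0), (1, 0.7)]
reload_warehouse(conn, genres, facts)
reload_warehouse(conn, genres, facts)  # a second run does not duplicate rows
fact_count = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
print(fact_count)
```

Emptying the tables at the start is what makes the load repeatable: the package can be run again after a failure without leaving duplicate rows behind.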


Fig. 7 Data flow diagram (After execution)

Here we can see that all tasks show green ticks, which means the dimension tables have been populated. After the dimensions, the fact table is populated as well. To check this, we can run a simple query in SQL Server Management Studio. As seen below, the query returns all the records in the fact table. Note that out of 11445 records in the game title dimension, only 3988 were inserted into the fact table. These are the records needed for analysis, and they were identified using the VLOOKUP function in Excel.

Fig. 8 Verifying fact table


After populating all the tables, we switch to SQL Server Analysis Services (SSAS), where we first define the data source and the data source view, then create all the dimensions and finally create the cube. We can also define hierarchies and relationships between the attributes of dimensions in SSAS.

Fig. 9 Setting up SSAS

After this, we need to deploy the cube in order to perform analysis. The image below shows the structure of the cube.

Fig. 10 Cube Deployed


The image below shows that the cube has been deployed successfully, so we can browse the cube and perform analysis.

Fig. 11 Verification of cube deployment

Extraction, Transform and Load

For any project to start, we need data. Data is not always pre-integrated, and finding data suitable to our own needs is a tedious task in itself. This is where the ETL process comes in. It is often regarded as the heart of the project. It is the process through which data from different sources (structured, semi-structured and unstructured) is retrieved and transformed into relevant formats; for example, unstructured data must first be normalized and converted into a relational format so that it can be stored and made available for analysis. The transformation part includes cleansing, as data quality is one of the key factors in data warehousing. According to forbes.com, data professionals spend most of their time cleansing and organizing data for analysis. This is needed to remove irrelevant data. Once transformed, the data is loaded into the final data mart layer, where analysts can use it for ad hoc analysis and to provide insights for business decisions. No wonder ETL is regarded as the backbone of any data warehouse project.




• Extraction: The single most important step of the ETL process is extracting the data. As mentioned earlier, I used three data sources. The extraction process for each was as follows:
o IGN.COM – This is a gaming website from which I extracted data using Python code. I specifically scraped the game reviews and put them in a flat file after extraction. I used the Scrapy package, which let me deploy spiders over ign.com to extract the desired items for the project. The extracted items include game title, critic score, score phrase, platform and genre. The extracted data set had over 18,000 records.
o Kaggle.com – This was probably the easiest source to find. The website helped me find a data set dealing with sales of video games. I found a data set containing a list of video games with sales greater than 100,000 copies. The data had 16598 records.
o Twitter.com – The project description requires at least one unstructured dataset. I used Twitter to get sentiments for the video games in the list, using R code.

• Transform: This step involved cleaning the data and making it fit for insertion into the database. Because the data was fetched from different sources, it had to be made consistent to be used effectively. I used the VLOOKUP function in Excel to find the games common to both data sets and obtained sentiments for these games only. I also used Google Refine to clean up messy data. Once the data was consistent, I proceeded with loading it into the database.



• Load: With the data ready to be loaded into the data warehouse, I proceeded with the steps described above, from setting up the database through to deployment of the cube.
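The matching done with VLOOKUP in the Transform step amounts to keeping only the games that appear in both sources. A rough Python equivalent is sketched below; the field names, the title normalisation rule and the sample rows are all assumptions made for illustration, since the project did this step in Excel.

```python
def normalise(title):
    """Lower-case and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


def match_titles(review_rows, sales_rows):
    """Keep only games present in both sources, like an exact-match VLOOKUP."""
    sales_by_title = {normalise(row["title"]): row for row in sales_rows}
    matched = []
    for review in review_rows:
        sale = sales_by_title.get(normalise(review["title"]))
        if sale is not None:  # unmatched reviews are dropped
            matched.append(
                {
                    "title": review["title"],
                    "critic_score": review["critic_score"],
                    "sales": sale["sales"],
                }
            )
    return matched


# Made-up sample rows, for illustration only.
reviews = [
    {"title": "Game A", "critic_score": 9.0},
    {"title": "game  b", "critic_score": 7.5},
    {"title": "Game C", "critic_score": 6.0},
]
sales = [
    {"title": "Game A", "sales": 3.2},
    {"title": "Game B", "sales": 1.1},
]
matched = match_titles(reviews, sales)
print(len(matched), "of", len(reviews), "review rows matched")
```

Only matched rows would reach the fact table, which is why the fact table ends up smaller than the game title dimension.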


Data Sources

Selecting the data sources was the first and most time-consuming step of the project. I managed to get three different data sources, two structured and one unstructured.



• IMAGINE GAMES NETWORK (IGN) - I used web scraping to get data from the gaming website IGN, using the Scrapy framework in Python. Scrapy enabled the implementation of spiders for a particular URL, which scraped the required data. The data generated was in JSON format and was later converted to CSV. The dataset had over 18,000 records. URL: https://ie.ign.com/game/reviews

Fig. 12 Code snippet for scraping data
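The JSON-to-CSV conversion mentioned for the IGN data can be sketched with the Python standard library alone. This is not the code shown in Fig. 12: it assumes the scraped output is a JSON array of objects, and the field names are guesses based on the extracted items listed earlier (game title, critic score, score phrase, platform, genre).

```python
import csv
import io
import json

# Assumed field names, derived from the list of extracted items.
FIELDS = ["title", "critic_score", "score_phrase", "platform", "genre"]


def json_to_csv(json_text):
    """Convert a JSON array of review objects to CSV text."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({field: record.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = json.dumps(
    [
        {"title": "Game A", "critic_score": 9.0, "score_phrase": "Amazing",
         "platform": "PC", "genre": "Action"},
        {"title": "Game B", "critic_score": 7.5, "platform": "Xbox One"},
    ]
)
csv_text = json_to_csv(sample)
print(csv_text)
```

Writing every row with the same fixed column list keeps the flat file rectangular, which matters when it is later read by a flat file source in SSIS.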



• Kaggle - I used another structured dataset from kaggle.com. It contains a list of all games selling over 100,000 copies. The data set had over 16,500 records. URL: https://www.kaggle.com/gregorut/videogamesales



• Twitter sentiments: Unstructured dataset, used to generate the sentiment score for all the games. I used R code to get the sentiment scores.
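The report does not describe how the R code computed the sentiment score, so the following is only one plausible approach, written in Python: a simple lexicon count that scores each tweet as the number of positive words minus the number of negative words and averages the result per game. The word lists and tweets are hypothetical.

```python
import re

# Tiny hypothetical word lists; a real lexicon would contain thousands of entries.
POSITIVE = {"good", "great", "love", "fun", "amazing"}
NEGATIVE = {"bad", "boring", "hate", "broken", "awful"}


def sentiment_score(text):
    """Return (positive word count - negative word count) for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def game_sentiment(tweets):
    """Average the per-tweet scores for one game; 0.0 when there are no tweets."""
    if not tweets:
        return 0.0
    return sum(sentiment_score(tweet) for tweet in tweets) / len(tweets)


# Made-up tweets, for illustration only.
tweets = [
    "Love this game, so much fun!",
    "Boring story and broken controls.",
    "Great soundtrack.",
]
print(game_sentiment(tweets))
```

A score near zero under this scheme can mean either neutral tweets or a balance of positive and negative ones, which is worth keeping in mind when comparing sentiment with sales.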


Technologies Used 

• Programming languages
o R – Used to get tweets from Twitter and generate sentiment scores for the games.
o Python – Used to scrape data from a gaming website, with the Scrapy package.
o SQL – Used to write queries for creating dimension and fact tables in SQL Server Management Studio.

• Database Management
o SQL Server Management Studio – Used for creating and storing databases and their tables.
o Visual Studio – Used to populate the database with data and to deploy the cube.

• Add-On Software
o Tableau – Reporting software, used for creating visual representation of the case studies.
o Snagit – Used for recording the video.
o Google Refine – Used to clean messy data.

Reporting

After deployment of the cube in SSAS, the next phase of the project is the analysis of the data in the data warehouse. For this I decided to use Tableau, which helps in creating visual representations of the analysis.



Tableau

Tableau is a business intelligence and analytics tool from Tableau Software. Using Tableau, we can create visual representations of our analysis and publish the findings. Its user interface makes it easy to use. It also provides a free one-year subscription for students.


Fig. 13 Exporting cube to tableau

Here we can see that I have exported my cube to Tableau, which is evident from the cube symbol and the name beside the arrow. This loads the fact table into Tableau, so I can use all the dimensions and measures for my case studies.

Case studies

Case study 1

The first case study compares platforms by their share of video game sales over the years. We can see that console-based platforms accounted for the majority of sales, ahead of handheld platforms and even personal computers. Wolf states that PlayStation targeted the right audience, which resulted in better sales and a better profile for the Sony brand. He also states that consoles bridged the gap between PC and handheld platforms, as they were more powerful than handhelds while being more portable than PCs (Flynn & Palma, 2007). In recent years (2013 onwards) as well, PlayStation 4 and Xbox have more sales than all other platforms combined. This suggests that console-based games generate more sales.
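The comparison in this case study is an aggregation of the sales measure over one dimension, which Tableau performs when a dimension is dropped onto a chart. A minimal Python sketch of the same operation is shown below; the rows and platform labels are made up and are not figures from the warehouse.

```python
from collections import defaultdict


def total_sales_by(rows, key):
    """Sum the sales measure for each distinct value of one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["sales"]
    # Largest total first, as in a sorted bar chart.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up fact rows, for illustration only.
rows = [
    {"platform": "Console X", "genre": "Action", "sales": 4.0},
    {"platform": "Handheld Y", "genre": "Puzzle", "sales": 1.5},
    {"platform": "Console X", "genre": "Sports", "sales": 2.0},
    {"platform": "PC", "genre": "Action", "sales": 1.0},
]
by_platform = total_sales_by(rows, "platform")
by_genre = total_sales_by(rows, "genre")
print(by_platform)
print(by_genre)
```

The same function serves any dimension: passing "genre" instead of "platform" gives the kind of breakdown used in the genre comparison.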


Fig. 14 Case study 1

Case study 2

The second case study compares genres to see which is the most successful in terms of sales. We can see that sales are highest for the action genre, well ahead of the other genres. Boot and Blakely concluded in their paper that action games help improve perception and cognitive skills in players. Among other benefits, action games also help motivate players in the face of failure, increase prosocial behaviors and reduce the cost of switching tasks (Cain & Landau, 2012).


Franceschini and Gori conducted a study in 2013 on children with dyslexia. They had the children play action video games and concluded that children who played action video games read better than children who did not.

Fig. 15 Case study 2

Case study 3

In the final case study, I compared sales with critic score and sentiment score. I concluded that the higher the critic score, the better the sales, and vice versa. There is, however, no particular correlation between sentiment score and sales. People’s perception on Twitter does not appear to be a factor in the sales of video games. Even among games with a negative sentiment score, a few have some of the highest sales.

Fig. 16 Case study 3


Statistics

I ran statistics to validate the findings of my third case study, which showed that a higher critic score resulted in better sales and that sentiment score is irrelevant to sales. I conducted a two-tailed Pearson correlation test with sales, critic score and sentiment score from the fact table as the variables.

Fig. 17 Pearson's Correlation test

The results were consistent with the findings of the case study. Sales is positively correlated with critic score: the Pearson correlation coefficient is 0.259, which indicates a weak positive relationship in which sales tend to increase as critic score increases. The Pearson correlation coefficient for sentiment score is -0.008, which is negative but so close to zero that it indicates essentially no linear relationship with sales.
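For readers who want to see how such a coefficient is obtained, the Pearson correlation coefficient can be computed directly from its definition. The Python sketch below is not the tool used to produce Fig. 17; it computes only the coefficient r, not the two-tailed significance value, and the sample numbers are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up values: sales rise in exact proportion to critic score.
critic_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pearson_r(critic_scores, sales))
print(pearson_r(critic_scores, sales[::-1]))
```

A perfectly proportional sample gives a coefficient of 1 and the reversed sample gives -1; real data such as the fact table falls in between, and values near zero indicate little or no linear relationship.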


Constraints

The following things should be kept in mind to complete the project successfully:

• Start the project on time; do not wait for deadlines.
• Always save your work in SSIS and SSAS.
• Rely on good data sources; this results in less effort spent cleaning and manipulating data.
• Use plugins only when the data is in tabular form.

Conclusions

The data warehouse for video games was successfully set up and integrated with SQL Server. After deployment, the cube was exported to Tableau. Tableau was used to produce visual representations of the analysis for all case studies. To verify the outcome of one case study, I used Pearson’s correlation test.

Future Works

I can scale up the project by adding more dimensions to the data warehouse and adding more games. New technologies such as virtual reality and augmented reality are emerging and are still at an early stage in the gaming industry. I can incorporate these into the data warehouse to get better outcomes and to do predictive analysis on video games. I can also gather more data and analyze gaming patterns by country.


References

Academic texts referred to in the report are listed here.

General Report

Kimball, R. and Ross, M., 2011. The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons.
Shaffer, D.W., Halverson, R., Squire, K.R. and Gee, J.P., 2005. Video Games and the Future of Learning. WCER Working Paper No. 2005-4. Wisconsin Center for Education Research (NJ1).
Granic, I., Lobel, A. and Engels, R.C., 2014. The benefits of playing video games. American Psychologist, 69(1), p.66.
Wixom, B.H. and Watson, H.J., 2001. An empirical investigation of the factors affecting data warehousing success. MIS Quarterly, pp.17-41.

Case study 1

Wolf, M.J., 2008. The video game explosion: a history from PONG to Playstation and beyond. ABC-CLIO.
Flynn, S., Palma, P. and Bender, A., 2007. Feasibility of using the Sony PlayStation 2 gaming platform for an individual poststroke: a case report. Journal of neurologic physical therapy, 31(4), pp.180-189.
Banerjee, S., 2006. Video game sales up 6% in 2005 – MarketWatch. Available at: www.marketwatch.com/story/video-game-industry-grows-6-in-2005



Case study 2

Boot, W.R., Blakely, D.P. and Simons, D.J., 2011. Do action video games improve perception and cognition?. Frontiers in psychology, 2, p.226.
Franceschini, S., Gori, S., Ruffino, M., Viola, S., Molteni, M. and Facoetti, A., 2013. Action video games make dyslexic children read better. Current Biology, 23(6), pp.462-466.
Cain, M.S., Landau, A.N. and Shimamura, A.P., 2012. Action video game experience reduces the cost of switching tasks. Attention, Perception, & Psychophysics, 74(4), pp.641-647.

Peever, N., Johnson, D. and Gardner, J., 2012, July. Personality & video game genre preferences. In Proceedings of the 8th australasian conference on interactive entertainment: Playing the system (p. 20). ACM
