How news influences capital markets

A quantitative analysis of North Korea using Big Data

Student: Alberto Pagnin, 776773

Course: Big Data & Cybersecurity

Teachers: Marco Albertini, Giampiero Giacomello

Department of Political and Social Sciences, University of Bologna


Abstract

Objective: This work examines the relationship between recent events involving North Korea and a selection of South Korean and American equities during May 2017. North Korea has recently sparked much debate because of its ballistic and nuclear tests and its hacking attacks. For this research I used Gdelt's dataset, one of the largest and highest-resolution open databases of human society. I used a second dataset, Yahoo Finance, to obtain the daily share prices of companies I considered more exposed to this geopolitical conflict than others. The Gdelt dataset made it possible to extract news about North Korea, South Korea and the United States. Using indexes already included in this dataset, I calculated, for each case, the correlation between these two phenomena.

Keywords: big data, social sciences, North Korea, South Korea, United States, Gdelt project, Yahoo Finance, stock market prices, financial assets, news media.


Index

1. Introduction
2. Facts between United States, South Korea and North Korea
   2.1 Countries' trade
3. Methods, data & analytical approach
   3.1 Datasets
   3.2 Data analysis
4. Results and conclusion
   4.1 Gdelt Results
   4.2 Gdelt and Yahoo results
   4.3 United States ratings
   4.4 South Korean ratings
5. Sitography
6. Table of indexes


1.  Introduction

We take for granted that stock markets interact with economic, political and industrial events. This happens every day, and some phenomena draw more attention from specific organizations or indexes than others. Scandals, terrorist attacks, thefts and other harmful events have repercussions for some actors, while positive events may bring changes for others. One of the most popular and recent examples is the last French presidential election, won by the liberal candidate Emmanuel Macron. Just after the result of the first round of the election, markets worldwide rose. Conversely, the election of Le Pen would, according to her party's program, have led to "Frexit", a hypothetical French withdrawal from the European Union. Given the growing popularity of the Front National, the election of Ms. Le Pen was seen as an important turning point for France and, according to some commentators, an even more critical one than Brexit. According to The Guardian, although it would have been hard to carry out, the Front National's plan set alarm bells ringing in Brussels (Hanley, 2017). One of the main topics of the anti-EU Front National leader's campaign was restoring the franc, which would have caused chaos on the markets. This was one of the reasons why Macron's victory pushed stock markets up worldwide (Dagospia, 2017). The consequences of this political event also reached a country as far from France as South Korea.
On the day of the election and on the following days, the South Korean stock market also performed well, rising by 2.3% after the election of the new French president. This was also due to investors' sense of hope surrounding the South Korean elections, held 30 years after the end of the dictatorship and after months of institutional paralysis that ended with the arrest of Park Geun-hye, the former South Korean president investigated for corruption and abuse of power. In the next chapters I explore the historical and economic relations between North Korea, South Korea and the United States. To analyse whether, in this case too, there is a relationship between markets and North Korea's international actions, I conducted a limited academic study covering a small sample of South Korean and US stocks and stock indexes. For news and events, I used Gdelt's data, one of the largest open-access spatio-temporal datasets about human society in existence. For the financial side, I downloaded stock data from Yahoo Finance and then compared it with the Gdelt data.


2.  Facts between United States, South Korea and North Korea

Before dealing with the main topic of this work, I would like to shed light on what is happening in the Far East and on why North Korea is attracting so much media attention. The United States and North Korea have been in conflict for more than 70 years. The Korean War, fought between 1950 and 1953, was a huge conflict that caused almost 3 million casualties. Fought on the Korean Peninsula, it saw the Democratic People's Republic of Korea (North Korea, PRK), supported by China and the Soviet Union, on one side, and the Republic of Korea (South Korea, KOR), helped by the USA, on the other. The current political alignment took shape in these circumstances: on one side the United States supporting South Korea, on the other China alongside North Korea, although an increasingly open rift has recently emerged in the latter partnership. According to Wikipedia, the number of ballistic tests ordered by Kim Jong Un, supreme leader of the Democratic People's Republic of Korea (DPRK), increased from 2016 to 2017, as a demonstration of the military capability his country could sustain. The United Nations Security Council has imposed several sanctions to stop the PRK's ballistic tests and ease tensions. According to NBC News, faced with the threat of an attack, US Secretary of Defense James Mattis proposed new sanctions against North Korea without ruling out negotiations toward the denuclearization of the Korean Peninsula, perhaps an unattainable goal given North Korea's determination (Smith, 2017). China and the United Nations take a diplomatic stance and favour the path of dialogue. Additional issues fuel the conflict between the parties, and history helps us understand the present.
Broadcast technology, big data, remote controls and cloud data storage are great opportunities but also potential threats that businesses need to be aware of. North Korea adds fuel to the conflict between countries through cyber attacks. The first well-known case concerns Sony Pictures and its IT infrastructure. According to Wikipedia, in 2014 the group behind the cyber attack on Sony, known as the Lazarus Group, stole parts of a Sony film depicting the death of Kim Jong Un. North Korea has become known for several cyber attacks in recent years, in which hacker groups targeted Polish and American banks, including the theft of $81 million from Bangladesh's central bank. Another significant attack attributed to the Lazarus Group is the WannaCry ransomware¹ that spread in May 2017 and, as reported by The Guardian, hit computers worldwide, including those of Britain's health service and of companies in Spain, Russia and other countries (Hern, Gibbs, 2017). However, as Reuters suggested, some optimism prevails in the United States despite the cyber attacks of May 2017, which are still lifting the shares of cybersecurity firms (Campos, 2017), while attention seems distracted by domestic tensions linked to the dismissal of FBI director James Comey and by North Korea's launch of a new missile.

2.1 Countries’ trade

The repercussions of a possible war will depend on who does what. Setting political issues aside for a moment, an economic approach allows a deeper look at the dynamics that may affect the decision to attack. North Korea (PRK) is one of the most politically and commercially isolated countries: around 80-90% of its imports and exports involve China, its main neighboring country (Source: OEC).

Fig. 1 Destinations of PRK trade in 2015.

¹ Ransomware is a type of malicious software that blocks access to the victim's data or threatens to publish or delete it until a ransom is paid.


This economic bond with China supports and strengthens the relationship between the two countries' leaderships. According to CNN, North Korea's biggest source of foreign currency is believed to be the millions of tons of coal it sells to China every year (Mullen, 2017). Another important source of income comes from hacking organizations such as banks: according to a report by the Russian cybersecurity firm Kaspersky, the country is now being linked to attacks on financial institutions in 18 countries.

Fig. 2 Origins of PRK trade in 2015.

By contrast, economic relations with the other neighboring state, South Korea (KOR), are almost nil, the opposite of the interdependence between the PRK and China. The PRK would have nothing to gain from bombing China, its strategic trading partner, whereas the same does not hold for South Korea. There are no diplomatic relations between the two Koreas: despite their proximity, they are separated by a bloody history that has not yet been fully resolved (it is worth remembering the 1953 Panmunjeom armistice, which the PRK declared void in 2013). A North Korean attack on Seoul would have disastrous repercussions. Attacking Seoul means attacking the capital of one of the most technologically advanced countries in the world, home to the headquarters of Samsung Electronics, Hyundai Motor and LG Electronics, to name just a few. Hitting Seoul would mean hitting 10 million civilians. Seoul is the heart of the world's 5th largest exporter and 9th largest importer (Source: OEC). Such an attack would undoubtedly have serious consequences for both the world economy and the US economy, which trades intensively with KOR.


Fig. 3: What does South Korea export to the U.S.? (2015) Source: OEC, The Observatory of Economic Complexity

The OEC website provides data on the world economy. Fig. 3 shows the share of each product among the goods made in South Korea and exported to the United States. The largest share belongs to the car industry: cars account for 24% and car parts for another 6.8%, together more than 30% of total exports. For this reason, I decided to include in my research two large Seoul-based companies linked to the automotive sector, POSCO Daewoo Corporation and Kia Motors Corporation, and to observe their financial behaviour, as explained in the next paragraphs. As reported by The Diplomat, South Korea has already suffered economic damage over the Korean tensions, due to China's economic response to its move to install the U.S. Terminal High Altitude Area Defense (THAAD) missile defense system. With China representing South Korea's largest trading partner, Beijing has flexed its economic muscles through measures such as restricting Chinese tourism to South Korea, closing Korean-owned Lotte stores and blocking imports of Korean cosmetics and TV shows (Fensom, 2017).


3.  Methods, data & analytical approach

In the preceding paragraphs I summarized the political and economic history of North Korea, South Korea and the United States, the main actors in this conflict. My work now examines how the stock prices of a purposively selected sample of companies, which might be affected by the tensions between these countries, relate to the daily news about North Korea. For academic reasons, the period analysed runs from 1 May to 31 May 2017.

3.1 Datasets

Yahoo Finance

The South Korean trade profile led me to analyse POSCO Daewoo Corporation and Kia Motors Corporation for the transport sector, and a third company in the technology hardware and equipment industry, Samsung Electronics Co., all listed on the Korea Stock Exchange (KSE). I also analysed the Kospi (KS11, a South Korean stock index), LG Chem and LG Electronics. On the US side, I analysed two of the biggest cybersecurity companies, Proofpoint and SAIC, both listed on US stock exchanges. I also included the Nasdaq index and four big US aviation and defense firms: Raytheon, Lockheed Martin, Boeing and Northrop Grumman Corporation. The historical price series of these companies and indexes were downloaded free of charge from Yahoo Finance in .CSV format. Each file contains, for every day, the opening, lowest, highest and closing prices of an asset. The South Korean companies could be harmed by a North Korean attack; other companies, such as the US defense and cybersecurity firms, could instead benefit from it.
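As a minimal sketch of how one of these files can be loaded outside a spreadsheet, the Python function below maps each trading day to its closing price. It assumes the header Yahoo Finance typically uses (Date, Open, High, Low, Close, Adj Close, Volume) and that missing quotes appear as the string "null"; both are assumptions about the file layout, the function name is hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io
from datetime import date


def read_closing_prices(csv_text: str) -> dict[date, float]:
    """Map each trading day to its closing price, skipping missing quotes.

    Assumes a Yahoo Finance-style header with a 'Date' column in
    YYYY-MM-DD format and a 'Close' column; rows whose Close is empty
    or 'null' are treated as missing and left out.
    """
    prices: dict[date, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        close = row["Close"].strip()
        if close in ("", "null"):
            continue  # no quote for this day
        prices[date.fromisoformat(row["Date"])] = float(close)
    return prices


# Invented sample rows, for illustration only.
sample = """Date,Open,High,Low,Close,Adj Close,Volume
2017-05-01,10.0,10.5,9.8,10.2,10.2,1000
2017-05-02,null,null,null,null,null,null
2017-05-04,10.2,10.9,10.1,10.8,10.8,1500
"""
prices = read_closing_prices(sample)
print(len(prices))  # 2
```

Keeping missing days out of the mapping, rather than storing them as zero, avoids distorting any later correlation with artificial price drops.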

Gdelt project

Gdelt is a project that monitors the world's broadcast, print and web news from nearly every corner of every country, in over 100 languages, on a daily basis. The dataset contains 57 fields for each record²: every row represents an event together with the source URL of the news, and the columns (variables) include the date of the event, the actors involved and an array of coded attributes indicating geographic, ethnic and religious affiliation and each actor's role (political elite, military officer, rebel, etc.). The Gdelt dataset proved very useful, providing the empirical data for this research. The Goldstein Scale, one of the many indexes in the Gdelt dataset, played a central role in this work. It assigns each event type a numeric score from -10 to +10 that aims to capture the theoretical potential impact that type of event will have on the stability of a country: -10.0 denotes a severe theoretical negative impact on a country's stability and +10.0 a theoretical positive one. The field stores the Goldstein score associated with each event type.
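To make the structure of a record concrete, here is a minimal Python sketch of the few fields this study relies on. The field names in the comments (SQLDATE, Actor1Code, Actor2Code, QuadClass, GoldsteinScale) follow the Gdelt codebook; the class itself, its attribute names and the example event are hypothetical. The range check simply encodes the -10 to +10 bounds of the Goldstein Scale and the four QuadClass codes used later in the analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GdeltEvent:
    """Hypothetical container for the Gdelt fields used in this study."""

    sql_date: str      # SQLDATE, e.g. "20170501"
    actor1: str        # Actor1Code, who carries out the action, e.g. "PRK"
    actor2: str        # Actor2Code, who is the target, e.g. "KOR"
    quad_class: int    # QuadClass, an integer from 1 to 4
    goldstein: float   # GoldsteinScale, from -10.0 to +10.0

    def __post_init__(self) -> None:
        # Reject values outside the ranges defined by the codebook.
        if not -10.0 <= self.goldstein <= 10.0:
            raise ValueError(f"Goldstein score out of range: {self.goldstein}")
        if self.quad_class not in (1, 2, 3, 4):
            raise ValueError(f"unknown QuadClass: {self.quad_class}")

    @property
    def destabilizing(self) -> bool:
        """True when the event type has a negative theoretical impact on stability."""
        return self.goldstein < 0


event = GdeltEvent("20170501", "PRK", "KOR", 4, -10.0)
print(event.destabilizing)  # True
```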

3.2 Data analysis

Extracting data with Knime

In the first steps of the analysis I used Knime, an open-source data analytics, reporting and integration platform. After downloading the raw daily .CSV files from 1 to 31 May from the Gdelt website (http://data.gdeltproject.org/events/index.html), I merged them into a single table using the Knime operator chain shown in Fig. 4, obtaining all events recorded worldwide between these dates (more than 5 million rows). The Gdelt website provides a codebook, a quick overview of the fields in the Gdelt data file format and their descriptions, which I used to rename the columns (variables) in the final output³. The result is a table with 57 variables containing all the information recorded by Gdelt. To analyse what North Korea (actor code PRK) does to South Korea (actor code KOR), I filtered these countries' events with a Row Filter operator (Fig. 5). Finally, I used a CSV Writer operator to export the result to a CSV file, which I then opened and edited in Microsoft Excel.
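For readers without Knime, the same three steps (concatenate, row filter, CSV writer) can be sketched in plain Python. This is an illustrative equivalent, not the workflow actually used: the reduced four-column header is hypothetical (the real files carry all the codebook fields), the tab delimiter is an assumption about the raw Gdelt export files, and the filter matches actor codes exactly, whereas the text does not say how the Knime Row Filter handled composite codes.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO

# Hypothetical reduced header; real Gdelt rows carry all the codebook fields.
HEADER = ["SQLDATE", "Actor1Code", "Actor2Code", "GoldsteinScale"]


def concatenate(files: Iterable[TextIO], header: list[str],
                delimiter: str = "\t") -> Iterator[dict[str, str]]:
    """Stream the rows of several header-less daily files as one table
    (the job of Knime's Concatenate operators)."""
    for f in files:
        yield from csv.DictReader(f, fieldnames=header, delimiter=delimiter)


def row_filter(rows: Iterable[dict[str, str]], actor1: str,
               actor2s: set[str]) -> list[dict[str, str]]:
    """Keep events carried out by actor1 against any actor in actor2s
    (the job of the Row Filter operator)."""
    return [r for r in rows
            if r["Actor1Code"] == actor1 and r["Actor2Code"] in actor2s]


def write_csv(rows: list[dict[str, str]], header: list[str], out: TextIO) -> None:
    """Write the filtered rows with a header line (the CSV Writer step)."""
    writer = csv.DictWriter(out, fieldnames=header)
    writer.writeheader()
    writer.writerows(rows)


# Two invented daily files, for illustration only.
day1 = io.StringIO("20170501\tPRK\tKOR\t-10.0\n20170501\tUSA\tPRK\t1.0\n")
day2 = io.StringIO("20170502\tPRK\tUSA\t-2.0\n20170502\tPRK\tKOR\t3.0\n")
events = list(concatenate([day1, day2], HEADER))

prk_to_kor = row_filter(events, "PRK", {"KOR"})
prk_to_kor_us = row_filter(events, "PRK", {"KOR", "USA"})

out = io.StringIO()
write_csv(prk_to_kor, HEADER, out)
print(len(prk_to_kor), len(prk_to_kor_us))  # 2 3
```

Passing a set of actor2 codes lets the same filter cover the KOR, US and KOR + US cases without duplicating the chain.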

² For more information, see http://data.gdeltproject.org/documentation/GDELT-Data_Format_Codebook.pdf

³ Important: Knime's default settings limit the amount of memory a single operator's output can occupy. I edited the Knime.ini file following these instructions: https://tech.knime.org/faq#q4_2


Fig. 4: Knime operator chain used to extract the data. The original CSV files are on the left (orange), followed by Concatenate operators that merge them into a single file (yellow) and then three larger blocks, one for each case analysed.

Fig. 5 Example of Knime Row filters.


Aggregating data with Microsoft Excel

Once Knime had reduced the number of rows, I opened the CSV file in MS Excel. This step made it possible to create a summary table of all the extracted data and a pivot table summarizing specific values⁴. I chose to analyse the Goldstein Scale in order to compare its daily trend with the firms' daily stock prices.

Fig. 6: Summary table with the firms' closing prices and the daily sum and average of the Goldstein Scale.

Hidden values: rows 4-30 contain hidden, messy values, namely dates outside the analysis period that were probably added by the Gdelt system.

Fig. 7: Example of a Goldstein pivot table for news from North Korea to South Korea

⁴ Some cell values, such as the dates and the Goldstein scores, were not properly formatted. I converted these two values with appropriate formulas.


The pivot table shows each date with the corresponding sum and average of the Goldstein Scale. The dates circled in red are anomalous readings and fall outside the scope of this research, since the period considered runs from 1 to 31 May 2017. This was not a problem; rather, it shows how, in a big data framework, a larger data volume comes with some inexactitude. In fact, I downloaded only the .CSV files within the target date range, yet the data contain more dates than those I downloaded. The resulting table will be used in the data visualization step with the Tableau software. So far, the process has only covered the case in which actor1 (who carries out the action) is North Korea and actor2 (the target of the action) is South Korea, in order to monitor, through the Goldstein score of each news item, how much companies headquartered in Seoul might fear a North Korean attack. The process just described must be repeated twice:

- once with the United States as actor2;
- once with South Korea + the United States as actor2;

in order to compare news involving the US with the stock prices of some US companies and indexes.
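The pivot step, including the removal of out-of-range dates, can be sketched in Python as follows. It assumes the filtered rows carry the codebook column names SQLDATE and GoldsteinScale and that SQLDATE uses the YYYYMMDD layout, so that plain string comparison orders dates chronologically; the function name and the sample rows are hypothetical. The same function would be run once for each of the three actor2 combinations.

```python
from collections import defaultdict
from statistics import mean


def goldstein_pivot(rows, first_day: str, last_day: str):
    """Daily sum and average of GoldsteinScale, like the Excel pivot table.

    Dates are YYYYMMDD strings, which sort chronologically, so rows dated
    outside [first_day, last_day] (the anomalous readings) can be dropped
    with a plain string comparison.
    """
    scores_by_day = defaultdict(list)
    for row in rows:
        day = row["SQLDATE"]
        if first_day <= day <= last_day:
            scores_by_day[day].append(float(row["GoldsteinScale"]))
    return {day: (sum(scores), mean(scores))
            for day, scores in sorted(scores_by_day.items())}


# Invented rows; the 2016 date plays the role of an anomalous reading.
rows = [
    {"SQLDATE": "20170501", "GoldsteinScale": "-10.0"},
    {"SQLDATE": "20170501", "GoldsteinScale": "2.0"},
    {"SQLDATE": "20170502", "GoldsteinScale": "-4.0"},
    {"SQLDATE": "20161231", "GoldsteinScale": "-5.0"},
]
for day, (total, average) in goldstein_pivot(rows, "20170501", "20170531").items():
    print(day, total, average)
# 20170501 -8.0 -4.0
# 20170502 -4.0 -4.0
```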

The data formatting problem

Data are often not formatted appropriately for analysis. In this case, Gdelt's original date format and Goldstein values were not suitable. As shown in Fig. 8, Excel could not interpret the original date format (YYYYMMDD), so I applied the following formula to obtain a proper date:

Example fx: DATE(LEFT(B3;4);LEFT(RIGHT(B3;4);2);RIGHT(B3;2))

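The same applies to the Goldstein scale, whose decimal point had to be replaced with a comma so that Excel would read it as a number: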

Example fx: VALUE(SUBSTITUTE(H3;".";","))

Fig. 8: Correct analysis format.
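The two Excel conversions have direct Python counterparts, sketched below with hypothetical function names. The date function slices the YYYYMMDD string exactly as the LEFT/RIGHT formula does. For the Goldstein score the direction is reversed: the Excel formula swaps the dot for a comma to suit a comma-decimal locale, whereas Python's float() expects a dot, so the sketch normalises any comma to a dot.

```python
from datetime import date


def sqldate_to_date(sqldate: str) -> date:
    """Counterpart of =DATE(LEFT(B3;4);LEFT(RIGHT(B3;4);2);RIGHT(B3;2))."""
    year = int(sqldate[:4])      # LEFT(B3;4)
    month = int(sqldate[-4:-2])  # LEFT(RIGHT(B3;4);2)
    day = int(sqldate[-2:])      # RIGHT(B3;2)
    return date(year, month, day)


def goldstein_to_float(text: str) -> float:
    """Parse a Goldstein score whether it uses '.' or ',' as decimal mark."""
    return float(text.replace(",", "."))


print(sqldate_to_date("20170501"))  # 2017-05-01
print(goldstein_to_float("-2,5"))   # -2.5
```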


4.  Results and conclusion

4.1 Gdelt Results

In May 2017, Gdelt recorded 2318 events with North Korea (PRK) as actor1 and South Korea (KOR) as actor2, almost 75 events per day. This figure counts each event only once, regardless of how many times it was covered by news outlets; counting articles instead gives 42743. The analysis also yielded 3212 events with the U.S. alone as actor2 (50843 articles) and 93586 articles for the U.S. + South Korea case. Gdelt classifies every event into one of four categories, coded as integers in a variable named QuadClass. The entire CAMEO event taxonomy is ultimately organized under four primary classifications:

1 = Verbal Cooperation
2 = Material Cooperation
3 = Verbal Conflict
4 = Material Conflict

PRK to KOR: QuadClass classification

Date       1     2    3    4    Total
01/05/17   62    8    33   33   136
02/05/17   46    10   28   21   105
03/05/17   39    7    13   18   77
04/05/17   29    4    18   15   66
05/05/17   24    9    59   64   156
06/05/17   18    11   34   64   127
07/05/17   26    7    20   27   80
08/05/17   39    8    27   29   103
09/05/17   81    3    32   22   138
10/05/17   91    5    21   11   128
11/05/17   67    3    23   11   104
12/05/17   54    17   36   30   137
13/05/17   48    8    3    23   82
15/05/17   94    8    8    28   138
16/05/17   59    2    13   20   94
17/05/17   51    3    11   22   87
18/05/17   28    3    8    7    46
19/05/17   24    5    16   6    51
22/05/17   35    2    5    38   80
23/05/17   14    0    14   36   64
24/05/17   18    1    8    9    36
25/05/17   3     1    13   14   31
26/05/17   14    1    1    23   39
29/05/17   41    2    15   55   113
30/05/17   31    5    15   17   68
31/05/17   19    4    3    6    32
Total      1055  137  477  649  2318

Tab 1. PRK to KOR QuadClass index
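Tables of this kind are simple cross-tabulations of events by date and QuadClass. A minimal Python sketch, with hypothetical names and invented sample events, is shown below; a (date, class) pair with no events counts as zero, and the column totals give the bottom row, whose last entry is the grand total.

```python
from collections import Counter

QUAD_CLASSES = {1: "Verbal Cooperation", 2: "Material Cooperation",
                3: "Verbal Conflict", 4: "Material Conflict"}


def quadclass_table(events):
    """Cross-tabulate (date, quad_class) pairs into {date: [n1, n2, n3, n4, total]}."""
    counts = Counter(events)
    table = {}
    for day in sorted({d for d, _ in counts}):
        row = [counts[(day, q)] for q in QUAD_CLASSES]  # missing pairs count as 0
        table[day] = row + [sum(row)]
    return table


def column_totals(table):
    """Sum every column; the last value is the grand total."""
    return [sum(column) for column in zip(*table.values())]


# Invented events: (SQLDATE, QuadClass) pairs.
events = [("20170501", 1), ("20170501", 4), ("20170501", 4), ("20170502", 3)]
table = quadclass_table(events)
print(table)                 # {'20170501': [1, 0, 0, 2, 3], '20170502': [0, 0, 1, 0, 1]}
print(column_totals(table))  # [1, 0, 1, 2, 4]
```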


The largest category is verbal cooperation, with 1055 events, followed by material conflict with 649 events. The latter could be explained by the many ballistic tests carried out by North Korea, which the Gdelt system may classify as material conflict.

PRK to KOR + US: QuadClass classification

QuadClass   1      2     3      4      Grand total
Events      2687   518   3921   5104   12230

Tab 2. PRK to KOR + US QuadClass index

PRK to US: QuadClass classification

QuadClass   1      2     3      4      Grand total
Events      1632   244   2490   2508   6874

Tab 3. PRK to US QuadClass index

In the other two cases, which include the U.S. as actor2, the most significant figure, confirming the difficult relationship between these countries, is that the 4th QuadClass category (material conflict) has the highest value. As introduced in the previous section, the Goldstein Scale is another index useful for this research: a number between -10 and +10 representing the theoretical potential impact an event will have on a country's stability. The following figure shows the Goldstein Scale for three combinations:

- Actor1: PRK, Actor2: KOR;
- Actor1: PRK, Actor2: KOR + US;
- Actor1: PRK, Actor2: US.


Fig. 10: Global daily Goldstein trend.

The reader may notice that the left y-axis uses a different scale from the Goldstein range defined in the previous section. This is because I summed the daily Goldstein scores, so as to reflect the volume of news on each day. The daily average alone would not convey the weight of the news: with the average, a single event scoring -10 on one day would weigh as much as one hundred events scoring -10 on another. The US Goldstein trend is similar to the KOR one. In conclusion, the news is generally associated with negative Goldstein scores, indicating a generally negative theoretical impact of events on the U.S. and South Korea, which confirms the tense diplomatic situation between these states.
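In symbols, writing G_{d,i} for the Goldstein score of the i-th event recorded on day d and n_d for the number of events that day (notation introduced here only for illustration), the two daily aggregates and the example of the single event versus one hundred events read as follows:

```latex
\begin{aligned}
S_d &= \sum_{i=1}^{n_d} G_{d,i}, \qquad \bar{G}_d = \frac{S_d}{n_d} \\
n_d = 1,\ G_{d,1} = -10 &\;\Rightarrow\; S_d = -10,\ \ \bar{G}_d = -10 \\
n_d = 100,\ G_{d,i} = -10\ \forall i &\;\Rightarrow\; S_d = -1000,\ \ \bar{G}_d = -10
\end{aligned}
```

The average is identical in both cases, while the sum differs by a factor of 100, which is why the daily sum carries the information about news volume.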

4.2 Gdelt and Yahoo results

In parallel with the Gdelt data collection, I collected data on some U.S. and South Korean stocks to test whether the Goldstein index is correlated with the value of equities.


Pearson Correlation Coefficient (r)

This study mostly involves quantitative variables. The independent variable is the Goldstein score. This index is a pure number between -10 and +10 but, as explained, I use its daily sum as a kind of weighted index. The Goldstein value depends on the kind of news, while the dependent variable, the closing price of each firm, is expected to vary with the political events reported in the news. To measure the correlation between these two variables, I used Pearson's correlation coefficient, computed with formulas such as the following:

Example fx: PEARSON($O$3:$O$29;R3:R29)
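For reference, Excel's PEARSON (and CORREL, which returns the same coefficient) computes r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). A minimal Python equivalent is sketched below; the function name and the test series are hypothetical, and in this study x would be the daily Goldstein sums and y a firm's closing prices on the same days.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r of two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # Co-deviation and squared deviations around each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

A value of r near +1 or -1 signals a strong linear relationship; values near 0 mean the series show no linear relationship, although a non-linear one may still exist.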

4.3 United States ratings

I selected seven US securities: the Nasdaq, a well-known index of the most highly capitalized American technology companies; two major cybersecurity companies, SAIC and Proofpoint; and finally, given North Korea's military activity, four companies operating in defense, artillery and aviation: Raytheon, Lockheed Martin, Boeing and Northrop Grumman.

Fig. 11. Cybersecurity Firms – SAIC and Proofpoint


The two companies follow a similar trend and are positively correlated (r = 0.755). However, as with the data that follow, there is no linear correlation between the two phenomena: Proofpoint's correlation with the Goldstein scale is +0.12 and SAIC's is -0.15, which means there is almost no correlation between the Gdelt index and these stocks.

Fig. 12. U.S. Aviation / Artillery firms

The same holds for Boeing, Lockheed Martin, Northrop Grumman and Raytheon, even though they are well correlated with each other (0.92 between Raytheon and Lockheed Martin and 0.68 between Boeing and Northrop Grumman). Finally, there is almost no correlation between the Nasdaq and the Goldstein score.

⁵ Example: fx=CORREL(R3:R29;S3:S29)


Fig. 13: Nasdaq and Goldstein score

4.4 South Korean ratings

I compared five South Korean companies and the Kospi index with the Goldstein trend of the articles that included the U.S. as actor2. For reasons of significance, I report only the graphs and results for this combination, excluding the U.S.+PRK actor2 case. The companies are LG Electronics, LG Chem, Samsung, Posco Daewoo and Kia Motors Corporation.

Fig. 14: Kia, LG and Posco Daewoo


Fig. 15: Samsung and LG Chem

Fig 16: Kospi index

As the reader can see, some daily closing prices are missing from the Yahoo Finance dataset. As in the US case, however, the Pearson coefficient never approaches -1 or +1 and stays close to zero, indicating a general absence of linear correlation.
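Because the two series do not cover exactly the same days (closing prices are missing on non-trading days and, as noted above, on some other days), they have to be paired by date before Pearson's r is computed. A minimal Python sketch of this pairing, with a hypothetical function name and invented values, keeps only the dates present in both series rather than filling gaps with zeros.

```python
def align_on_common_dates(goldstein: dict[str, float],
                          closes: dict[str, float]) -> tuple[list[float], list[float]]:
    """Pair daily Goldstein sums with closing prices on shared dates only.

    Days missing from either series (non-trading days, gaps in the price
    file) are dropped rather than treated as zero.
    """
    common = sorted(goldstein.keys() & closes.keys())
    return [goldstein[d] for d in common], [closes[d] for d in common]


# Invented values keyed by YYYYMMDD dates.
goldstein_sums = {"20170501": -8.0, "20170502": -4.0, "20170506": -12.0}
closing_prices = {"20170501": 10.2, "20170502": 10.4, "20170503": 10.1}
x, y = align_on_common_dates(goldstein_sums, closing_prices)
print(x, y)  # [-8.0, -4.0] [10.2, 10.4]
```

The two aligned lists can then be passed to any Pearson implementation, or pasted as two adjacent columns for Excel's PEARSON.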


The results obtained apply only to these data sources. Contrary to my expectations, I did not observe any correlation in the cases studied. In the case of South Korea, it would have been reasonable to observe a similar trend in the two variables considered, the Goldstein score and asset prices: a fall in the Goldstein score should have been followed by a fall in prices. As the graphs show, this did not happen, nor did the expected response occur for the four American defense companies, for which it would have been rational to expect prices to move in the opposite direction to the Goldstein score.

This study has some limitations. The first concerns the small amount of data available for the South Korean stocks and the impossibility of comparing the Nasdaq with the equivalent Korean index, the Kosdaq, which was unfortunately not available in .CSV format from Yahoo Finance for the period of this research. The second lies in the analysis phase: it relies on a single measure, Pearson's coefficient, which only captures linear correlation between two quantitative variables. Finally, to obtain more informative results, the research could be extended to a longer period, including the months before May 2017, and could explore which companies are most exposed to this fragile political situation.


5.  Sitography

Campos, Rodrigo, "Global stocks rise with oil, cyber attack; weak data knocks dollar", 15 May 2017, http://www.reuters.com/article/us-global-markets-idUSKCN18B025 (last consultation 17 June 2017);

Fensom, Anthony, "North Korea crisis hits Asia's biggest economies", 17 April 2017, http://thediplomat.com/2017/04/north-korea-crisis-hits-asias-biggest-economies/ (last consultation 17 June 2017);

Gdelt data events, http://data.gdeltproject.org/events/index.html (last consultation 16 June 2017);

Hanley, Jon, "How France's presidential election could break – or make – the EU", 20 April 2017, https://www.theguardian.com/world/2017/apr/20/how-france-presidential-election-could-breakor-make-the-eu (last consultation 16 June 2017);

Hern, Alex, Gibbs, Samuel, "What is WannaCry ransomware and why is it attacking global computers?", 12 May 2017, https://www.theguardian.com/technology/2017/may/12/nhsransomware-cyber-attack-what-is-wanacrypt0r-20 (last consultation 17 June 2017);

Moussanet, Marco, "I mercati sentono l'effetto Macron" [Markets feel the Macron effect], 5 May 2017, http://www.ilsole24ore.com/art/mondo/2017-05-04/i-mercati-sentono-l-effetto-macron200420_PRV.shtml?uuid=AEIciMGB (last consultation 16 June 2017);

Mullen, Jethro, "How North Korea makes its money: Coal, forced labor and hacking", 5 April 2017, http://money.cnn.com/2017/04/05/news/economy/north-korea-economy-china-trumpxi/index.html (last consultation 17 June 2017);

Smith, Alexander, Joselow, Gabe, "Defense Sec. James Mattis: North Korea 'Has got to be stopped'", 31 March 2017, http://www.nbcnews.com/news/world/defense-sec-james-mattis-north-korea-has-got-bestopped-n740966 (last consultation 17 June 2017);

Wikipedia, "Lazarus Group", https://en.wikipedia.org/wiki/Lazarus_Group (last consultation 16 June 2017).


6.  Table of indexes

Reference        Company name
005930.KS.csv    Samsung
SAIC.csv         SAIC
PFPT.csv         Proofpoint
^NDX.csv         Nasdaq
GD.csv           General Dynamics
LMT.csv          Lockheed Martin
NOC.csv          Northrop Grumman Corporation
RTN              Raytheon Company
^KS11.csv        Kospi
047050.KS.csv    Posco Daewoo Corporation
051910.KS.csv    LG Chem, Ltd.
003550.KS.csv    LG Corp.
BA.csv           The Boeing Company
