
Procedia Computer Science 112 (2017) 927–935
Available online at www.sciencedirect.com


ScienceDirect

www.elsevier.com/locate/procedia

International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2017, 6-8 September 2017, Marseille, France

Extracting Product Features for Opinion Mining Using Public Conversations in Twitter

Rania Othman a,*, Rami Belkaroui a, Rim Faiz b

a LARODEC, ISG Tunis, University of Tunis, Bardo, Tunisia
b LARODEC, IHEC Carthage, University of Carthage, Carthage Presidency, Tunisia

Abstract

The conversational element of Twitter has recently become of particular interest to the marketing community. However, most studies on mining product features through Twitter have so far employed individual tweets rather than considering whole conversations. In this paper, we empirically evaluate whether employing user interactions in public conversations can improve product feature extraction from tweets. We propose a conversation-based method which considers a conversation as a reply tree and employs reply links to effectively extract the product features involved in the messages. We also develop a conversation filtering process which combines scores measured from different aspects, including content relevance and social aspects. We conducted our experiments using a manually annotated Twitter corpus involving smartphones and other electronics products. The experimental results show the effectiveness of our proposed method.
© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of KES International.

Keywords: Twitter; conversations; feature extraction; anaphora; user opinions

1. Introduction

In recent years, Twitter has witnessed tremendous growth. The conversational element is a key component of the service, enabling people to interact, engage in daily chatter, join conversations, report news and share information. Recent reports show that a large percentage of Twitter posts are conversational 1. The subjective character of these public conversations makes them a valuable source for business owners who want to keep track of, and get a general overview of, customer opinions about their brand.

However, as the number of conversations that a product receives grows rapidly, managing these massive amounts of data, which involve a lot of noise and redundancy, remains challenging. Thus, automated systems that can retrieve individual opinion words or phrases together with what they are about, referred to as the product features or opinion topics, could be quite useful.
To the best of our knowledge, most previous works on opinion mining through Twitter have so far tackled individual tweets instead of considering full conversations 2,3,4,5. As tweets are relatively short, involving a lot of

* Corresponding author. Tel.: +216 25615265.
E-mail address: [email protected]

1877-0509 © 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of KES International.
10.1016/j.procs.2017.08.122

abbreviations, typos, slang and other informal components, it is generally hard to understand a given tweet without considering the whole context. Also, a large number of product features are referenced by anaphoric pronouns in the succeeding replies to a conversation message. Anaphora can be defined as the use of an expression whose interpretation depends specifically upon a previous expression 6,7. Consider the example sentence "It is good.": if one wants to extract the product feature that the author has commented on, previous replies must be analysed. Frequently, when tackling conversations, a large number of messages are left unnoticed because of this vague and unclear aspect. In this paper, we introduce a new approach for product feature extraction. We propose a conversation-based method which employs reply links, involved in an anaphora resolution (AR) process, to effectively extract the product features involved in the tweets. We also develop a filtering process which filters the conversation collection using a combination of content relevance and social aspects. To the best of our knowledge, this is the first research employing Twitter conversations in the task of product feature extraction. In the following section, we overview the related work on product feature extraction. We then describe in Section 3 our new method for product feature extraction through Twitter conversations. In Section 4, we introduce the experimental protocol and discuss the obtained results, and we conclude with an outlook on future work in Section 5.

2. Related Work

Product feature extraction, also called opinion target identification, is crucial for opinion mining (OM) and summarization, especially given that this task provides the foundation for opinion summarization 8. The opinion target can be defined as the entity (i.e., person, object, feature, event or topic) about which the user expresses an opinion. Numerous approaches and techniques have been proposed to mine opinion components or targets from unstructured reviews. These works can be broadly divided into two main categories: supervised and unsupervised. Other works have also employed semi-supervised approaches. In supervised learning approaches, a machine-learning model is trained on manually labeled data to extract and classify the feature set in the reviews. Although these techniques provide good results for opinion target extraction, they require extensive manual work for training set preparation, are time-consuming, and are sometimes domain-dependent. The most common techniques employed in supervised approaches are decision trees, support vector machines (SVM), K-nearest neighbors (KNN), Naïve Bayes classifiers and neural networks 9,10,11. On the other hand, unsupervised approaches automatically extract product features using syntactic and contextual patterns without the need for labeled data 12,13,14. A challenge frequently encountered in the opinion target extraction task is that entities can sometimes be implicit and therefore hard to find. For explicit target identification, noun phrases with syntactic rules are generally employed 15,12,16, while for implicit targets, context dependency or distributional similarity are employed. To the best of our knowledge, there are currently only two systems that integrate coreference information and apply AR in OM. Stoyanov and Cardie 17 develop an algorithm that identifies coreferring targets in newspaper articles.
They rely on manually annotated targets; thus, a candidate selection phase for the opinion targets is not required. The authors focus only on coreference resolution and do not resolve pronominal anaphora for this purpose. For their part, 9 adapt the rule-based AR algorithm CogNIAC to extract opinion targets on a movie review corpus. They have shown that extending an OM algorithm with AR for opinion target extraction can achieve significant improvements. In this paper, we apply AR to extract opinion targets (i.e., product features) from Twitter conversations. To the best of our knowledge, this is the first research employing Twitter conversations and AR for the feature extraction task.

3. Proposed Approach

In our work, we aim to provide automatic customer opinion summarization based on Twitter conversations. Our proposed approach is based on three main modules, namely Pre-processing, Conversation Retrieval and Product Feature Extraction. Figure 1 presents an overview of our approach. Given a tweet corpus on electronic products, we clean the collection in the pre-processing phase, then proceed to the conversation retrieval module, which constructs conversations from the collection of separate tweets. In the conversation filtering phase, we employ text content relevance as well as social information metrics to filter out




Fig. 1. Overview of our proposed approach

non-relevant conversations. The final step consists of extracting the product features commented on by users throughout the conversation collection. The three modules are described in detail in the following subsections.

3.1. Pre-processing

The collection of obtained conversations is processed before the summarization process. First, the URL in each message is analyzed. As the text size is limited to 140 characters, it is very common for Twitter users to employ URL-shortening services. Since different shortened URLs might redirect to the same end URL, it is necessary to replace them with the real URLs they redirect to. Thereafter, we employ an API service 1 for HTML text extraction which removes comments, links, ads and other unrelated parts of a web page and returns the key content in plain text. Then, we proceed to the second step of the processing, the text analysis. In this step, both the text content of the tweet and the text retrieved from the URLs are taken into account. We clean the text by removing non-ASCII characters, numbers, punctuation and stop words. Finally, we convert the text to lower case and tokenize it. The remaining tweet features, such as the ID of the author and other social information, are also extracted from the API and stored. We index the collection of tweets with Apache Lucene 2, a full-featured text search engine library written entirely in Java. This open-source information retrieval library works with fields of text within document files and can index practically any type of text-containing document.

3.2. Conversation Retrieval Module

We perform this module in two main steps: conversation construction and conversation filtering. Below, we discuss each step in turn.

3.2.1. Conversation Construction

In this section, we describe the process of constructing conversations from a collection of tweets. We consider the conversation definition presented by 18.
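The text-cleaning steps above can be sketched as follows. This is a minimal illustration: the stop word list is a toy stand-in for the one actually used, and the URL expansion, HTML text extraction and Lucene indexing steps are omitted.

```python
import re

# Toy stop word list; the paper's actual list is not published.
STOP_WORDS = {"a", "an", "the", "is", "my", "of", "to", "and"}

def preprocess(tweet):
    """Strip URLs, punctuation and digits, lower-case, tokenize, drop stop words."""
    text = re.sub(r"https?://\S+", " ", tweet)      # remove (shortened) URLs
    text = re.sub(r"[^A-Za-z\s]", " ", text)        # remove punctuation and digits
    tokens = text.lower().split()                   # lower-case and tokenize
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The battery life of my iPhone 5 is amazing!! http://t.co/xyz"))
# ['battery', 'life', 'iphone', 'amazing']
```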
They define a conversation as a set of short text messages posted by a set of users at specific timestamps on the same topic. These messages can be direct replies to other users, using "@username", or indirect interactions such as likes, retweets and comments. We apply the method developed by 18, who proposed a user-based tree model for retrieving conversations from microblogs. It does not only retrieve direct messages based on reply links but also considers indirect messages that can be related to the conversation via other links, such as retweet and mention interactions. The method is conducted in two steps. The first step is the reply-collection phase, which aims to collect all tweets posted in reply to other tweets. A reply to a user always begins with "@username"; thus, a search call returns all the replies to this user. To check whether the given tweets are related to the conversation, we employ the "in_reply_to_status_id" field accessed through statuses/lookup.json via the Twitter API.

1 http://www.alchemyapi.com/
2 https://lucene.apache.org/
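A minimal sketch of the reply-collection step, assuming each tweet has already been fetched as a dictionary carrying the "id" and "in_reply_to_status_id" fields of statuses/lookup.json:

```python
def build_reply_tree(tweets):
    """Group a conversation into a tree: map each tweet id to the ids of its
    replies; the root is the tweet whose parent is unknown to the collection."""
    ids = {t["id"] for t in tweets}
    children, root = {}, None
    for t in tweets:
        parent = t.get("in_reply_to_status_id")
        if parent in ids:
            children.setdefault(parent, []).append(t["id"])
        else:
            root = t["id"]          # replies to nothing in the collection
    return root, children

tweets = [
    {"id": 1, "in_reply_to_status_id": None},  # conversation root
    {"id": 2, "in_reply_to_status_id": 1},
    {"id": 3, "in_reply_to_status_id": 1},
    {"id": 4, "in_reply_to_status_id": 2},
]
root, children = build_reply_tree(tweets)      # root = 1, children = {1: [2, 3], 2: [4]}
```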


In this phase, we extract the root of the conversation as well as the set of all tweets linked to the root, which allows us to construct a reply tree for each conversation. In the second step, we focus on indirect messages that can be related to the conversation without the use of the "@" symbol. We employ a set of features to select tweets that are likely to be related to a conversation. We begin by grouping tweets that share the same URL as the reply tree of the conversation. Then, the hashtag feature is applied to collect tweets that share a hashtag present in the conversation tree. Thereafter, we remove tweets whose posting dates are relatively distant from that of the original conversation. Finally, we calculate the similarity between candidate tweets and tweets present in the reply tree to find those that are indirectly related to the conversation. For each candidate tweet, we extract the tweets from the reply tree that are similar in content to the candidate, and then rank the results according to their relevance. We calculate a cosine similarity score for each pair of tweets. The cosine similarity measures the angle between two vectors; in our case, these two vectors represent the contents of a pair of messages, converted into vectors with numerical elements through tf-idf (term frequency-inverse document frequency).

3.2.2. Conversation Filtering

Given a collection of n conversations extracted from the previous phase, we aim to output a subset of no more than M conversations that are the most representative candidates, best preserving the important information in the original set. Let M be a system parameter whose value is set as M = α × n, where α is set to 0.05 and n is the total number of conversations. To select an appropriate value for the parameter α, we tested our filtering system with different values of α (0.01, 0.05, 0.08 and 0.1) for n = 1000. We found that above 0.05, the results tend to be biased.
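The selection budget above can be sketched as follows, under the assumption (reconstructed from the text) that M = ⌊α · n⌋ and that each conversation already carries the relevance score developed in the remainder of this section:

```python
import math

def filter_conversations(conversations, scores, alpha=0.05):
    """Keep the M = floor(alpha * n) highest-scoring conversations."""
    m = max(1, math.floor(alpha * len(conversations)))
    ranked = sorted(conversations, key=lambda c: scores[c], reverse=True)
    return ranked[:m]

convs = ["c%d" % i for i in range(100)]
scores = {c: i for i, c in enumerate(convs)}   # toy scores: c99 scores highest
kept = filter_conversations(convs, scores)     # 5 conversations, best first
```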
In our filtering process, we employ several metrics that fall into two main categories: content relevance and tweet influence.

a. Content Relevance: Text content quality is an important aspect for the selection of salient tweets. We use two widely employed content relevance features, which measure the readability and average length of tweets.

• Readability: Readability can be defined as the ease with which text can be read and understood. One widely adopted readability measure is the Flesch Reading Ease formula 19, considered one of the oldest and most accurate readability formulas. The computation involves only counting words, syllables and sentences; from these counts, sentence length and word length are combined to compute the actual scale score. This score can range from zero, for extremely difficult reading, to one hundred, for very easy reading. We employ the Flesch formula adapted by 20, where ASL is the average sentence length (number of words divided by number of sentences), ASW is the average word length in syllables (number of syllables divided by number of words), AOW is the average number of OOV words, i.e., the number of OOV words divided by the total number of words, and AAS is the average number of abnormal symbols, i.e., the number of abnormal symbols divided by the total number of words. We compile a dictionary of 1 million words, and a list of 125 symbols, to identify OOV words and abnormal symbols, respectively. The coefficients (i.e., 10.5 and 9.8) are determined using linear regression.

R(t) = 206.835 − (1.015 × ASL) − (84.6 × ASW) − (10.5 × AOW) − (9.8 × AAS)    (1)

The Readability score of all the tweets belonging to a conversation c is calculated as follows:

R(c) = (1 / max_{c′∈C} |c′.tweets|) · Σ_{t∈c.tweets} R(t)    (2)

The Length score of all the tweets related to a conversation c is calculated using the following formula:

L(c) = (1 / max_{c′∈C} |c′.tweets|) · Σ_{t∈c.tweets} L(t)    (3)

• Average length: This measure computes the average length of a tweet 20. We assume that longer tweets are more likely to deliver relevant and rich information. We define the tweet length score as L(t) = |cr| / 140, where |cr| is the number of characters of the message and 140 is the maximum number of characters a tweet can contain.
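Both content-relevance features can be sketched as below. This is a simplification: a naive vowel-group syllable counter and a toy dictionary stand in for the 1-million-word dictionary and 125-symbol list described above.

```python
import re

# Toy dictionary standing in for the paper's 1M-word dictionary.
DICTIONARY = {"the", "battery", "life", "is", "amazing", "on", "this", "phone"}

def syllables(word):
    # Naive estimate: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Adapted Flesch score of Eq. (1)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    symbols = re.findall(r"[^\w\s.,!?']", text)           # "abnormal" symbols
    asl = len(words) / sentences                          # avg sentence length
    asw = sum(syllables(w) for w in words) / len(words)   # avg syllables per word
    aow = sum(w.lower() not in DICTIONARY for w in words) / len(words)
    aas = len(symbols) / len(words)
    return 206.835 - 1.015 * asl - 84.6 * asw - 10.5 * aow - 9.8 * aas

def length_score(text):
    """L(t) = |cr| / 140, the tweet-length feature."""
    return len(text) / 140.0

t = "The battery life is amazing on this phone."
```

A simple sanity check is that short common-word sentences score higher than long multisyllabic ones, and that L(t) stays in [0, 1] for valid tweets.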




Thereafter, the global content relevance is measured using the following formula:

ContentRelevance(c) = α_Readability · R(c) + α_Length · L(c)

where α_Readability + α_Length = 1 indicates the relative importance of the two factors and each parameter lies in [0, 1]. In the experiments, the two parameters are set to 0.5.

b. Tweet Influence: Twitter uses specific syntax (e.g., retweets, favorites, replies), which can give insight into the importance of a message. If a user marks tweets as favorites, he can easily find useful information; he can also spark the interest of other online users to start a conversation or comment on the tweet. On the other hand, a post which has been retweeted or replied to many times is more likely to be informative. To measure the importance of a tweet, we employ the method developed by 21, which computes a tweet's influence based on the following functions:

• RetweetScore(t) computes the number of times a tweet has been retweeted.

RetweetScore(t) = β_RetweetScore × nbr_retweet(t)    (4)

• FavoriteScore(t) indicates the number of times a tweet has been favorited.

FavoriteScore(t) = β_FavoriteScore × nbr_favorite(t)    (5)

• ReplyScore(t) gives the number of times a tweet has been replied to by other twitterers.

ReplyScore(t) = β_ReplyScore × nbr_reply(t)    (6)

where β_RetweetScore, β_FavoriteScore, β_ReplyScore ∈ [0, 1]. Finally, the tweet influence score is defined as follows:

TweetInfluence(c) = β1 · RetweetScore(c) + β2 · FavoriteScore(c) + β3 · ReplyScore(c)    (7)
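The influence functions above reduce to a small weighted sum. The β weights below are illustrative placeholders, since the paper only constrains them to lie in [0, 1]:

```python
def tweet_influence(tweet, b1=0.5, b2=0.3, b3=0.2):
    """Weighted sum of retweet, favorite and reply counts, as in Eqs. (4)-(7).
    The beta weights are illustrative, not values from the paper."""
    return (b1 * tweet["retweet_count"]
            + b2 * tweet["favorite_count"]
            + b3 * tweet["reply_count"])

t = {"retweet_count": 10, "favorite_count": 20, "reply_count": 5}
score = tweet_influence(t)   # 0.5*10 + 0.3*20 + 0.2*5 = 12.0
```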

Finally, the conversation score is obtained by linearly combining the content relevance and tweet influence scores:

ConversationScore(c) = θ1 · ContentRelevance(c) + θ2 · TweetInfluence(c)    (8)

where β1, β2, β3, θ1, θ2 ∈ [0, 1]. To avoid bias related to features occurring in very small numbers, we clean our collection of conversations by filtering out the conversations that involve fewer than 3 participants or contain fewer than 7 tweets.

3.3. Product Feature Extraction Module

In this stage, we aim to extract the opinion targets, i.e., the features customers expressed their opinions on. For example, for an OM task about the iPod, some of the common features are "battery life", "sound quality" and "ease of use". Given a collection of filtered conversations, our system splits the reviews into sentences, converts them to lower case and removes the non-literal characters at the beginning and end of each word (e.g., "#IPhone#" becomes "iphone"). Steinberger et al. 7 observe that nouns and noun phrases in a sentence are likely to be the features customers express their opinions on. We therefore perform Part-of-Speech (POS) tagging of the whole document to identify the grammatical class of each word, using TWEEBOPARSER 3. In part-of-speech tagging by computer, it is typical to distinguish from 50 to 150 separate parts of speech for English. For ease of use, we group the different tags into four categories: noun ("NN", "NNS"), verb ("VV", "VVD", "VVG", "VVN", "VVP", "VVZ", "VBZ"), adjective

3 https://github.com/ikekonglp/TweeboParser


("JJ", "JJR", "JJS") and adverb ("RB") 4. We extract nouns from the reviews and move on to the feature-decider step. We construct a stop word list and filter out the extracted nouns appearing in it. Then, we construct noun phrases composed of two successive nouns (e.g., click wheel, battery life). We extract all noun phrases but only keep those appearing together at least 3 times in the reviews. We remove sentence redundancy: if a noun appears more than once in the same sentence, we count it only once. We then compute the frequency of occurrence in the reviews for the whole set of extracted nouns and only keep those whose frequency is greater than 0.02. All the previous steps leading to the construction of the feature list are modeled in Figure 2.

Fig. 2. Main steps of the feature list construction
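The feature-list construction of Figure 2 can be sketched as follows, assuming sentences arrive already POS-tagged (the paper uses TweeboParser for tagging). The thresholds 3 and 0.02 are taken from the text; the stop-noun list is illustrative.

```python
from collections import Counter

# Illustrative stop-noun list; the paper builds its own.
STOP_NOUNS = {"thing", "anyone", "today"}

def build_feature_list(tagged_sentences, min_phrase_count=3, min_freq=0.02):
    noun_counts, phrase_counts = Counter(), Counter()
    for sent in tagged_sentences:
        nouns = [w.lower() for w, tag in sent
                 if tag in ("NN", "NNS") and w.lower() not in STOP_NOUNS]
        noun_counts.update(set(nouns))        # count each noun once per sentence
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 in ("NN", "NNS") and t2 in ("NN", "NNS"):
                phrase_counts[(w1.lower(), w2.lower())] += 1
    n = len(tagged_sentences)
    features = {w for w, c in noun_counts.items() if c / n > min_freq}
    features |= {" ".join(p) for p, c in phrase_counts.items()
                 if c >= min_phrase_count}
    return features

sents = [[("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("great", "JJ")]] * 3
features = build_feature_list(sents)   # {'battery', 'life', 'battery life'}
```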

To decide whether the noun phrases we collected are meaningful, we apply a similarity measure called point-wise mutual information (PMI-IR) 22, which uses page counts returned by a web search engine to recognize synonyms. Like 14, we detect the compactness of a noun phrase using the number of tweets concerning a given product instead of the search engine page counts. Given two words w1 and w2, PMI is defined as follows:

PMI(w1, w2) = log( tweets(w1, w2) / (tweets(w1) · tweets(w2)) )    (9)

where tweets(w1, w2) is the number of tweets involving both w1 and w2 as a compact phrase (e.g., battery life), while tweets(w1) and tweets(w2) represent the number of tweets containing only w1 and w2, respectively. We prune noun phrases having PMI < 0. Having our feature list, we proceed to the next step, which aims to assign each review (or post/message) in our corpus to the feature the author comments on in his post. To do so, we apply our proposed Target Identification Algorithm based on conversation interactions, notably the reply links. Indeed, a large number of features appearing as noun phrases and even simple nouns in reviews are generally referenced by anaphoric pronouns in the succeeding sentences of a review document 23,9. The anaphoric term is called an anaphor. For example, in the sentence "I'm so glad with my new iPhone 5, it's just amazing", the pronoun "it" is an anaphor, referring back to the antecedent "my new iPhone 5". In fact, the overwhelming majority of the opinion targets (i.e., product features) are pronouns in the datasets 23. Thus, AR is crucial for binding feature-review pairs; otherwise, a very large number of opinion reviews would be left unnoticed due to their ambiguous aspect. In order to identify the associations of such reviews with the correct features, a conversation-interaction-based algorithm has been developed. To the best of our knowledge, this is the first such algorithm that employs conversation interactions, notably reply links, for effective binding of feature-review pairs. Our algorithm proceeds as follows: for a tweet ti,l that does not contain any feature words from the feature list but involves opinionated words (i.e., adjectives and adverbs) along with some pronouns, all anaphoric pronouns present in this sentence that require mapping are extracted, and a set of anaphora P = {p1, p2, p3, ..., pn} is compiled for proper context determination.
We employ a backtracking mechanism in which review documents are accessed in reverse order, based on reply links, to extract the preceding sentences ti−1,j that ti,l replies to. For each anaphoric pronoun pi ∈ P, the proper context is determined to compile a set A = {ak,1, ak,2, ak,3, ..., aq,m} consisting of candidate antecedents. The best antecedent ak,t ∈ A is selected for binding with the anaphor using the CogNIAC algorithm, a publicly available algorithm for AR that employs a rule-based approach for antecedent identification. This approach can be an adequate strategy for

4 For more details, see https://courses.washington.edu/hypertxt/csar-v02/penntable.html




our OM task, since in our corpus only a small percentage of the total number of pronouns are actual product features (only 6%). We denote by ti,l the lth tweet at iteration i, while the jth tweet at iteration i is denoted ti,j. As each anaphoric pronoun is replaced by the selected antecedent, the backtracking process terminates with iteration i−1 for each sentence. At the end of this phase, we obtain a set of opinionated sentences, each associated with the corresponding target feature that the user commented on.

4. Experiments and results

4.1. Dataset Description

Due to the lack of test collections for Twitter conversations, we created our own collection. We crawled 221 663 English tweets using the Twitter Application Programming Interface (API) 5. The Twitter API allows developers around the world to have free and open access to Twitter's database. The tweet collection was crawled over a period of 4 months, from April 25th 2015 to July 25th 2015.

Table 1. Example of a tweet along with its features

id:                     297132154159243264
user_id:                129451756
created_at:             Fri 01 00:00:00 2015
in_reply_to_status_id:  297132154159239168
Text:                   Anyone else have the same issue?

We only searched popular tweets talking about a given product, involving character descriptions, promotion information and comments about new products. After removing the repeated ones, 211 350 tweets remained. From our collection of tweets, we constructed 8 720 conversations involving 64 370 tweets and 13 827 bloggers. We employ the statuses/lookup.json files accessed through the Twitter API, which contain all information related to the tweets. Table 1 shows a tweet extracted from our corpus along with an excerpt of its features given by the statuses/lookup.json files. Table 2 outlines some statistics on our conversation collection. As shown, there exist over 120K pronouns, and roughly 11.13% of the product features are referred to by pronouns.

Table 2. Corpus size statistics

Tweets                       64 370
Tokens                       2 568 160
Target + Opinion Pairs       7 960
Targets which are Pronouns   886
Pronouns                     > 120 035

4.2. Evaluation Results

Our approach was implemented in Java using the Twitter conversations of five electronics products, namely 2 digital cameras and 3 smartphones. Given that our evaluation requires a huge human effort to pinpoint product features and subjective reviews, we reduced the set of 8K conversations to only 4K. For each product, the first 800 conversations were extracted and preprocessed. Then, we ran our system to perform feature extraction. For evaluation, all the extracted conversations were read manually. For each single tweet, if it includes user opinions, all the features on which the user has given an opinion were tagged. For each product, we manually produced a list of related features. The number of manual features for each product is shown in column "Nbr of

5 https://dev.twitter.com/overview/api


features" in Table 3. The features delivered by our system are compared with the manually tagged results. In this subtask, precision and recall are employed, as they are the most popular measures used in the feature extraction task 24. In our work, TP, TP+FN and TP+FP denote, respectively, the number of relevant identified features, the number of relevant features and the number of identified features. Table 3 presents the evaluation results for the 5 products in the feature extraction phase. The highest values reached by our system are 79.54% precision and 81.92% recall, while the average performance scores are 75.88% precision and 79.14% recall. Our method shows significant precision and recall scores, where the recall value is higher than the precision value, indicating that the majority of correct features were correctly recognized by the system. This demonstrates the efficacy of using conversation interactions in extracting the product features commented on by users. Indeed, a high percentage of tweets which are very short or ambiguous when taken separately can be effectively handled in the feature extraction phase if we take into account the reply links between messages. The precision value is lower than the recall value, which indicates that certain identified features are not correct. This can be explained by the fact that most reviewers do not follow grammatical rules strictly while tweeting; hence, the parser is able to assign neither correct POS tags nor correct dependency relations between words. Moreover, we noticed that when a pronoun to be resolved has more than two or three candidate antecedents, the occurrence of biased anaphora-antecedent pairs increases. This leaves scope for improving our method to better detect features and reach higher precision.
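The precision and recall used above, written out explicitly (the feature sets below are toy examples, not values from the corpus):

```python
def precision_recall(identified, relevant):
    """Precision = TP / (TP+FP), recall = TP / (TP+FN), over sets of features."""
    tp = len(identified & relevant)
    precision = tp / len(identified) if identified else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

identified = {"battery life", "screen", "price", "cover"}   # system output (toy)
relevant = {"battery life", "screen", "price", "camera"}    # gold features (toy)
p, r = precision_recall(identified, relevant)               # (0.75, 0.75)
```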
In order to make sure that our feature extraction phase functions effectively, we compared the generated features using our method with the features obtained using the same method applied to the same collection of tweets taken separately rather than extracting conversations and without the employment of AR between messages. The mentioned method is pretty similar to the feature detection process applied by 14 which performs feature-based opinion summarization for customer reviews based on product reviews collected via Twitter as well as electronic commerce websites. The average recall of opinion sentence extraction is about 63% while the average precision is around 57%. We see that both the average recall and precision of the second method are remarkably lower than those of our method. On analysis, we deduced our testing dataset involves 13 824 anaphoric pronouns, in which 886 pronouns correctly refer to the product features. By applying the second method, we notice that only 40% of pronouns are detected while the rest is left unnoticed or erroneously extracted. According to the results given in in Table 3, we can admit that the proposed method is significantly more effective for the given task. Table 3. Evaluation results for the feature extraction Product

Product            Nbr of features   Feature selection on          Feature selection on
                                     individual tweets             conversations (this research)
                                     Precision(%)   Recall(%)      Precision(%)   Recall(%)
Digital camera 1   27                68.33          62.13          76.75          78.64
Digital camera 2   31                63.52          64.40          78.13          76.21
Smartphone 1       52                55.97          51.83          74.64          78.23
Smartphone 2       25                71.54          57.74          70.34          81.92
Smartphone 3       43                57.88          53.73          79.54          80.71
Average                              63.44          57.96          75.88          79.14
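The conversation-based AR step that underlies these gains can be illustrated with a minimal sketch. This is not the authors' implementation: the feature lexicon, the function names, and the most-recent-antecedent heuristic (resolve a pronoun in a reply against the last feature noun mentioned in the parent tweet) are all simplifying assumptions for illustration.

```python
import re

# Assumed seed list of feature nouns; a real system would mine these.
FEATURE_LEXICON = {"battery", "screen", "camera", "zoom"}

def candidate_features(text):
    """Return feature nouns mentioned in a tweet, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in FEATURE_LEXICON]

def resolve_pronouns(reply, parent):
    """Replace anaphoric 'it'/'its' in a reply tweet with the last feature
    noun mentioned in its parent tweet (a crude antecedent heuristic)."""
    antecedents = candidate_features(parent)
    if not antecedents:
        return reply
    noun = antecedents[-1]
    return re.sub(
        r"\b(it|its)\b",
        lambda m: noun if m.group().lower() == "it" else noun + "'s",
        reply,
        flags=re.IGNORECASE,
    )

parent = "The battery on this phone is impressive."
reply = "Yes, but it drains fast when gaming."
print(resolve_pronouns(reply, parent))
# -> Yes, but battery drains fast when gaming.
```

Taken in isolation, the reply mentions no feature at all; with the reply link it contributes an opinion on "battery", which is exactly the effect measured in Table 3.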

5. Conclusion

This paper introduced a new method for product feature extraction that deals with Twitter conversations instead of simple individual tweets. To effectively extract the target product features, we used conversation interactions, mainly the reply links involved in an AR process. The experimental results demonstrate that the proposed method is promising and that incorporating conversation structure can significantly improve the product feature extraction task. In future work, we would like to investigate the application of our approach to user opinion resources other than Twitter. We also plan to improve the AR process to deal with implicit knowledge.




References

1. A. Ritter, C. Cherry, and B. Dolan. Unsupervised modeling of Twitter conversations. In HLT-NAACL, pages 172-180. Association for Computational Linguistics, 2010.
2. A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media, pages 30-38. Association for Computational Linguistics, 2011.
3. X. Meng, F. Wei, X. Liu, M. Zhou, S. Li, and H. Wang. Entity-centric topic-oriented opinion summarization in Twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 379-387. ACM, 2012.
4. N. N. Bora. Summarizing public opinions in tweets. International Journal of Computational Linguistics and Applications, 3(1):41-55, 2012.
5. S. A. Bahrainian and A. Dengel. Sentiment analysis and summarization of Twitter data. In 2013 IEEE 16th International Conference on Computational Science and Engineering (CSE), pages 227-234. IEEE, 2013.
6. R. Mitkov. Anaphora Resolution. Routledge, 2014.
7. J. Steinberger, M. Poesio, M. A. Kabadjov, and K. Ježek. Two uses of anaphora resolution in summarization. Information Processing and Management, 43(6):1663-1680, 2007.
8. R. Feldman, M. Fresko, J. Goldenberg, O. Netzer, and L. Ungar. Extracting product comparisons from discussion boards. In Seventh IEEE International Conference on Data Mining (ICDM 2007), pages 469-474. IEEE, 2007.
9. N. Jakob and I. Gurevych. Using anaphora resolution to improve opinion target identification in movie reviews. In Proceedings of the ACL 2010 Conference Short Papers, pages 263-268. Association for Computational Linguistics, 2010.
10. J. S. Kessler and N. Nicolov. Targeting sentiment expressions through supervised ranking of linguistic configurations. In ICWSM. AAAI Press, 2009.
11. Z. Toh and J. Su. NLANGP: Supervised machine learning system for aspect category classification and opinion target extraction. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 719-724, 2015.
12. M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168-177. ACM, 2004.
13. B. Liu. Opinion mining and sentiment analysis. In Web Data Mining, pages 459-526. Springer, 2011.
14. J. Jmal and R. Faiz. Customer review summarization approach using Twitter and SentiWordNet. In Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, page 33. ACM, 2013.
15. L. Ferreira, N. Jakob, and I. Gurevych. A comparative study of feature extraction algorithms in customer reviews. In 2008 IEEE International Conference on Semantic Computing, pages 144-151. IEEE, 2008.
16. A.-M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In Natural Language Processing and Text Mining, pages 9-28. Springer, 2007.
17. V. Stoyanov and C. Cardie. Topic identification for fine-grained opinion analysis. In Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1, pages 817-824. Association for Computational Linguistics, 2008.
18. R. Belkaroui and R. Faiz. Towards events tweet contextualization using social influence model and users conversations. In Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, page 3. ACM, 2015.
19. R. Flesch. A new readability yardstick. Journal of Applied Psychology, 32(3):221, 1948.
20. X. Liu, Y. Li, F. Wei, and M. Zhou. Graph-based multi-tweet summarization using social signals. In COLING, pages 1699-1714, 2012.
21. R. Belkaroui and R. Faiz. Towards events tweet contextualization using social influence model and users conversations. In Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, page 3. ACM, 2015.
22. P. Turney. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Machine Learning: ECML 2001, pages 491-502, 2001.
23. A. Kamal and M. Abulaish. Statistical features identification for sentiment analysis using machine learning techniques. In 2013 International Symposium on Computational and Business Intelligence (ISCBI), pages 178-181. IEEE, 2013.
24. H. D. Kim, K. Ganesan, P. Sondhi, and C. Zhai. Comprehensive review of opinion summarization. Technical report, University of Illinois at Urbana-Champaign, 2011.