Product Question Answering Using Customer Generated Content - Research Challenges

David Carmel
Alexa Shopping, Amazon Research, Matam Park, Haifa, Israel
[email protected]

Liane Lewin-Eytan
Alexa Shopping, Amazon Research, Matam Park, Haifa, Israel
[email protected]

CCS CONCEPTS
• Information systems → Question answering;

ACM Reference Format:
David Carmel, Liane Lewin-Eytan, and Yoelle Maarek. 2018. Product Question Answering Using Customer Generated Content - Research Challenges. In SIGIR ’18: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, July 8-12, 2018, Ann Arbor, MI, USA. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3209978.3210203

1 COMPANY PORTRAIT

Alexa is an intelligent personal assistant developed by Amazon that can provide many services through voice interaction, such as music playback, news, question answering, and on-line shopping. The Alexa Shopping research team at Amazon is an emerging group of scientists investigating revolutionary shopping experiences through Alexa, while devising new search paradigms beyond traditional catalog search.

2 SPEAKER’S BIO

David Carmel¹ is a Principal Applied Scientist at Amazon in the Alexa Shopping Research team, and an ACM Distinguished Engineer. David’s research focuses on general search, question answering, query performance prediction, and text mining. He has published more than 100 papers in IR and Web journals and conferences, serves on the editorial board of the Information Retrieval Journal, and regularly serves as a senior PC member or Area Chair at SIGIR and related conferences, e.g., WWW, WSDM, and CIKM.

3 PRODUCT QUESTION ANSWERING

The main vision behind Alexa is to provide assistance to users in many aspects of life, including on-line shopping. As a personal assistant, Alexa should serve as a personalized shopkeeper and support Amazon customers who show interest in a specific product, or a family of products, and would like to clarify some issues, get more details, or get advice about the product’s usage. Customers can ask a factoid question, typically about a product-specific attribute (e.g., “Does Kindle support Korean?”), or a subjective one (e.g., “Are the Harry Potter books appropriate for 7-year-old kids?”).

In this talk, we will address several research challenges that arise from the novel question answering (QA) paradigm studied by our team, which is focused on answering customer questions about product items, using customer-generated content found in Amazon’s on-line catalogs. We focus on subjective questions, which can relate to various intent types such as product usage (“Can I take three aspirin pills per day?”), recommendation (“Any gift ideas for my old father?”), opinion (“Which is better, Galaxy S9 or Note 8?”), superlative (“What is the best vacuum cleaner on the market?”), and many more.

To address these research challenges, we envision a Product Question Answering (PQA) engine that takes as input a product-related question asked by an Alexa user and looks for an answer derived from product-related data. Factoid questions can be answered by searching over the catalog, using standard knowledge-graph-based techniques. However, for most subjective questions, catalog data is generally useless. For such questions, we search for an answer in community-generated content, such as product reviews and product-related Q&As asked and answered by the community of Amazon customers. The crowd’s opinions about the product, reflected through related reviews and Q&As, cover many important details, tacit knowledge, and cumulative experience about the product that are not covered by the official information provided by the vendor. On the other hand, this content, which comes directly from Amazon customers, is typical user-generated content (UGC), hence it suffers from redundancy, inconsistency, spam, malicious content, and other typical flaws of UGC. Our PQA engine applies automatic QA methods, enhanced with community QA approaches, to retrieve the most relevant answer found in reviews and Q&As. The final selected answer is served by Alexa as a response to the asker.

¹David Carmel is the designated speaker for this talk. Bios of the other co-authors can be provided upon request.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SIGIR’18, July 8–12, 2018, Ann Arbor, MI, USA
© 2018 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5657-2/18/07.
https://doi.org/10.1145/3209978.3210203

Yoelle Maarek
Alexa Shopping, Amazon Research, Matam Park, Haifa, Israel
[email protected]
In the following, we describe the main architecture of our system and several related research questions that we believe might be of interest to the IR community.
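As a rough illustration of the routing described above, a PQA front end might dispatch factoid questions to a catalog/knowledge-graph lookup and subjective ones to retrieval over customer-generated content. The cue lists and function names below are invented for illustration; the real system would rely on trained classifiers, not keyword matching.

```python
# Hypothetical sketch of routing product questions: factoid questions go to
# catalog/knowledge-graph lookup, subjective ones to retrieval over customer
# reviews and community Q&As. All names and cues are illustrative.

FACTOID_CUES = ("does", "how many", "how much", "what is the weight")
SUBJECTIVE_CUES = ("best", "better", "recommend", "appropriate", "can i", "gift")

def route_question(question: str) -> str:
    """Very rough routing; a production system would use a trained classifier."""
    q = question.lower()
    if any(cue in q for cue in SUBJECTIVE_CUES):
        return "ugc_retrieval"      # search reviews and community Q&As
    if any(cue in q for cue in FACTOID_CUES):
        return "catalog_lookup"     # structured knowledge-graph answer
    return "ugc_retrieval"          # default to the broader source
```

For example, “Does Kindle support Korean?” would be routed to the catalog lookup, while “Any gift ideas for my old father?” would be routed to retrieval over customer-generated content.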

4 BUILDING A PQA SYSTEM

The suggested PQA system follows a traditional architecture for question answering over unstructured data, including (1) content gathering, filtering, and indexing components, (2) a question analysis component, (3) a passage retrieval component, and (4) a final answer selection component. In our specific context, we intend to apply a combination of state-of-the-art QA and CQA techniques and adapt them to the product domain as well as to the voice medium. A high-level diagram of such a system is presented in Figure 1.

Figure 1: PQA Architecture

In the off-line stage, we gather the content to be indexed and used for answering product questions. Data is gathered periodically from all available sources, including catalog data, customer reviews, product-related Q&As, and external sources. As CQA data suffers from high variation in quality, and there are many products with thousands of reviews and Q&As, we need to apply content quality analysis as well as diversification techniques, in order to keep a reasonable number of high-quality, diversified reviews and Q&As per product.

At search time, after the user utterance has been transformed by Alexa’s speech recognition component into a textual question, the first task of the system is to identify whether the question has a “product intent”, since only such questions will be answered by the PQA system. Second, the question should be classified according to its type, to the product the question is focused on, and to the type of answer the user is expecting. The analyzed question is translated into a query that is submitted to the search component. The search component probes all available resources and returns a small set of relevant passages from the indexed data. Finally, an answer is selected from the retrieved passages and rephrased to be served by Alexa as the final answer.

5 RESEARCH CHALLENGES

In this talk, we would like to address several research questions related to a PQA system, which we believe may attract the interest of the IR community and will hopefully trigger more research efforts in this emerging and exciting field of product question answering.

Voice interface. The dominant interaction channel with Alexa is voice: the question is derived from the user’s utterance, and the returned answer is read out by Alexa. This raises many challenges in question understanding due to ASR errors and misinterpretations (e.g., “triple a” rather than “AAA”). Generating a friendly voice-based answer from the textual results is another voice-specific challenge. Moreover, voice interaction also opens many opportunities for the search system, since it allows the user to be far from the device and eliminates the need to look at a keyboard or screen. This may encourage users to provide more details about their needs, as well as more feedback about results. The voice interface opens new opportunities for the PQA system to better exploit such signals.

Product Question Understanding. While question understanding for QA is a well-known task, it has not been deeply studied in the context of PQA. First, we would like to identify the unique properties of questions with product intent, and how they differ from general questions. Second, the questions should be categorized according to their type (product usage, opinion, recommendation, comparison, etc.). Third, we would like to identify the lexical answer type, i.e., the type of information the user expects to get back, which is typically used by QA systems for candidate filtering. It might be a Yes/No answer, an attribute-value answer, or some features of the desired product such as the product type or the product brand. Finally, we would like to identify the focus of the question, which is the main entity the user is focused on.
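The outputs of product question understanding (product intent, question type, lexical answer type, and question focus) can be sketched as a simple structure. The heuristics below are illustrative placeholders for trained classifiers, not Amazon's implementation; all names are assumptions.

```python
# Illustrative sketch of the question-analysis output: product intent,
# question type, expected (lexical) answer type, and question focus.
# Toy heuristics stand in for trained classifiers.

from dataclasses import dataclass

@dataclass
class AnalyzedQuestion:
    text: str
    has_product_intent: bool   # only product questions reach the PQA system
    question_type: str         # usage, opinion, recommendation, comparison, ...
    lexical_answer_type: str   # yes/no, attribute value, product type, brand, ...
    focus: str                 # the main entity the question is about

def analyze(text: str) -> AnalyzedQuestion:
    q = text.lower()
    if " or " in q:
        qtype = "comparison"
    elif "recommend" in q or "gift" in q:
        qtype = "recommendation"
    else:
        qtype = "usage"
    lat = "yes/no" if q.split()[0] in {"can", "does", "is", "are"} else "attribute value"
    focus = q.rstrip("?").split()[-1]      # crude stand-in for focus detection
    return AnalyzedQuestion(text, True, qtype, lat, focus)
```

For instance, “Can I take three aspirin pills per day?” would be analyzed as a usage question expecting a Yes/No answer, while “Which is better, Galaxy S9 or Note 8?” would be typed as a comparison.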

Distributed Passage Retrieval. The answer to a product question might be found in the catalog data, in the customer reviews, in product-related Q&As, or in external sources. However, these sources are very different in nature, hence the retrieval component should adapt its search strategy to each source type while searching over them. Moreover, these sources complement each other, hence search-result aggregation is beneficial: a review-based answer can support a Q&A-based answer, and vice versa. The passage retrieval component should apply aggregated search techniques in order to find the right answer. Question answering over heterogeneous sources is a big challenge and raises many questions, such as how to integrate the search results while collecting different types of evidence for finding the final answer.

Multi-aspect Answering. In contrast to factoid questions, where in general only one entity (or a few) is expected to be the answer, many answers are valid for subjective questions. In such a case, users will benefit from a multi-aspect answer, one that covers the distribution of answers from the crowd over the answer aspect space. For example, for a yes/no question, 70% of the customers may think the answer is Yes, while 20% answer No (for the rest, the answer is unclear). Multi-aspect question answering in the context of PQA is, to the best of our knowledge, a new paradigm that has not been studied yet.

Interactive Personalized PQA. A natural dialog with Alexa is interactive in nature, as the user who asked a question may respond to Alexa’s answer either by rephrasing the previous question, by asking a follow-up question, or by taking a corresponding action such as purchasing an item or stopping the dialog. Therefore, the PQA system should answer the user’s question in the context of the current dialog, as well as in the context of the user’s interaction history, taking into consideration previous questions and answers, and the user’s actions and feedback. Moreover, recent advances in ASR techniques can enhance search personalization by identifying personal features such as gender and age directly from the user’s utterance.

Evaluation. Most evaluation paradigms for QA deal with factoid questions, for which the existence of one or a few correct answers is assumed. This does not hold for subjective questions, where there is no ground truth. How can we automatically judge the quality of a QA system dealing with subjective questions? There have been several attempts to tackle this issue in the past, including the TREC LiveQA track. However, the problem is far from solved, and developing evaluation schemes for subjective questions remains an open and interesting research challenge.
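The multi-aspect yes/no example above (70% Yes, 20% No, the rest unclear) amounts to aggregating crowd answers into a distribution rather than selecting a single answer. A minimal sketch, with invented labels and function names:

```python
# Minimal sketch of multi-aspect answering for a yes/no question: aggregate
# crowd answers into a percentage distribution instead of picking one answer.
# Labels ('yes'/'no'/'unclear') and names are illustrative assumptions.

from collections import Counter

def answer_distribution(crowd_answers):
    """Map a list of 'yes'/'no'/'unclear' labels to percentage shares."""
    counts = Counter(crowd_answers)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}

def phrase(dist):
    """Render the distribution as a voice-friendly answer."""
    yes, no = dist.get("yes", 0), dist.get("no", 0)
    return f"{yes}% of customers say yes, {no}% say no."
```

With seven Yes answers, two No answers, and one unclear answer, this yields the 70/20/10 split from the example, which Alexa could then phrase for the asker.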