
Augmenting Oracle Text with the UMLS for Enhanced Searching of Free-Text Medical Reports

Jing Ding PhD; Selnur Erdal MS; Rakesh Dhaval MS; Jyoti Kamal PhD
The Ohio State University Medical Center, Information Warehouse, Columbus, OH

Abstract

The intrinsic complexity of free-text medical reports imposes great challenges for information retrieval systems. We have developed a prototype search engine for retrieving clinical reports that leverages the powerful indexing and querying capabilities of Oracle Text, and the rich biomedical domain knowledge and semantic structures that are captured in the UMLS Metathesaurus.

Introduction

As more free-text medical reports become available electronically, faster and more efficient access to them is needed. Such access is usually provided by information retrieval (IR) systems, i.e., search engines. Medical IR systems face great challenges because of the complexity of clinical reports, notably compound medical concepts (e.g., lung cancer) and the wide use of abbreviations, synonyms, and hyponyms. Compound concepts are frequently expressed as components separated across phrases or even sentences, so retrieving them requires strong indexing and querying capabilities. In-depth domain knowledge is required to expand user queries so that they capture abbreviations, synonyms, and hyponyms. Beyond these IR challenges, it is also desirable to integrate text queries with relational information, such as patient demographics. To address these issues, we have built a prototype medical report IR system that uses the UMLS Metathesaurus to augment queries in Oracle Text.

Background

The UMLS Metathesaurus is one of the most comprehensive biomedical knowledge collections. Its more than 1 million concepts and their relationships provide the domain knowledge needed to expand user queries. However, mapping a user query to UMLS concept(s) [1] is not necessarily sufficient, especially for compound concepts expressed as separate components.
Further expansion and refinement are necessary. In this project, we have used Oracle Text to address this issue, leveraging its powerful indexing and querying capabilities for textual data. In addition to the standard Boolean operators, it supports other operators such as NEAR, WITHIN, and FUZZY. Since Oracle Text is a native component of the Oracle database, combining textual and relational queries is as simple as writing a single SQL statement, eliminating the need to integrate a relational database with third-party IR software [2].

System description

Our prototype allows users to search over 2 million cardiology, dictated, pathology and radiology reports that are stored in the Information Warehouse at The Ohio State University Medical Center (OSUMC). After a user enters a free-text query, a query processor that relies on the UMLS Metathesaurus automatically expands it before it is sent to Oracle Text.

First, the processor tries to match the query phrase (case-insensitive string match) to a UMLS concept. If the phrase matches a concept C0, the second step described below is executed. Otherwise, the query is sent to Oracle Text directly for literal keyword search.

In the second step, the processor tries to split the phrase into two sub-phrases, each of which matches a concept of semantic type Anatomical Structure, Biologic Function, or one of their descendant types. For example, lung cancer can be split into lung (Body Part, Organ, or Organ Component) and cancer (Neoplastic Process). Compound concepts of this type are frequently expressed as separate components in clinical reports. If such a split is not feasible, the processor concatenates all synonyms and hyponyms of C0 with the OR operator. If the phrase can be split into sub-phrases C1 and C2, the processor formulates a query in the following format: (synonyms and hyponyms of C0 concatenated with OR) OR [(synonyms of C1 concatenated with OR) NEAR (synonyms of C2 concatenated with OR)]. The expanded query is then sent to Oracle Text.

Conclusion

We have developed a prototype search engine for retrieving free-text medical reports that integrates Oracle Text with the UMLS Metathesaurus.

References

1. Aronson AR, Rindflesch TC. Query expansion using the UMLS Metathesaurus. Proc AMIA Annu Fall Symp. 1997:485-9.
2. Fisk JM, Mutalik P, Levin FW, Erdos J, Taylor C, Nadkarni P. Integrating query of relational and textual data in clinical databases: a case study. J Am Med Inform Assoc. 10(1):21-38.
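
Illustrative sketch

The two-step query expansion in the System description can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the prototype's actual implementation: the toy lexicon stands in for UMLS Metathesaurus lookups, its entries are invented for the example, and the names Concept, TOY_LEXICON, lookup, split_phrase and expand_query are hypothetical. The output follows the query format given in the text; escaping of reserved words and the exact NEAR syntax that Oracle Text accepts are not handled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    synonyms: List[str]
    hyponyms: List[str] = field(default_factory=list)
    # True if the semantic type is Anatomical Structure, Biologic Function,
    # or a descendant type (assumption: precomputed for the toy data).
    splittable: bool = False


# Hypothetical stand-in for the UMLS Metathesaurus; entries are invented.
TOY_LEXICON: Dict[str, Concept] = {
    "lung cancer": Concept(["lung cancer", "pulmonary carcinoma"],
                           ["small cell lung cancer"]),
    "lung": Concept(["lung", "pulmonary"], splittable=True),
    "cancer": Concept(["cancer", "carcinoma", "neoplasm"], splittable=True),
    "aspirin": Concept(["aspirin", "acetylsalicylic acid"]),
}


def lookup(phrase: str) -> Optional[Concept]:
    """Case-insensitive string match of a phrase to a concept."""
    return TOY_LEXICON.get(" ".join(phrase.lower().split()))


def or_group(terms: List[str]) -> str:
    """Concatenate terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def split_phrase(phrase: str) -> Optional[Tuple[Concept, Concept]]:
    """Try every two-way split; both halves must match splittable concepts."""
    words = phrase.lower().split()
    for i in range(1, len(words)):
        left = lookup(" ".join(words[:i]))
        right = lookup(" ".join(words[i:]))
        if left and right and left.splittable and right.splittable:
            return left, right
    return None


def expand_query(phrase: str) -> str:
    c0 = lookup(phrase)
    if c0 is None:
        return phrase  # step 1 failed: literal keyword search
    whole = or_group(c0.synonyms + c0.hyponyms)
    parts = split_phrase(phrase)
    if parts is None:
        return whole  # no feasible split: synonyms and hyponyms of C0 only
    c1, c2 = parts
    return f"{whole} OR ({or_group(c1.synonyms)} NEAR {or_group(c2.synonyms)})"


# Hypothetical schema: table and column names are assumptions. CONTAINS(...) > 0
# is the Oracle Text predicate for matching a text-indexed column, which lets the
# text search and the relational filter share a single SQL statement.
EXAMPLE_SQL = """
SELECT r.report_id
FROM reports r
JOIN patients p ON p.patient_id = r.patient_id
WHERE CONTAINS(r.report_text, :expanded_query) > 0
  AND p.sex = :sex
"""

if __name__ == "__main__":
    print(expand_query("Lung Cancer"))
```

In this sketch, a phrase with no matching concept is passed through unchanged, a matched phrase that cannot be split yields only the OR group for C0, and a compound phrase such as lung cancer yields the OR group for C0 combined with the NEAR clause over the synonyms of its two components. The resulting string would be bound to the :expanded_query parameter of a statement like EXAMPLE_SQL.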

AMIA 2007 Symposium Proceedings Page - 940