
Semi-automatic Transformation of Structured Guideline Components into Formal Process Representations

Katharina Kaiser
Institute of Software Technology & Interactive Systems
Vienna University of Technology, Vienna, Austria
[email protected]

Abstract. Modeling clinical guidelines and protocols in a computer-interpretable format is a burdensome and time-consuming task. Existing methods and tools that support this task demand strong user interaction, detailed medical knowledge, and knowledge of the formal representation. I propose a multi-step methodology based on Information Extraction and Information Transformation that eases the modeling and makes the process structured and traceable.

1 Introduction

Clinical practice guidelines (CPGs) are "systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances" [1]. Research has shown that, if properly developed, communicated, and implemented, guidelines can improve patient care [2]. CPGs not only provide decision support for medical personnel (physicians, nursing staff, etc.), patients, and relatives, but also promulgate the most effective and efficient treatment. Therefore, CPGs are an important issue in quality assurance.

Because the guidelines are largely in narrative form, they are sometimes ambiguous and generally lack the structure and internal consistency that would allow execution by a computer. Therefore, several groups have created computer-interpretable guideline representations (a comprehensive overview can be found in [3]). Although methods and tools have been developed to support the modeling process, their support is insufficient: they demand detailed medical knowledge, knowledge of the formal representation methods, knowledge of modeling methodologies, and manual effort from the human modeler, and persons with both medical and computer science expertise are very hard to find.

Thus, research has to be directed towards the development of methods and tools that decrease the human effort in this task. The knowledge acquisition task has to be supported by automating parts of the formalization process and providing the required knowledge. We will concentrate on formalizing processes, actions, and their sequences, as these cover an important part of the formalization process. As the basis of my thesis work, one main research question has been formulated: How can knowledge-based methods support the semi-automatic formalization of medical documents specifying processes (i.e., CPGs) into a given formal representation?

To accomplish this task I will use Information Extraction methods to automate parts of the formalization process. The next section discusses related work and the problems that remain. Based on this, I describe my proposed approach and the methods it applies. I then report on the work done so far and conclude with future work, expected results, and benefits.

2 Related Work

In this section, I present a short discussion of relevant work on guideline formalization methodologies and tools as well as some examples of Information Extraction (IE) systems.

2.1 Guideline Formalization Methodologies and Tools

Various methods and tools exist for formalizing clinical guidelines in a guideline representation language, ranging from simple editors to sophisticated graphical applications and multi-step methodologies. Several markup-based tools have been developed to accomplish the formalization task, such as Stepper [4], the GEM Cutter [5], the Document Exploration and Linking Tool/Addons (DELT/A), formerly known as Guideline Markup Tool (GMT) [6], and Uruz, which is part of the DEGEL framework [7]. Graphical tools also support this process. AsbruView [8] uses graphical metaphors to represent Asbru [9] plans. AREZZO and TALLIS [10] support the translation into PROforma [11] using graphical symbols that represent the task types of the language. Protégé [12] is a knowledge-acquisition tool in which parts of the formalization process can be accomplished with predefined graphical symbols. Methodologies such as SAGE [13] have also been developed to make the formalization traceable and concise by using a multi-step approach. Still, in all of the above cases the modeling process is complex and labor-intensive. Therefore, methods are needed that can automate a part of the modeling task.

2.2 Information Extraction Systems

IE is defined as the task of extracting relevant fragments of text from larger documents, to allow the fragments to be processed further in some automated way, for example, to answer a user query [14]. Peshkin and Pfeffer [15] define IE as the task of filling template information from previously unseen text which belongs to a pre-defined domain. Various IE systems have been developed for different domains.
For example, the BADGER system [16] summarizes medical patient records. For the legal domain, Holowczak and Adam developed a system that supports the automatic classification of legal documents [17]. Besides these domain-specific systems, there are also systems based on Machine Learning techniques that can be applied to various domains. Some of these systems can handle free text, semi-structured text, or both. Systems handling both forms are WHISK [18] and CRYSTAL [19]. RAPIER [20] can only handle semi-structured text, whereas AutoSlog [21] can only handle free text. Finally, different kinds of wrappers have been developed to transform an HTML document into an XML document (e.g., XWRAP [22] or LiXto, which provides a visual wrapper [23]). These methods and tools are very useful when highly structured HTML documents are used or simple XML files are to be extracted. However, CPGs are more complex, and more richly structured XML/DTD files are needed to represent them.

Compared to the methods and tools described above, our approach to IE deals with very complex documents consisting of semi-structured text, free text, and tables. For this task we do not need to apply Natural Language Understanding, because the task does not require understanding the text, only detecting patterns in it. Likewise, we do not employ Machine Learning techniques, because the number of examples available for training is limited. We must also note that annotating information in the medical domain is difficult because of the complexity of the language used. For these reasons, we develop the rules for IE manually.
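The idea of detecting patterns with manually developed rules, and of filling a template from the matches, can be illustrated with a minimal Python sketch. The rules, the `ActionTemplate` slots, and the example sentences are assumptions made purely for illustration; they are not the rules or templates used in this work.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionTemplate:
    """Hypothetical template with slots for one extracted guideline action."""
    action: str
    agent: Optional[str] = None
    condition: Optional[str] = None


# Hand-written rules (illustrative assumptions only). Each rule is a regular
# expression whose named groups correspond to template slots.
RULES = [
    # "If <condition>, <action>."
    re.compile(r"^If (?P<condition>[^,]+), (?P<action>.+?)\.?$", re.IGNORECASE),
    # "<agent> should <action>."
    re.compile(r"^(?P<agent>.+?) should (?P<action>.+?)\.?$", re.IGNORECASE),
]


def extract(sentence: str) -> Optional[ActionTemplate]:
    """Fill a template from the first rule that matches, or return None."""
    text = sentence.strip()
    for rule in RULES:
        match = rule.match(text)
        if match:
            slots = match.groupdict()
            return ActionTemplate(
                action=slots["action"],
                agent=slots.get("agent"),
                condition=slots.get("condition"),
            )
    return None


if __name__ == "__main__":
    for s in [
        "If fever persists, prescribe antibiotics.",
        "Physicians should reassess the patient after 48 hours.",
        "Antibiotics are commonly used.",
    ]:
        print(extract(s))
```

No language understanding or training data is involved: a sentence either matches a surface pattern or is skipped, which is why such rules can be written by hand from a small number of example documents.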

3 Chosen Approach

New approaches are required to facilitate the modeling process and to support the knowledge engineer by providing the required knowledge. I propose a methodology that transforms the relevant information, in multiple defined steps and by means of intermediate representations, into a guideline representation language (cf. Figure 1).
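As a rough illustration of how a multi-step transformation with intermediate representations can stay traceable, the following Python sketch passes each piece of guideline text through a sequence of named steps and records which steps were applied and which source line it came from. The step names, the `Item` structure, and the `<action>` markup are hypothetical; the sketch does not reproduce the actual steps or representations of the methodology.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    """One piece of information in its current intermediate representation."""
    content: str        # current representation of the information
    source_line: int    # line of the original guideline it came from
    history: List[str]  # names of the steps applied so far


Step = Callable[[Item], Item]


def make_step(name: str, fn: Callable[[str], str]) -> Step:
    """Wrap a string transformation so that it records itself in the history."""
    def step(item: Item) -> Item:
        return Item(fn(item.content), item.source_line, item.history + [name])
    return step


def run_pipeline(lines: List[str], steps: List[Step]) -> List[Item]:
    """Apply every step in order to every line of the source document."""
    items = [Item(text, number, []) for number, text in enumerate(lines, start=1)]
    for step in steps:
        items = [step(item) for item in items]
    return items


# Hypothetical steps: whitespace normalization, then a simple markup step.
normalize = make_step("normalize", lambda s: " ".join(s.split()))
mark_action = make_step("mark_action", lambda s: f"<action>{s}</action>")

if __name__ == "__main__":
    for item in run_pipeline(["  Give   oxygen. "], [normalize, mark_action]):
        print(item)
```

Because every item keeps its source line and step history, each element of the final representation can be traced back through the intermediate representations to the original guideline text.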
