
1098-3058/05 $20.00 + .00 Copyright © 2005 Cognizant Comm. Corp. www.cognizantcommunication.com

Information Technology & Tourism, Vol. 7 pp. 201–219 Printed in the USA. All rights reserved.

EXPLOITING SEMANTIC WEB TECHNOLOGIES FOR HARMONIZING E-MARKETS

MIRELLA DELL’ERBA,* OLIVER FODOR,† WOLFRAM HÖPKEN,‡ and HANNES WERTHNER§

*eCommerce and Tourism Research Laboratory-eCTRL, ITC-Irst, Trento, Italy
†E-Commerce Competence Center-EC3, Vienna, Austria
‡eTourism Competence Center Austria-ECCA, Innsbruck, Austria
§Department of Information Systems and e-tourism, University of Innsbruck, Austria

A main obstacle to e-commerce is the well-known “interoperability problem.” Different players have different views of the world, even in the same application field. This is particularly true in the travel and tourism e-market, where IT has been applied for a long time, leading to a plethora of different information systems, each with its own data model and structure. In this article we describe the approach followed in “HARMO-TEN,” a European project aimed at solving the data heterogeneity problem by setting up a “virtual interoperable network” and providing the participants with a technological infrastructure based on a shared ontology. This will allow information to be exchanged seamlessly while existing data models remain unchanged.

Key words: Data integration; Interoperability; Ontologies; Mediators; Semantic middleware

Introduction

Travel and tourism is an information-based business, and this industry was one of the first sectors to employ ICT, as early as the 1960s. The major stakeholders are small or medium-sized enterprises (SMEs). There are approximately 1.3 million travel and tourism businesses in Europe (9% of all European enterprises), and 95% of them are SMEs. However, more than 40% have websites (Gratzer, Werthner, & Winiwarter, 2004). The size of the companies, the long history of application, and the rather high Web penetration have led to a plethora of different tourism information systems, with high diversity in both areas of operation and technologies, reflecting the different views of the tourism domain that stem from the players’ cultural and social backgrounds. Consequently, an interoperability problem arises among all the electronic market participants, creating an obstacle, especially for SMEs that cannot afford the cost of adapting their models to new standards. In other domains, where a common approach to data description has been followed, the transition to real market interoperability may be simpler. However, this is not the case for the tourism industry: there are global standards, used only by big companies; there are standards considering the needs of small tourism

Address correspondence to Mirella Dell’Erba, eCommerce and Tourism Research Laboratory-eCTRL, ITC-Irst, Trento, Italy. E-mail: [email protected]

201


DELL’ERBA ET AL.

suppliers, but used only on a national level. Even strong players, such as GDS/CRS, do not seem to be strong enough to force one single standard upon other tourism actors. Global interoperability, and the possibility of making the local offer of small suppliers globally available, are still missing. A major reason for the failure of past standardization initiatives is their lack of flexibility and extensibility. In this article we describe a more flexible approach to harmonizing the (different) tourism electronic markets, based on Semantic Web technologies. Moreover, the proposed solution offers a technical platform that allows participants to keep their existing data models unchanged and, at the same time, to exchange information seamlessly.

The article is structured as follows. The next section deals with the interoperability problem and describes some of the only partly successful past initiatives. Then the proposed approach is presented, together with a description of the technical platform and its components. The following sections give a brief picture of the logical and physical architecture of the system. Finally, we relate our work to similar approaches and give an outlook on our planned future activities.

The Interoperability Problem

The interoperability problem between two or more cooperating systems can be differentiated into (a) information-level interoperability, which addresses clashes between different data representations and their meaning, and (b) service-level interoperability, which arises when different processes should cooperate to perform business between enterprises automatically. In the tourism domain, system-supported cooperation often takes place at the information level only, leading to a weakly networked reality in which participating systems share their data but conduct business separately. For this reason, HARMONISE concentrates on the information-level interoperability problem and proposes a solution at this level only. Additionally, HARMONISE provides an infrastructure that supports information exchange in a seamless manner, lowering complexity and cost (the two lower layers of Fig. 1).

Figure 1. Business relationship stack.

At the information level, heterogeneity between different systems can be categorized as follows:
• Semantic clashes: different interpretations or meanings of the concepts of different systems. This includes different names for the same concept and different granularity of conceptualization, as well as heterogeneous structures used to model the domain in different systems.
• Representational clashes: caused by different data representations, where, for example, data are stored in an RDBMS in one system whereas another uses XML. The use of different constructs of the same representation format also belongs to this category (e.g., within XML, the use of attributes versus nested literal elements).
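As a minimal illustration of a representational clash within XML, the Python sketch below (standard library `xml.etree.ElementTree` only) flattens two hypothetical encodings of the same event, one using attributes and one using nested elements, into the same field-value dictionary. The element names and values are invented for illustration. The sketch resolves only this particular representational difference; semantic clashes (different names or granularity for the same concept) would still need a mapping to shared concepts.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same event: one stores fields as XML
# attributes, the other as nested literal elements.
ATTRIBUTE_STYLE = '<event name="Opera Night" date="2005-03-01"/>'
ELEMENT_STYLE = "<event><name>Opera Night</name><date>2005-03-01</date></event>"


def normalize(xml_text: str) -> dict[str, str]:
    """Flatten one XML record into a field -> value dictionary.

    Attributes and simple child elements are treated alike, so the
    choice between the two XML constructs no longer matters downstream.
    """
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)  # fields stored as attributes
    for child in root:  # fields stored as nested elements
        record[child.tag] = (child.text or "").strip()
    return record


if __name__ == "__main__":
    a = normalize(ATTRIBUTE_STYLE)
    b = normalize(ELEMENT_STYLE)
    print(a)
    print(a == b)  # True: both representations yield the same record
```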

HARMONIZING E-MARKETS


Figure 2. Clashes examples.

Figure 2 shows a typical example of a structuring conflict. It should be noted that more than one clash may occur in such a case. With respect to their impact on the exchange of data, these conflicts can be further differentiated into:
• Lossless (fully mappable): all clashes that can be resolved without any loss of information. This means that there exists a transformation that converts the given input information into a different representation with the same semantic content.
• Lossy (partially mappable or nonmappable): all conflicts for which any conceivable transformation (in one direction or the other) will cause a loss of information. Typical cases are, for example, when the two schemas represent information at different levels of granularity, refinement, or precision.

Figure 3 summarizes the clashes identified between different standards within the tourism domain, with examples for the respective cases.

Figure 3. Clashes classification.

Direct Approaches

Most current direct approaches to the harmonization issue are based on the idea of fixed, obligatory standards, to be imposed on everybody. Depending on the communication mechanism used, these standards can be defined as fixed-structure messages [e.g., United Nations Electronic Data Interchange for Administration, Commerce and Transport (UN/EDIFACT) TT&L (2002) or American National Standards Institute ASC X12I TG08 (2002)], XML-based messages [e.g., Open Travel Alliance (OTA) (2001)], or function- or object-oriented application programming interfaces [e.g., Hospitality Industry Technology Integration Standards (HITIS) (American Hotel & Motel Association, 2002) or TIN (1992)]. All those conforming to a given standard are automatically able to exchange information among themselves. However, all details of the exchanged messages, including all technical details that depend on the communication mechanism, must be agreed upon by all communication partners. This leads to a high effort for defining and maintaining such standards, which are therefore used almost exclusively by large companies [e.g., hotel chains, airlines, global distribution systems (GDS/CRS), etc.]. Newer standardization approaches try to eliminate these shortcomings by using XML. XML-based standards are easier to use and enable at least limited flexibility and extensibility. However, the use of XML does not solve the problem that all message details have to be agreed upon by all communication partners.

Conceptual-Level Approach

Omelayenko and Fensel (2002) show that an attempt to resolve both semantic and representational conflicts within one transformation step causes a scalability problem. Such a solution would lead to a rather complex set of rules and, consequently, to possible performance problems in the processing engine. The separation between semantic and representational clashes indicates the need for a distinction between corresponding steps in the overall harmonization process. Within the IT domain the separation between the physical and the conceptual level is an important principle (e.g., when modeling software systems, databases, or data structures in general). Conceptual aspects (e.g., the meaning of the entities of a problem domain, their characteristics, and their relationships to other entities) are specified independently of their physical representation (e.g., their storage structure). The same concepts can then be used for different technical solutions, and changes at the physical level can be made independently of the conceptual level. Applying the principle of separating conceptual and physical aspects to the problem of harmonizing electronic tourism markets opens new opportunities for reaching interoperability. The agreement among market participants can now be restricted to the conceptual level (i.e., to the concepts behind the exchanged messages) and can be reached much more easily. Conceptual-level mapping can then be applied, preserving local standards (formats) while facilitating interoperability.
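The lossless/lossy distinction above can be made concrete with a round-trip check: a pair of transformations loses nothing on a value if translating forward and then back returns the original. The Python sketch below is a hypothetical illustration, not part of HARMONISE: a date reformatted between two notations survives the round trip, whereas mapping a start timestamp onto a coarser date-only target (a granularity clash) cannot recover the time. Note that a successful round trip on sample values only fails to find a loss; it does not prove a mapping lossless in general.

```python
from datetime import datetime


# Lossless example: the same date in two notations, e.g. "DD.MM.YYYY"
# in a source schema and ISO 8601 "YYYY-MM-DD" in a target schema.
def to_iso(date_text: str) -> str:
    return datetime.strptime(date_text, "%d.%m.%Y").strftime("%Y-%m-%d")


def from_iso(date_text: str) -> str:
    return datetime.strptime(date_text, "%Y-%m-%d").strftime("%d.%m.%Y")


# Lossy example: a source storing date and start time mapped onto a
# coarser target that stores only the date (a granularity clash).
def to_date_only(timestamp: str) -> str:
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M").strftime("%Y-%m-%d")


def from_date_only(date_text: str) -> str:
    # The original start time is gone; midnight is only a placeholder.
    return date_text + "T00:00"


def round_trips(value: str, forward, backward) -> bool:
    """True if backward(forward(value)) reproduces the original value."""
    return backward(forward(value)) == value


if __name__ == "__main__":
    print(round_trips("01.03.2005", to_iso, from_iso))  # True
    print(round_trips("2005-03-01T20:30", to_date_only, from_date_only))  # False
```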

The European Initiative HARMONISE

As a reaction to the fragmented scenario in the e-tourism domain, the European project HARMO-TEN (Tourism Harmonisation Trans-European Network, eTEN C510828; www.harmo-ten.org) aims to create an electronic space for tourism stakeholders in which all businesses in the marketplaces are enabled to exchange their information in a seamless, semiautomatic manner, independent of geographical, linguistic, and technological boundaries. HARMO-TEN is based on the work carried out in HARMONISE (IST-2000-29329), a successfully completed research project that produced a more flexible approach to harmonizing existing and emerging online marketplaces, as opposed to fixed standardization methodologies. In fact, HARMO-TEN is the market validation of the HARMONISE results.

The HARMONISE project addressed information interoperability with the twofold goal of supporting information exchange among market participants and, at the same time, fostering Semantic Web technologies in the tourism domain as a basis for future enhancements towards higher-level interoperability solutions. Instead of promoting fixed standardization, HARMONISE relies on a conceptual-level approach to solve the interoperability problem without imposing any changes on the existing systems and their internal models. To achieve this goal, most of the relevant stakeholders in the European tourism domain, both private and public, were invited to set up a “virtual interoperable network” with the objective of gathering and modeling the most common information concepts used in the domain. The result is a tourism domain ontology serving as a reference model in a mediated scenario for the players willing to participate in the harmonization network (i.e., the Harmonise space) (Fig. 4). Data reconciliation among different tourism information systems is achieved with respect to this domain ontology. The mediator Hi acts as a semantic gateway between the systems, permitting the receiver to view the source as an extension of its own database, without concern for differences in names and data representations. The underlying technology will be described in the next section.

Figure 4. The Harmonise space.

Besides the HARMO-TEN project, the HARMONISE platform is also applied in the European Tourism Portal (etd.ec3.at/). The goal of this new portal, envisaged to become the most prominent European tourism-related Web address (visitEurope.com), is to provide a single point of access for all tourists interested in Europe and its tourism-related offer. It will be operated by the European Travel Commission (ETC), the association of all European national tourist boards (NTOs). The European Portal will be populated with information coming from all the NTOs, but also from many other sites offering interesting content. Because each of these organizations has its own information model, harmonizing the related information before offering it on the European site is a central issue. In the HARMO-TEN scenario, the European Portal is considered a further actor that maps its information onto the shared ontology in the same way the other participants do.

HARMONISE has developed the following components, which represent the starting point for the HARMO-TEN market validation phase:
• Network of cooperating actors in the tourism domain working together to achieve information interoperability and to define a common view of the tourism domain: the “Tourism Harmonisation Network (THN).”
• Tourism ontology, to model and maintain the basic concepts used in the tourism domain: the “Interoperable Minimum Harmonisation Ontology (IMHO).”
• Mediating platform: a mediator tool for the conceptual-level alignment of local data models and the subsequent information translation, utilizing semantic mappings and reconciliation techniques: the HarmoSuite system.
Tourism Harmonisation Network

Within the HARMONISE project, a task force on tourism standards, the Tourism Harmonisation Network (THN), was established as an open organization bringing together domain experts, IT professionals, standardization initiatives, and tourism organizations worldwide, with the objective of coordinating the harmonization effort and building a domain ontology. Among these organizations were national tourist boards such as TourInFrance (France), SIGRT (Portugal), the Finnish Tourist Board, and the Spanish Tourist Board; world tourism organizations such as the World Tourism Organization (WTO), besides the International Federation for IT and Tourism (IFITT), which had been a project partner from the start; tourism standards organizations like the Open Travel Alliance/Travel Technology Initiative (OTA/TTI); and systems such as WhatsOnWhen or TIScover. By bringing together the different market participants and domain experts, THN aimed to ensure broad acceptance of its mediation role. At the moment, THN has grown from its original seven members during the research and design phase to the 15 partners working on the HARMO-TEN objectives.

THN is designed to affect, directly or indirectly, the business strategies of its members not by positioning itself in competition with other initiatives or actors but, rather, by acting alongside them as a collaborative player within the market. THN offers the possibility to be part of a community aimed at facilitating and encouraging meetings, discussion, and cooperation on tourism topics, both among its members and with outside players. During the HARMONISE project, the key role of THN was to build the mediating domain ontology, the so-called “Interoperable Minimum Harmonisation Ontology” (IMHO), representing the consensual agreement on the tourism domain concepts and their characteristics. The first version of the IMHO has been created, initially covering the subdomains of Accommodation and Events & Activities. The HARMO-TEN partners intend to establish a European organization to run the core service, represented by the trans-European “mediating” service.
The Harmonise “mediating” service must be as economical as possible in order to attract the maximum number of customers. This reflects the fact that the real value of the mediation depends on the number of participating organizations. Furthermore, HARMO-TEN has to identify a valid business model and business plan, given that the Harmonise technology (ontology and mapping functions) is open source. HARMO-TEN therefore needs to identify key services, such as mapping support, ontology maintenance, or training, with the objective of providing sources of income able to support THN and its work financially.

The Tourism Domain Ontology

An ontology can be regarded as an abstract, conceptual model of an application domain (Gruber, 1993). The objective of an ontology is to reduce conceptual and terminological confusion and to reach a shared understanding within a specific domain. This is achieved by identifying and properly defining a set of relevant concepts that characterize a given application domain, together with their relationships. In the tourism domain, typical concepts of an ontology are, for example, hotel, hotel room, event, date of an event, etc. When trying to reach interoperability between different systems that use different data (exchange) formats, an ontology can serve as a point of reference for mapping between these data formats. The conceptual mapping then serves for reconciliation at the instance level. The IMHO defines a set of concepts of the tourism domain that are used within different data formats or data exchange standards, and in this way enables mapping between those formats and standards. When looking at the different relevant data formats and standards, the IMHO focuses on the overlapping concepts within those standards, which are necessary to allow the respective mappings.

Methodology of Ontology Definition

Within the HARMONISE project a methodology was designed for defining the domain ontology and reaching consensus among the members of THN. The general approach was to collect and analyze the data models of the various THN members and to agree on the core concepts needed by the majority of the members. Figure 5 illustrates the consensus-building process.
A matrix holding all the elements of each of the data models was built, and the most widely supported concepts were identified. In addition, elements judged to be of particular importance were added from one or more of the constituent data models, even if support for such concepts across the full range of participating organizations was narrow. In this way, disagreement between partners about the concepts included in the ontology could be avoided; in fact, during the entire project there was not a single case in which such a disagreement could not be avoided. Having reached agreement on the core concepts, the matrix was further developed by defining the elements of the HARMONISE ontology and mapping the data representations of each member onto the corresponding ontology elements. After consensus had been reached on all elements and their relationships, the ontology was stored in a repository. HARMONISE uses the “Resource Description Framework” (RDF) (2000) as a common exchange language for the mediating platform; thus, an RDF Schema representation of the IMHO is exported from the repository in order to be used in the HarmoSuite system (described later).

The methodology described above proved suitable for facilitating a consensus process among the THN members, and all THN members agreed on all concepts included in the ontology. The methodology makes use of an electronic voting mechanism to deal with disagreements and take the final decision (see section System Implementation). The positive result of this process confirms a major hypothesis of the HARMONISE project, namely that consensus between communication partners can be reached much more easily at the level of an abstract, conceptual model serving as a point of reference than with a fixed standard. The partners do not have to agree on a common nomenclature and structure; they just have to make sure that all relevant elements of their own data model are represented by semantically equivalent concepts within the ontology.
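The matrix-based selection described above can be sketched in a few lines of Python. This is a hypothetical illustration: the member names and concept names are invented, the support threshold is a parameter (the text elsewhere mentions concepts used by at least two THN members), and the `important` argument stands in for the manual judgment by which narrowly supported elements were added. The discussion and electronic voting that were also part of the real process are not modeled.

```python
from collections import Counter
from collections.abc import Iterable

# Hypothetical data models of three network members, reduced to the
# sets of concepts each one uses (all names are illustrative only).
MEMBER_MODELS: dict[str, set[str]] = {
    "member_a": {"EventName", "EventDate", "Venue", "Price"},
    "member_b": {"EventName", "EventDate", "Venue", "AgeLimit"},
    "member_c": {"EventName", "Price", "Organizer"},
}


def support_matrix(models: dict[str, set[str]]) -> dict[str, dict[str, bool]]:
    """Rows are concepts, columns are members; True where a member uses a concept."""
    concepts = sorted(set().union(*models.values()))
    members = sorted(models)
    return {c: {m: c in models[m] for m in members} for c in concepts}


def select_core_concepts(
    models: dict[str, set[str]],
    min_support: int = 2,
    important: Iterable[str] = (),
) -> list[str]:
    """Keep concepts used by at least `min_support` members, plus concepts
    explicitly judged important even if few members support them."""
    counts = Counter(c for model in models.values() for c in model)
    core = {c for c, n in counts.items() if n >= min_support}
    return sorted(core | set(important))


if __name__ == "__main__":
    print(support_matrix(MEMBER_MODELS)["AgeLimit"])
    # {'member_a': False, 'member_b': True, 'member_c': False}
    print(select_core_concepts(MEMBER_MODELS, important={"Organizer"}))
    # ['EventDate', 'EventName', 'Organizer', 'Price', 'Venue']
```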
Subdomains Covered by the IMHO

The HARMONISE project restricted the type of information modeled to “static” information (e.g., descriptions of tourism services) and does not directly support business processes like availability checks, booking, or payment. Owing to the timeframe of the project, ontology building was limited to two tourism subdomains: events & activities and accommodation.

Figure 5. Matrix-based consensus building.

The subdomain events & activities covers all types of events, such as cultural events (e.g., theater, opera, concerts), conferences, courses, lectures, and sporting events, as well as activities, covering services like tennis court rental, spas, entertainment parks, etc. Because of its relatively limited complexity (compared to subdomains like accommodation), this subdomain was the starting point of the ontology-building process, in order to evaluate the ontology-building methodology. Figure 6 shows a part of the ontology for the subdomain events & activities.

Figure 6. Event ontology.

The subdomain accommodation in general covers all types of accommodation services providing an overnight stay. This includes camping, houses, apartments, hotels, motels, guesthouses, farms, youth hostels, pensions, etc. In order to limit the complexity of this subject, the domain of interest was further restricted to all types of accommodation offering a room, such as guesthouses, pensions, etc. The IMHO for the subdomain accommodation consists of over 400 concepts (including concepts reused in different parts of the ontology), compared to an average size of 100 to 150 concepts for the data formats of the different THN members. The reasons for this are the approach of including concepts used by at least two THN members, as well as the level of granularity of the IMHO (see below).

Ontology Design Aspects

Level of Granularity. Concepts of an ontology can be modeled at different levels of granularity (e.g., elementary concepts like street name or street number, or higher level concepts like address). Mapping between different specific models is based on the principle of decomposition: specific models are decomposed until the resulting components of the different models overlap. In order to identify overlapping components, the components are semantically annotated by linking them to the corresponding concepts of the ontology. To facilitate mapping between different specific models based on this decomposition approach, the IMHO provides the smallest possible semantic pieces or concepts (besides aggregated concepts; see Structured Versus Unstructured Data).

Aggregation Versus Inheritance. Reusing common concepts in different parts of an ontology is a fundamental design principle for avoiding duplication and ensuring consistency and maintainability. Aggregation and inheritance are two mechanisms for reuse. Aggregation relationships are defined between a whole and its parts (e.g., between an address and its city). Inheritance relationships are defined when a concept (the subconcept) is a kind of another concept (the superconcept) (e.g., a hotel is a kind of accommodation, or a football match is a kind of event). When an inheritance relationship is defined, the subconcept inherits all characteristics of the superconcept. Because heavy use of inheritance relationships leads to strong dependencies between concepts and an inflexible design, inheritance relationships are used only to represent pure kind-of relationships. Reuse of common characteristics is modeled by aggregation relationships.

Structured Versus Unstructured Data. The same concept might be represented in different standards either by a detailed structure of aggregated concepts or by an unstructured stream of data (i.e., a string). In the case of an unstructured stream, it is not possible to define mapping rules from this element to the detailed concepts of the ontology. In order to support data exchange even for unstructured data, the IMHO also allows elements of specific standards to be mapped against aggregated concepts of the ontology. Thus, the ontology provides aggregated concepts for all pieces of information that might be represented in specific standards as single data elements.

Service State. Tourism services can be modeled in different states of their life cycle (e.g., potential service, offered service, booked service, or used service) (Höpken, 2004). Important for the design of the ontology is the differentiation between a potential and an offered service. A potential service is a description of the complete offer of a supplier, provided as one highly aggregated structure containing different concrete services and customer choices. The overall offer of a hotel could, for example, be described by one potential service as a highly aggregated structure containing all the different rooms of the hotel and all available meal services or other additional services. An offered service is a concrete service offered to a customer: exactly one tourism service without alternative characteristics. As HARMONISE addressed the exchange of static data in a B2B environment, not dealing with business processes like searching for or booking tourism services, the IMHO focuses on modeling potential services, representing the overall offer of a supplier. Descriptions of concrete services (i.e., offered services) can still be mapped against the ontology, as they represent instances of the corresponding potential service.

Multilanguage Support. Multilanguage support relates to textual information exchanged within messages (e.g., descriptions of events or hotels). When multiple languages are supported, those textual descriptions exist within a message several times in parallel, once for each supported language. In order to enable mapping of textual descriptions in different languages, the IMHO provides a specific multilanguage concept representing a textual description in different languages, modeled as an array of concepts, each aggregating the language and the corresponding text field.

The HARMONISE Mediator

Because tourism is a networked business, even on a worldwide scale, actors (information providers and information consumers) need to exchange information frequently and with many different partners. One information provider supplies the same kind of information to several partner systems for further processing or distribution. On the other side, one information consumer (e.g., an information broker) collects data from several sources and combines them into more complex offerings. In a nonmediated scenario the number of communication interfaces would grow as n² (Werthner, Fodor, & Herzog, 2004), with n being the number of participating nodes. To lower the number of necessary mappings within the Harmonise space, the mediating ontology IMHO serves as a central point of reference for all partners. Each partner is required to map its local conceptual model against the IMHO in order to customize its own gateway to the network. This one-time customization process produces a set of rules carrying information about the semantic relationships between constructs of the two models; these rules serve for the subsequent information translation at the instance level between the respective systems, using the intermediary harmonized format.
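The saving offered by the shared ontology can be checked with simple arithmetic. The Python sketch below assumes that, without mediation, every ordered pair of partners needs its own one-directional translator, giving n(n-1) translators (the n² growth mentioned above), whereas with the IMHO as the hub each partner maintains one forward and one backward mapping (the text later notes that each mapping is unidirectional), giving 2n. These counting assumptions are ours; the paper itself states only the n² order of growth.

```python
def direct_translators(n: int) -> int:
    """Point-to-point: one one-directional translator per ordered pair."""
    return n * (n - 1)


def mediated_mappings(n: int) -> int:
    """Hub-and-spoke via a shared ontology: a forward and a backward
    mapping per partner."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 3, 15, 50):
        print(n, direct_translators(n), mediated_mappings(n))
```

Under these assumptions, direct translation is cheaper for two partners (2 versus 4), the counts tie at three, and from four partners on the shared ontology wins; with the 15 THN partners mentioned earlier, the counts are 210 versus 30.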
Information in this intermediary format can be “understood” and processed by every actor in the network that has its own customized gateway. When data are exchanged, the information flow between two communicating systems is established in a peer-to-peer fashion, without any central node. The mediation service is decentralized and distributed over the participants; the actual data transformation takes place at the Harmonise gateways in cascaded fashion (forward transformation at the sender and backward transformation at the receiver). Thus, the mediator’s topology constitutes a star at the conceptual level, to lower the number of necessary links, whereas a peer-to-peer network is established at the physical level, to avoid potential bottlenecks in the information flow. In the HARMONISE project, information exchange was based on XML (eXtensible Markup Language) as a standard for data representation and exchange over the Web. Currently, we are working on an extension to include RDBMS table structures or CSV structures.

Reflecting the two-level architecture of the mediator, we distinguish two phases of the harmonization process: (i) the customization phase, in which the actors customize their gateways by establishing mappings between their local data models and the mediating ontology, and (ii) the cooperation phase, in which the actors’ information is automatically reconciled (translated) to the target format by reusing these mappings. Figure 7 shows the harmonization process of the mediator, with the customization and cooperation phases at both the conceptual and the physical level. The figure shows only the harmonization of a single actor participating in the overall harmonization process. When a complete communication path between two actors is established, the illustrated processes are deployed twice in a symmetric fashion. On the information provider’s side the process is called forward harmonization; on the information consumer’s side the term backward harmonization is used. The harmonization procedure consists of the following steps (Fodor & Werthner, 2005).

Customization Phase

Step 1: Ontology Export. The mediating ontology is currently maintained in the ontology management tool Protégé.
In the customization phase an RDFS representation of the mediating ontology is exported from the ontology repository to the mapping process. Step 2: Conceptual Normalisation (C-Normalisation). The governing schema of a source data

HARMONIZING E-MARKETS

211

Figure 7. Harmonise mediator.

representation (currently XML Schema) is “lifted” to a local conceptual model, also represented in RDFS format. The purpose of this semiautomated step is to reverse engineer the conceptual information hidden in the schemas of local data sources and to align the resulting models with the representation of the mediating ontology, so that differences in how conceptual information is represented are eliminated (Fodor & Werthner, 2004). From the Semantic Web perspective, the conceptual models produced by this reverse engineering process can be seen as simple ontologies that provide a good basis for ontology building at local sites. This can be particularly convenient, for example, for established tourism standards with an existing community of users, whose legacy systems can thereby be enabled for the Semantic Web.

Step 3: Semantic Mapping. Semantic mapping is the central task of the harmonization process. In HARMONISE we currently use the MAFRA mapping methodology (Maedche, Motik, Silva, & Volz, 2002), supported by a GUI mapping editor and a reconciliation engine. In this step the local normalized conceptual model is semantically mapped against the shared IMHO ontology. The aim is to eliminate the semantic clashes introduced earlier in this article. The concepts of the source model are projected onto the concepts of the target ontology (or vice versa in the case of backward harmonization) by specifying dedicated bridges and customizing the services that drive the corresponding manipulation of instance data. The product of the mapping process is the Semantic Map, which carries the information needed for the semantic reconciliation step in the cooperation phase. Currently, the mediator supports only a manual mapping process; however, dedicated tools for (semi)automated mapping discovery can be plugged in as modules to enhance it. Using the mapping editor, local concepts are mapped against the IMHO, declaratively defining reconciliation rules for subsequent data transformations. Mapping is unidirectional, so two mappings have to be defined for forward and backward harmonization (two-way communication). Figure 8 illustrates such a mapping process. For example, the information system of the Finnish Tourist Board (MEK) has a single “Accommodation” concept denoting any kind of lodging, with several properties such as SupplierStreetAddress, HotelFacilities, etc. The IMHO instead has the “Site” concept with the meaning of “Accommodation,” which has no direct properties but a number of relationships with other concepts such as SiteAddress, SiteName, SiteFacilities, etc. Bridges are therefore created between the Accommodation concept (and its properties) in the MEK ontology and the accommodation-related IMHO concepts and properties that correspond to them; there are bridges for both concepts and properties.

Figure 8. MEK system semantic mapping.

Cooperation Phase

Step 4: Data Normalization. In the cooperation phase, the local instance data (e.g., in XML format) are transformed into an image of the local conceptual model. This process is currently driven by the information produced in the corresponding Conceptual Normalisation (C-Normalisation) reverse engineering step and is carried out by the Normalisation Engine. The output is a set of local normalized (annotated) data in RDF format.

Step 5: Semantic Reconciliation. The normalized local instance data in RDF format are then transformed into the Harmonise Interchange Representation (HIR), that is, instances of the IMHO, using the Semantic Map produced by the corresponding mapping process. This transformation is performed by the MAFRA reconciliation engine, which applies the map each time new instance data arrive at its input. The HIR representation of the data is considered universally “understandable” within the entire Harmonise space and can be processed by each actor participating in the H-Space.
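Steps 4 and 5 can be illustrated with a small sketch that lifts a flat XML record into triples and then applies concept and property bridges modeled on the MEK example, turning a local Accommodation into an IMHO Site linked to SiteAddress and SiteFacilities nodes. This is not the MAFRA reconciliation engine or its API, only an assumption-laden illustration: triples are plain Python tuples rather than RDF, the XML layout (root element names the concept, carries an `id` attribute, children are properties) is assumed, and the `mek:`/`imho:` prefixes, the relation names `imho:hasSiteAddress`/`imho:hasSiteFacilities`, and the `imho:value` property are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from itertools import count

RDF_TYPE = "rdf:type"
Triple = tuple[str, str, str]


def normalize(xml_text: str, prefix: str) -> list[Triple]:
    """Step 4 (sketch): lift a flat XML record into triples of the local model.

    Assumes the root element names the concept, carries an id attribute,
    and each child element is a literal-valued property.
    """
    root = ET.fromstring(xml_text)
    subject = f"{prefix}{root.get('id')}"
    triples: list[Triple] = [(subject, RDF_TYPE, f"{prefix}{root.tag}")]
    for child in root:
        triples.append((subject, f"{prefix}{child.tag}", (child.text or "").strip()))
    return triples


@dataclass(frozen=True)
class ConceptBridge:
    source_class: str  # concept of the normalized local model
    target_class: str  # concept of the mediating ontology (IMHO)


@dataclass(frozen=True)
class PropertyBridge:
    source_property: str  # literal-valued property of the local concept
    relation: str         # IMHO relationship from the target concept
    helper_class: str     # IMHO concept reached through that relationship
    value_property: str   # property holding the literal on the helper concept


def reconcile(data: list[Triple], concept: ConceptBridge,
              properties: list[PropertyBridge]) -> list[Triple]:
    """Step 5 (sketch): apply one concept bridge and its property bridges."""
    fresh = count(1)
    out: list[Triple] = []
    instances = [s for s, p, o in data if p == RDF_TYPE and o == concept.source_class]
    for subject in instances:
        # Simplification: the local identifier is reused as the HIR identifier.
        out.append((subject, RDF_TYPE, concept.target_class))
        for bridge in properties:
            for s, p, value in data:
                if s == subject and p == bridge.source_property:
                    node = f"_:n{next(fresh)}"  # blank node for the related IMHO concept
                    out += [(subject, bridge.relation, node),
                            (node, RDF_TYPE, bridge.helper_class),
                            (node, bridge.value_property, value)]
    return out


# Usage with a made-up MEK-style record; forward map only (local -> IMHO).
record = ('<Accommodation id="hotel42">'
          '<SupplierStreetAddress>Example Street 1</SupplierStreetAddress>'
          '<HotelFacilities>sauna</HotelFacilities>'
          '</Accommodation>')
local = normalize(record, "mek:")
semantic_map = (
    ConceptBridge("mek:Accommodation", "imho:Site"),
    [PropertyBridge("mek:SupplierStreetAddress", "imho:hasSiteAddress",
                    "imho:SiteAddress", "imho:value"),
     PropertyBridge("mek:HotelFacilities", "imho:hasSiteFacilities",
                    "imho:SiteFacilities", "imho:value")],
)
hir = reconcile(local, *semantic_map)
for triple in hir:
    print(triple)
```

Because mapping is unidirectional, backward harmonization (IMHO to MEK) would need a second, separately defined set of bridges pointing the other way rather than an inversion of this one.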

System Implementation

The HarmoSuite package contains all the software components supporting the Tourism Harmonisation Network; from a logical point of view, HarmoSuite encompasses the following subsystems (see Fig. 9):
• Ontology management subsystem, to construct and maintain the IMHO domain ontology.
• Tourism Harmonisation Network Control Centre (THNCC) subsystem, which represents the administration part of the HarmoSuite system. It comprises all the functions that system users need to interact with the Harmonise application environment and manages the administrative and security functions of HarmoSuite; it also includes logging and the related analysis functions.
• Cooperation Subsystem (COSS). The COSS is the “distributable” part of HarmoSuite, which is installed and customized for each actor. It includes the semantic mapping tool and the reconciliation engine.

Figure 9. HarmoSuite subsystems.

Ontology Management

From a technical point of view, the THN consortium was supported by the HarmoConSys tool during the conceptual debate and decision making about the concepts, their representation, and their relationships to be included in the Harmonise Ontology. HarmoConSys (Harmonise Consensus System) consists of four components:
• Meeting Room. The meeting room supports online discussions to facilitate the consensus process among the task force members. It is used for ongoing discussions (similar to a mailing list) or for online meetings (similar to a chat) dealing with specific topics.
• Voting Room. The voting room is used to conclude discussions about specific concepts or parts of the ontology through online voting. Decisions are taken by majority vote.
• Documentation Room. The documentation room serves as a document repository (e.g., it contains all the documentation about relevant tourism data formats and standards).
• Group Information Space. It maintains information about the group and its members.

The first version of the IMHO was constructed using the ontology management tool SymOntoX (Symbolic Ontology XML-based management system). This tool supports modeling and storing information with high expressive power, based on the OPAL (Object, Process and Actor Language) methodology (Missikoff & Taglino, 2002). The basic elements of OPAL are (i) concept categorization, which assigns a kind (Object, Process, Actor, Information Component, Information Element, Action, Elementary Activity) to each concept, and (ii) a set of relations (Specialization, Decomposition, Relatedness, Predication, Similarity) linking concepts to one another. The language defines meta-concepts (Object, Actor, etc.) and meta-relations (Generalization, Specification, Predication, etc.) between concepts to allow the modeling of knowledge about a specific domain. As a step towards the Semantic Web, the Harmonise partners agreed to use RDF as the representation language. Thus, an RDF Schema representation of the mediating ontology was exported from the repository to be used in the Harmonise mappings. For this export, an extension of RDF Schema incorporating OPAL modeling notions (meta-concepts, meta-relations, and the notions of concept features and relation adornments) was defined in order to preserve the expressive power of OPAL. Currently, however, the IMHO is maintained entirely in RDFS format using the Protégé environment.

Tourism Harmonisation Network Control Centre

The Tourism Harmonisation Network Control Centre (THNCC) subsystem supports all the functionalities the system user needs to interact with the HARMONISE platform.
The THNCC subsystem contains the following components: the THNCC website, the THNCC Web service, and the THNCC back office.

Every eTourism market participant wishing to join the Harmonise space can register as a THN actor at the THNCC website (see Fig. 10). Accepted actors download the partner toolkit package, including the Cooperation Subsystem, which enables them to set up and customize their own Harmonise gateway (this is where the semantic mapping is performed). After the gateway has been activated, the resulting mapping is formally validated in the THNCC. This test determines which IMHO concepts are recognized by the new actor and identifies possible information losses during the reconciliation process. The validation certifies that the reconciliation process effectively meets the functional requirements of the Harmonise space. Once validated, the new actor is ready to take part in the network, and its data are made virtually “understandable” to all other actors.

HARMONISE uses Web services and a registry server based on the UDDI (Universal Description, Discovery, and Integration) standard. The THNCC provides a central registry system where clients can look up services and discover potential business partners. During the cooperation phase, the actor’s Harmonise gateway exchanges SOAP (Simple Object Access Protocol) messages with the central registry system to perform authentication and partner lookup. Once a business partner has been selected, the registry system provides the parameters needed to open a direct communication channel (e.g., IP address, supported transport protocol, and port).

The THNCC website also provides a communication forum where partners can post topics related to the electronic tourism market and discuss them with other actors and with the Harmonise operators. Communication is supported by a messaging system that distinguishes between user messages, group messages, and broadcasts. The THNCC back office is dedicated to administration and is available only to the Harmonise administrators.
The back office enables administrators to perform advanced management tasks (e.g., actor management).

Figure 10. Harmonise scenario.

Cooperation Subsystem

The COSS is the downloadable part of the HarmoSuite package. In order to participate and cooperate in the Harmonise space, each actor must download, install, and configure the COSS. The COSS interacts with the central registry system at the THNCC and with the other participants’ legacy systems via their COSS. This subsystem is composed of three separate software tools that interact with each other: the Customization Tool, the Customized Harmonise Gateway, and the Local Administration Console.

The Customization Tool supports the customization phase, facilitating the Conceptual Normalisation and Semantic Mapping processes through a dedicated GUI tool based on MAFRA (see Step 3 of the customization phase). The mapping produces reconciliation rules in the form of a Semantic Map, which is stored in a local repository so that it can be accessed in the cooperation phase. Currently, this Semantic Map must be updated manually whenever the IMHO or the local model evolves. The Customized Harmonise Gateway (CHG) (see Fig. 11) is responsible for sending and receiving messages over the Harmonise network and for translating content between the IMHO-compliant format and the local instances. All security checks and validations, as well as partner authentication and event logging, are performed by the CHG. To provide these features, the CHG includes the reconciliation engine, legacy system connectors, and a messaging system. An Application Program Interface (API) exposes these basic functionalities to applications (Fig. 11).
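The CHG's default legacy-system connector exchanges XML files with the local system through an Inbox and an Outbox directory, in the manner of a mail spool. The Python sketch below shows one polling cycle of such a connector. It is an illustration under stated assumptions, not HarmoSuite's Java API: the class and method names, the `*.xml` file pattern, and the `harmonize` and `send` callables (placeholders for the reconciliation engine and the messaging system) are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical hooks standing in for the reconciliation engine and the
# messaging system; HarmoSuite itself is Java-based and has its own API.
Harmonize = Callable[[str], str]   # local XML text -> harmonized (HIR) text
Send = Callable[[str, str], None]  # (partner id, payload) -> None


class FileSystemConnector:
    """Inbox/Outbox connector sketch, loosely following the CHG default connector."""

    def __init__(self, root: Path) -> None:
        self.inbox = root / "Inbox"    # incoming data, written in the local XML format
        self.outbox = root / "Outbox"  # local XML documents waiting to be harmonized
        for directory in (self.inbox, self.outbox):
            directory.mkdir(parents=True, exist_ok=True)

    def poll_outbox(self, harmonize: Harmonize, send: Send, partner: str) -> int:
        """Harmonize and send each pending document, then delete it; return the count."""
        sent = 0
        for path in sorted(self.outbox.glob("*.xml")):
            send(partner, harmonize(path.read_text(encoding="utf-8")))
            path.unlink()  # outgoing files are cleaned up once processed
            sent += 1
        return sent

    def deliver(self, name: str, local_xml: str) -> Path:
        """Store an incoming document, already translated to the local format, in Inbox."""
        target = self.inbox / name
        target.write_text(local_xml, encoding="utf-8")
        return target


# Usage: one polling cycle with stand-in harmonize/send functions.
outgoing: list[tuple[str, str]] = []
with tempfile.TemporaryDirectory() as tmp:
    connector = FileSystemConnector(Path(tmp))
    (connector.outbox / "offer.xml").write_text("<Accommodation/>", encoding="utf-8")
    sent_count = connector.poll_outbox(
        harmonize=lambda xml: f"HIR({xml})",
        send=lambda partner, payload: outgoing.append((partner, payload)),
        partner="partner-B",
    )
    outbox_left = sorted(p.name for p in connector.outbox.iterdir())
    received = connector.deliver("reply.xml", "<Accommodation/>").read_text(encoding="utf-8")

print(sent_count, outgoing, outbox_left, received)
```

In a running gateway, `poll_outbox` would be invoked periodically by a scheduler, and the `send` step is where the messaging system's partner lookup, authentication, and SOAP transport would take place.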

The reconciliation engine supports the cooperation phase of the overall harmonization process. It implements the transformation process, taking care of data translation between the participating entities. As the default connector to legacy systems, the CHG supplies a file system connector that communicates through files in specific directories. Similar to the sendmail system, two directories are available: (i) Inbox, where incoming data are written in the local XML format; and (ii) Outbox, which the file system connector checks regularly for XML documents to be harmonized. Outgoing files are deleted after they have been processed (harmonized) and propagated to the respective communication partner. The messaging system supports communication with the other partners in the Harmonise space. When a Harmonise partner wants to send a message to another partner, its local messaging system queries the central registry server to discover the location of the other partner. To communicate with that partner, the messaging system must obtain an authentication ticket from a remote authentication server, which allows it to authenticate itself at the partner’s messaging system. After the necessary parameters have been received, communication with the remote system is established. SOAP messages enclose the harmonized data in RDF format, that is, the actual instances of the IMHO. On the recipient’s side, the messaging system authenticates and validates the message and forwards its content to the reconciliation engine to be translated into the native local format.

Figure 11. The CHG overview.

The HarmoSuite architecture has been built according to the 3-tier design pattern and the MVC (Model/View/Controller) approach to system organization (Model, 1988). The MVC model component contains the business model, which corresponds to the business and data source layers of the 3-tier design. The MVC controller and view components correspond to the 3-tier presentation layer. Standard technologies and Open Source products have been used to develop HarmoSuite. The HarmoSuite system package is a Java-based platform, and its components are implemented as a set of Java classes.

Related Work

The attractiveness of the Harmonise solution lies in combining novel approaches and tools into one comprehensive service and applying it in the tourism domain to support electronic business at the information level. The architecture of the overall platform is tailored to the specific needs of the domain, where information sharing and exchange play a crucial role. During the project we identified and analyzed a number of related approaches for the interoperability and integration of heterogeneous systems. Available commercial EAI (Enterprise Application Integration) and B2B (business-to-business) integration products (e.g., BizTalk, WebSphere, WebLogic, or ActiveEnterprise) target the integration of heterogeneous systems to enable more comprehensive service compositions. These solutions mainly rely on well-established technologies such as XML and Web services and, unlike HARMONISE, which focuses on the information level, follow a primarily service-oriented approach. The F-Logic-based inference engine OntoBroker serves as a semantic middleware platform that integrates heterogeneous data sources and provides a single interface to the information (Decker, Erdmann, Fensel, & Studer, 1999). BUSTER (Bremen University Semantic Translator for Enhanced Retrieval) is a middleware for information retrieval that uses ontologies for the semantic integration of information sources (Visser & Schuster, 2002). The approach was developed to support the integration of heterogeneous geographic information systems and is based on a common terminology similar to an upper level ontology. The elements of this common terminology are used to describe classes from different ontologies in terms of formal concept expressions. Although the underlying technological principles of OntoBroker and BUSTER are closely related to those of HARMONISE, the solutions differ mainly in their purpose: HARMONISE focuses on data exchange between systems without imposing any changes on them, whereas the former approaches aim at integrating data sources to provide a common interface for applications. This also affects the resulting architectures: OntoBroker and BUSTER implement a rather centralized service, in contrast with the distributed mediation approach introduced by HARMONISE.

Next Steps

In the future we will primarily aim at reinforcing information-level collaboration in the Harmonise network. This includes extending support for different data formats, legacy system connectors, the communication infrastructure, and ontology upgrades, as well as enlarging the THN to challenge the harmonization platform to its full potential. Besides XML documents, we are already working on support for Comma Separated Values (CSV) imports as well as direct connections to relational databases. Integrating additional modules that support the “lift” of legacy data into Semantic Web-compliant formats will further ease new actors’ participation in the network. The simple data transformation currently provided will be extended with an enhanced protocol supporting the basic operations relevant for replicated data management, such as inserts, updates, and deletes of entries. Further, the IMHO ontology is expected to be represented in the Web Ontology Language (OWL Lite) to achieve better Semantic Web compatibility. Because OWL has been accepted as a W3C Recommendation for publishing ontologies on the Web, its adoption by established standards can also be expected. Adopting general concepts from well-established upper level ontologies and aligning IMHO concepts with ontologies in related areas (e.g., geographic information systems or transportation) can improve cross-domain interoperability even beyond the tourism area. To foster interoperability between ontologies, OWL supports expressing semantic relationships between the concepts and properties of different ontologies through constructs such as owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, or owl:intersectionOf. It is a natural consequence to extend the Harmonise mapping approach to exploit the information captured directly in OWL ontologies and thus further improve the mapping process.
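As an illustration of how OWL alignment axioms might feed the mapping process, the following Python sketch scans triples for owl:equivalentClass, owl:equivalentProperty, and rdfs:subClassOf statements that link a local ontology to the IMHO and turns them into bridge suggestions for a human to review in the mapping editor. It is an assumption, not part of the current Harmonise tools: triples are plain tuples (a real tool would parse the OWL files with an RDF library), and every `mek:`/`imho:` term other than Accommodation and Site is hypothetical.

```python
Triple = tuple[str, str, str]

# Alignment predicates read as mapping hints, with the kind of bridge each suggests.
HINTS = {
    "owl:equivalentClass": "class",
    "owl:equivalentProperty": "property",
    "rdfs:subClassOf": "subclass",  # local class narrower than an IMHO class
}
SYMMETRIC = {"owl:equivalentClass", "owl:equivalentProperty"}


def candidate_bridges(axioms: list[Triple], local: str, imho: str) -> list[Triple]:
    """Suggest (kind, local term, IMHO term) forward bridges for review."""
    suggestions: list[Triple] = []
    for s, p, o in axioms:
        kind = HINTS.get(p)
        if kind is None:
            continue  # not an alignment axiom
        if s.startswith(local) and o.startswith(imho):
            suggestions.append((kind, s, o))
        elif p in SYMMETRIC and o.startswith(local) and s.startswith(imho):
            # Equivalence holds both ways, so normalize to local -> IMHO.
            suggestions.append((kind, o, s))
    return suggestions


# Hypothetical alignment axioms between a local ontology and the IMHO.
axioms = [
    ("mek:Accommodation", "owl:equivalentClass", "imho:Site"),
    ("imho:name", "owl:equivalentProperty", "mek:supplierName"),
    ("mek:Cottage", "rdfs:subClassOf", "imho:Site"),
    ("imho:Site", "rdfs:subClassOf", "mek:Place"),  # only useful for backward harmonization
    ("mek:Cottage", "rdfs:label", "Cottage"),
]
for suggestion in candidate_bridges(axioms, "mek:", "imho:"):
    print(suggestion)
```

Only forward (local to IMHO) suggestions are produced: instances of a narrower local class can safely be sent as the broader IMHO class, but a subclass axiom pointing the other way would only support backward harmonization, so the sketch ignores it.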
Conclusion

The described platform provides an easy and effective way to cooperate and exchange information in the tourism domain. The HarmoSuite system package allows participants to keep their local data models and, at the same time, to participate in the electronic marketplace. Initial testing and validation of the proposed solution took place in the HARMONISE project, with positive results. The new Harmo-TEN phase of the project will bring the Harmonise technology into a major market validation and piloting process and enable the Harmonise community to make concrete investment and deployment decisions. Some “lessons learned” can be drawn from the experience gained so far. The most important one is that applying ontologies in this field proves more acceptable than earlier physical-level standards, because ontologies provide more flexibility. In that sense, HARMONISE played a pioneering role in promoting Semantic Web technologies in the tourism domain and in fostering the application of ontologies within different interest groups and organizations, locally as well as globally. However, given the fragmentation of the domain, it can be expected that not one but several related, networked ontologies will be maintained within the entire eTourism market, together with intelligent mediation and integration services.

The Harmonise technology is currently being deployed in various pilot installations to support collaboration between European tourism systems. Typical actors are destination marketing organizations (e.g., Tourespana, MEK Finland, and EnglandNet), tourism portals (VisitEurope.com), and destination management systems (Tiscover). We foresee additional use cases supporting other actors, such as information brokers or dedicated added-value services relying on tourism data. The main benefit of the loosely coupled Harmonise network is that the different roles in the information value chain are preserved, so each actor can concentrate entirely on its own part of the business while participating in an enlarged information network.
Beyond this, the success of the network could stimulate the emergence of novel intelligent services in the domain. For example, information retrieval based on semantic annotations of data sources (Popov et al., 2003) enhanced with IMHO concepts, and HARMONISE-supported information extraction from trusted sources (THN actors), could represent the next step towards the future eTourism market.

Biographical Notes

Mirella Dell’Erba studied Information Science at the University of Salerno and graduated in 1984 with a thesis on digital filters for image processing. From 1985 to 1991 she worked as a researcher at “Direzione Olivetti Ricerca,” where she participated in several European projects on image compression and coding. From 1992 to 2001 she was employed at Informatica Trentina S.p.A., where she worked mainly for cultural organizations such as museums and libraries. In 1998 she became project manager of the tourism project for the new information system commissioned from Informatica Trentina by the Azienda di Promozione Turistica (A.P.T.) of Trentino and by the Servizio Turismo and Servizio Statistica of the Autonomous Province of Trento. Since the end of 2001 she has been working at ITC-irst in the eCTRL laboratory.

Oliver Fodor studied Computer Science at the University of Technology, Vienna, where he received his Master’s degree in 1999. In 2001 he joined the eCommerce and Tourism Research Lab (eCTRL) in Trento, Italy, where he focused on the Semantic Web and interoperability issues. He was responsible for the methodological and implementation aspects of the ontology-based mediation platform. In 2003 he moved to Vienna, where he continues his research on ontology-supported interoperability between heterogeneous systems at the eCommerce Competence Center (EC3). Oliver has participated in several projects in the eTourism domain, applying his research results to tourism practice.

Wolfram Höpken is director of the eTourism Competence Center Austria (ECCA), where he is responsible for IT-oriented research and development projects in the area of eTourism. Before that, Wolfram worked for over 10 years for Amadeus Global Travel Distribution c/o Amadeus Germany as a software engineer, technology consultant, and project manager. He received his Ph.D. in computer science from the University of Darmstadt.

Hannes Werthner is Professor for Information Systems and e-tourism at the University of Innsbruck, Austria, and founder and president of the eCommerce Competence Center (EC3) in Vienna, Austria. He is also the scientific coordinator of the Austrian Network for etourism (ANET). Hannes was a member of the strategic advisory board for the European research program IST in FP5 (ISTAG) and is the founder of the International Conference on ICT and Tourism (ENTER, founded in 1994). He is Visiting Professor at the University of Surrey and a member of the ICMB (International Conference on Mobile Business) Standing Committee. He studied Computer Science at the Technical University of Vienna, where he received his Master’s degree and Ph.D. He was a visiting professor at several universities, has published over 100 papers and books, and was a fellow of the Austrian Schrödinger foundation. His research activities cover Decision Support Systems, Simulation, Artificial Intelligence, and Internet-based Information Systems, especially in the field of tourism.

References

American Hotel & Motel Association. (2002). Hospitality industry technology integration standards (HITIS) [Online]. www.hitis.org
American National Standards Institute. (2002). American National Standards Institute Accredited Standards Committee X12 standards (ANSI ASC X12) [Online]. www.x12.org
Decker, S., Erdmann, M., Fensel, D., & Studer, R. (1999). Ontobroker: Ontology based access to distributed and semi-structured information. In R. Meersman (Ed.), Semantic issues in multimedia systems (pp. 351–369). Boston: Kluwer Academic Publishers.
Fodor, O., & Werthner, H. (2004). Harvesting lightweight ontologies out of legacy XML sources. In Proceedings of the First International Conference on Knowledge Engineering and Decision Support (ICKEDS’04), Porto, Portugal.
Fodor, O., & Werthner, H. (2005). Harmonise—a step towards an interoperable e-tourism marketplace. International Journal of Electronic Commerce, 9(2), 11–39.
Gratzer, M., Werthner, H., & Winiwarter, W. (2004). Electronic business in tourism. International Journal on Electronic Business, 2(5), 450–459.
Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199–220.
Höpken, W. (2004). Reference model of an electronic tourism market—version 1.3 [Online]. www.rmsig.de/documents/referencemodel.doc
Maedche, A., Motik, B., Silva, N., & Volz, R. (2002). MAFRA–A MApping FRAmework for distributed ontologies in the semantic web (pp. 60–68). Workshop on Knowledge Transformation for the Semantic Web (KTSW 2002) at ECAI’2002.
Missikoff, M., & Taglino, F. (2002). An ontology approach for business and enterprise modelling with SymOntoX. In Proceedings of AICA Annual Congress.
Missikoff, M., & Taglino, F. (2004). An ontology-based platform for semantic interoperability. In S. Staab & R. Studer (Eds.), Handbook on Ontologies (pp. 617–634). New York: Springer-Verlag.
Model, M. (1988). The Model-View-Controller (MVC) paradigm user interfaces. In Proceedings of OOPSLA ’88. ACM.
Omelayenko, B., & Fensel, D. (2002). Analysis of B2B catalogue integration problems content and document integration. In J. Filipe, B. Sharp, & P. Miranda (Eds.), Enterprise information systems III (pp. 270–277). Dordrecht: Kluwer Academic Publishers.
Oracle9i Application Server. (2001). Web services technical white paper.
Open Travel Alliance. (2001). www.opentravel.org
Popov, B., Kiryakov, A., Kirilov, A., Manov, D., Ognyanoff, D., & Goranov, M. (2003). KIM—semantic annotation platform. In 2nd International Semantic Web Conference (ISWC2003) (LNAI Vol. 2870, pp. 834–849), October 20–23, Florida, USA. Berlin: Springer-Verlag.
Resource Description Framework (RDF) Schema Specification 1.0. (2000). W3C Candidate Recommendation. http://www.w3.org/TR/2000/CR-rdf-schema-20000327
TIN. (1992). Die Touristische Informations-Norm (TIN) für den deutschen Fremdenverkehr-Empfehlung für Informations- und Reservierungssysteme. Bonn/München.
UN/EDIFACT. (2002). United Nations directories for electronic data interchange for administration, commerce and transport [Online]. United Nations Economic Commission for Europe. www.unece.org/trade/untdid
Visser, U., & Schuster, G. (2002). Finding and integration of information—a practical solution for the semantic web. In J. Euzenat, A. Gomez-Perez, N. Guarino, & H. Stuckenschmidt (Eds.), Proceedings of the ECAI 02, Workshop on Ontologies and Semantic Interoperability (pp. 73–78). Lyon, France: CEUR Workshop Proceedings.
Werthner, H., Fodor, O., & Herzog, M. (2004). Web information extraction as basis for smart business networking. Electronic business in tourism. In P. Vervest, E. van Heck, K. Preiss, & L. Pau (Eds.), Smart business networking. Heidelberg: Springer-Verlag.