A semantic classification model for e-catalogs - IEEE Xplore

2 downloads 8009 Views 337KB Size Report
School of Computer .... value “Samsung” in manufacturer attribute is a .... eliminate this ambiguity and allow computer programs to .... price'. Notice that the semantic definitions allows to clearly see that class 'High end camera' is orthogonal.
A Semantic Classification Model for e-Catalogs Dongkyu Kim

Sang-goo Lee

Jonghoon Chun

Juhnyoung Lee

Director of R&D Corelogix, Inc., SNU ICT 138-516 Seoul, Korea

School of Computer Science and Engineering Seoul National University Seoul, Korea

President & CEO Corelogix, Inc., Department of Computer Engineering College of Engineering Myongji University Yongin, Kyunggi-Do, Korea

IBM T. J. Watson Research Center Hawthorne, NY, USA

Abstract Electronic catalogs (or e-catalogs) hold information about the goods and services offered or requested by the participants, and consequently, form the basis of an ecommerce transaction. Catalog management is complicated by a number of factors and product classification is at the core of these issues. Classification hierarchy is used for spend analysis, customs regulation, and product identification. Classification is the foundation on which product databases are designed, and plays a central role in almost all aspects of management and use of product information. However, product classification has received little formal treatment in terms of underlying model, operations, and semantics. We believe that the lack of a logical model for classification introduces a number of problems not only for the classification itself but also for the product database in general. In this paper, we try to understand what it means to classify products and present how best to represent classification schemes so as to capture the semantics behind the classifications and facilitate mappings between them. We believe the model proposed in this paper satisfies the requirements and challenges that have been raised by previous works

1. Introduction1 1

This work has been conducted in part under the Joint Study Agreement between IBM T. J. Watson Research Center, USA, and the Center for e-Business Technology, Seoul National University, Korea. D. Kim and S. Lee’s work was supported by Ministry of Information & Communications, Korea, under the Information Technology Research Center (ITRC) Support Program.

Conducting business functions over the Internet is no longer an academic concept but a common practice nowadays. The exciting environment of electronic commerce (or e-commerce, for short) encompasses a broad range of interaction processes among the various market participants; order, transport and delivery, mediation, invoice and payment, etc. A typical ecommerce transaction consists of suppliers and consumers gathering information and exploring potential market partners for goods and services, exchanging bids and offers, movement of goods, payments, and post transaction activities. Electronic catalogs (or e-catalogs) hold information about the goods and services offered or requested by the participants, and the computerized nature of e-catalogs allows for opportunities to automate and streamline many of the processes involved. Consequently, e-catalogs form the basis of an e-commerce transaction and catalog (or product information) management is a function that is required in almost all e-commerce systems; e-procurement, e-marketplace, supply chain management (SCM), enterprise resource planning (ERP), product lifecycle management (PLM), etc. Catalog management is complicated by a number of factors. One of these factors is the diversity of schemas for products. A typical catalog containing 100,000 products may contain thousands of different schemas [1]. Another factor is the diversity in users’ views. A manufacturer will tend to view products in terms of the materials and processes associated with the manufacturing of the product, while a distributor would be more interested in the product’s size and weight. Difference in views can be

Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE

clearly witnessed in the variety of product classification schemes that are in use nowadays.

1.1 Product Classification Product classification is at the core of these issues. People think of products in terms of classes (or types) of products and pose queries based on them. Classification hierarchy is used for spend analysis, customs regulation, and product identification. Classification is the foundation on which product databases are designed, and plays a central role in almost all aspects of management and use of product information. However, product classification has received little formal treatment in terms of underlying model, operations, and semantics. Fensel and Omelayenko [2] present the issues of B2B integration, focusing on product information. They list the difficult aspects of building, maintaining, and integrating product information. In [3], the specific issue of product classification is addressed by showing, quite effectively, the difficulty in synchronization between two separate classification schemes. They even solicit proposals for a model to map different classification schemes with minimal information loss. [4] presents a practical model for eCl@ss and touches upon the practical issues for classification schemes. Agrawal & Srikant [5] presents how to classify a set of products given an existing classification. These previous works have elegantly addressed various issues of product information management. However, the underlying model for product classification was assumed to be the simple code-based hierarchical model of UNSPSC, eCl@ss, and HS. In this paper, we try to understand what it means to classify products and present how best to represent classification schemes so as to capture the semantics behind the classifications and facilitate mappings between them. We believe the model proposed in this paper satisfies the requirements in [4] and is one solution to the challenge raised by [3]. There are a number of established classification schemes that are in use today. The Harmonized Commodity Coding System [6], UNSPSC [7], eCl@ss [8] are important classification standards that cover all industry sectors. There are also numerous industry specific product classification standards such as IEC61630 (electronics), UNCPC (transportation), SITC (international trade), CPV (procurement contract), etc. These standards have differing objectives and criteria in classifying the products of interest. However, interestingly, these schemes all adopt the simple code based model. That is, they assume the simple hierarchical tree using “classification codes” where the code of a superclass (say, 4010) is a prefix of the codes of its subclasses (401011, 401012, etc.). The subtle constraints and conditions are

mostly implicit or at best specified at various places of the guidelines for human reading. We argue that the lack of a logical model for classification introduces a number of problems not only for the classification itself but also for the product database in general.

1.2 Observations We first list a set of observations that form the basis of our discussion. (1) A product is an entity with a set of properties. (2) A product’s identity does not depend on how the product is classified. A pack of white sheet A4 paper preserves its identity whether it is classified as “typewriter paper” (24-26-03-01 in eCl@ss), “fax paper” (24-26-04-01), or “copy paper” (24-26-0601). Failing to acknowledge this somewhat obvious observation leads to widespread confusion in many of the commercial product databases. (3) Product classification serves multiple purposes. a) Spend analysis: An important use of classification is to facilitate aggregation. A predefined set of classes and their hierarchy can be the basis for drill down and roll up in analytic processing. It is important in decision support systems to be able to analyze data in multiple dimensions. However, a code-based classification only provides one dimension of view, and thus, cannot alone support the requirements of analytic processing. b) Regulations: The HS code is used for international customs regulation. There are also local standards for distribution control, maintenance and management, etc. For such applications, it is important that classes must be exclusive and unambiguous. c) Product search: Most search queries on product databases are in the form of keyword list. The list usually contains a product class (type) with select conditions on one or more attributes. d) Auto classification: The need for autoclassification is well documented in [5]. There are situations where thousands of new products must be categorized into a given classification scheme; mergers between marketplaces, addition of a new supplier with thousands of new products categorized in a different way. Manual classification will not scale to the task. e) Product database design: Product databases are designed based on classification schemes. [10] (4) A classification is a representation of just one single view of the product data space. There is no way a single classification can serve all purposes. For example, a classification that is fit for spend analysis

Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE

for a company should represent the view of the executives. However, this view may not be the best view for database design where sharing of common attribute sets should the most important classification criteria. Even for spend analysis, different departments could be interested in different groupings of items and/or the interest of the executives change over time. It is simply impossible to build (or to find) a single “super” classification that would support all these views. Too companies are still pouring resources into trying to accomplish this impossible task rebuilding their databases every few years. (5) Most attribute values are inherently hierarchical. The value “Samsung” in manufacturer attribute is a member of “semi-conductor manufacturers” which in turn is a subclass of “IT companies”. The value “aluminum” in material attribute is under “special metals” which in turn is a subclass of “metals”. This observation re-emphasizes our previous observation that there are dozens of hierarchical views that must be supported in a product database at any given time. (6) A product class is both a concept and a set (of products). “Flat panel TV” may be defined by the properties that must be present in the members of the class. Or the class can be characterized by enumerating the members without reference to the properties.

1.3 Paper Organization The rest of the paper is organized as follows. In section 2, we introduce the types of classification models by listing the different aspects of a model construction. Our semantic classification model is presented in section 3. In section 4, operations are defined for the model. The paper is concluded in section 5..

of class is informal (in the form of human readable description). One of the problems of current schemes is that the underlying model is implicit. For example, code based schemes cannot support multiple inheritance because the code (identifier) of a class is the concatenation of identifiers of its ancestors in the hierarchy. If multiple inheritance were allowed, a class would have more than one identifier. Thus, the underlying tree model is implicitly implied by the representation of the class identifier. The choice of model should dictate the choice of representation and not the other way around. We present in this section a number of aspects of a classification model.

2.1 Tree vs Graph A classification scheme may be modeled either as a tree or a graph. In a tree model, a class may have only one parent, while, in a graph model, multiple inheritance is allowed. All code-based classification schemes are inherently trees since a class code cannot have two different prefixes of the same length. However, the graph model with multiple inheritance is more flexible and, in our opinion, more natural. For example, consider the portion of the code-based UNSPSC in figure 2-1. In the first diagram, 14-11-15-11 ‘Writing paper’ is under 1411-15 ‘Printing and writing paper’, but semantically it can go under ‘Personal paper products’ or ‘Business use paper’, or even ‘Office supplies’ (actually, in eCl@ss, ‘Writing paper’ is under ‘Office supplies’). Because multiple parents were not allowed, the scheme could not represent the additional relationships. 44

14

Office equipment and accessories and supplies

Paper materials and products

11

12

Paper products

2. Product Classification Model

Office supplies

15

18

Printing and writing paper

Before we go on, our use of the terms classification, classification scheme and classification model should be clarified. UNSPSC and eCl@ss are each an instance of catalog classification that define a set of classes and their hierarchical relationships. We shall call such an instance a classification or a classification scheme. A classification model is the underlying data model of the classification schema. It defines such issues as whether the relationships form a tree or a graph, whether class definitions are formal, and whether products can belong to multiple leaf level classes. For example, UNSPSC and eCl@ss share the same model; uses the tree topology and each product can be assigned to only one leaf class, and the description

17

Business use paper

Personal paper products

11 Writing paper

25 Commercial and Military and Private Vehicles and their Accessories and Components

17 Transportation components and systems

21 Vehicle safety systems and components

01 Airbags

45 Vehicle safety and security systems

02 Air bags

Figure 2-1. Examples from UNSPSC Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE

The second diagram shows two classes, 25-17-21-01 ‘Airbags’ and 25-17-45-02 ‘Air bags’, that are actually the same. Again, because multiple parents are not allowed, the same class has been duplicated and appears redundantly. Under the code-based model, the redundant classes are treated as completely different classes, and if a buyer searches for airbags in one class, there is no systematic way of knowing there is an identical class in a different location.

2.2 Formal Semantics vs Informal Semantics The description of a class makes up the semantics of the class. It is interpreted and used to decide the members of the class. Class codes used in current schemes convey no semantics but are merely a means of representing parent-child relationships. The semantics of a class in these schemes is represented in the form of its class name and natural language descriptions. Such representation is inherently ambiguous and difficult to use in a computer program. By having formal semantics for classes, we can eliminate this ambiguity and allow computer programs to operate directly on the semantics. The formal semantics can be used as the aggregating conditions in spend analysis, and the reduced ambiguity will enhance the effectiveness in product search and identification (for regulations). Auto classification and database design can only benefit from the added information provided by formal semantics.

classification schemes have leaf nodes at the same level. This is preferable for consistency and ease of use. However, in situations where certain part of the hierarchy needs to be defined in more detail than others, this restriction becomes a burden. The designer should weigh the options depending on the application semantics, policies, and other domain specific factors. These last two aspects are not fundamental components of a classification model but rather can be treated as specific conditions separately enforceable on individual classification schemes. We summarize in table 2-1 the types of classification models discussed thus far. In the next section, we present the Semantic Classification Model which is graph based and adopts formal semantics with attribute definitions for each class. Table 2-1. Types of Classification Models Aspect

Options

Fundamental Remarks Component Topology Tree vs Graph Yes Graph preferred. Trees may (DAG) be required but difficult to build a truly consistent tree with no overlaps. Semantics Formal vs Yes Formal specification is Informal required. Attributes Required or not Yes Required for formal specification and enrichment. Leaf level Same level or no No Depends on application. restriction Product Single leaf class No Depends on application. membership only or no restriction

2.3 Attributes

3. The Semantic Classification Model

Attribute definitions can be a required component of a class definition. [4] The set of products in a class share a common set of properties, and the properties are manifested in attributes of products (observation (1) of section 1.2). It is impossible to talk about properties of products without reference to the attributes, either directly or indirectly. So, defining the basic set of attributes for classes is necessary and important. Most of the current classification schemes do not have attributes. eCl@ss has attribute definitions for the leaf classes and there are works in progress to assign attributes to UNSPSC classes as well.

A (product) class is a set of products. Taking S as the set of all products in consideration, a class is a subset of S. A class can be defined by enumerating its members. However, for the purpose of defining and managing classification schemes, enumeration cannot be the main option since classes would have to be revised constantly as instances of products enter or leave the system. Instead we use class definitions that define the intensions (as in logic databases [11]) by specifying the properties of the members of the given class. Examples of class definition are class names, short descriptions in English, predicate logic formulae, and database queries that can retrieve the members of classes. It is important that class definitions should be (1) unambiguous in the sense that the same set of products must be identified for each class regardless of who interprets the definition. Furthermore, the definition should be (2) machine-understandable to facilitate consistency checking, enhance portability, and scalability.

2.4 Others A classification scheme may allow or disallow products to be assigned to more than one leaf classes. This would depend on a number of factors such as the objective of the classification. Most of the current

Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE

The current classification schemes such as UNSPSC fail in each of these categories. Class codes merely identify the locations of classes within the hierarchy and have no information value in terms of the semantics of the classes. Class names and natural language descriptions are for human eyes only and are, more often than not, ambiguous. We propose a Semantic Classification Model (SCM) that augments a classification hierarchy with formal semantics of each class. SCM is our extended classification model for modeling a set of products, classes and their relationships. In SCM, there exists a set of attributes, such as product number, title, manufacturer, descriptions, and prices for each product, and those attributes are used for description of each product. A class is defined as a set of products that share a common property represented as a propositional condition over the attributes of the products. A class may have one or more subclasses whose conditions are more specific than its parent class, and the set of classes forms a directed acyclic graph (DAG). In this model, a classification scheme is not merely a mechanical assignment of class codes but a semantic representation of product set S.

1) M0=TRUE (satisfied by all products; a corollary to the definition of C0) 2) E (the parent-child relationship) does not form a cycle 3) If < Ci, Cj >∈E (Ci is the parent of Cj) then Ai⊆Aj, i.e., (Attribute inheritance constraint) 4) If < Ci, Cj >∈E then Mj implies Mi, i.e., Cj ⊆Ci (Soundness constraint)

00

11

The set of integrity constraints IC can be different from one classification scheme to another. One scheme may allow multiple inheritance while another may not. However, for a classification scheme to have practical utility, we require the following set of basic integrity constraints to hold: 2 Di is optional in the sense that such descriptions do not add to the theoretical structure of this definition. However, we include them for practical reasons; they are artifacts that provide a natural link to the classification schemes we are use to seeing, and also they make our examples more legible. 3 Since the membership function of a class identifies its members, we will use treat the definition of a set interchangeably with the set itself.

D1 = Camera A1 = A0 U { lens, dimension, sensor } M1 = M0 ^ type=‘camera’

D2 = Film camera A2 = A1 U {filmsize, loading, filmspeed } M2 = M1 ^ sensor=‘film’

22

D4 = High end camera A 4 = A1 M4 = M1 ^ retailprice>1000

33

44

D3 = Digital camera A3 = A1 U {resolution, digital zoom, memory, USB } M3 = M1 ^ sensor=‘digital’

55

3.1 Definitions Definition 3.1 A semantic classification scheme CS over a set of products S is a 5-tuple < S, C, C0, E, IC >, where S is the set of all products (in consideration), C = {C0, C1, …, Cn} is the set of classes, where each Ci is a 3-tuple , where Di is the text description of each class such as class name and short sentence description2, Ai is the set of attributes common to all members of Ci, and Mi is the membership function defined as a well formed logic formula over Ai.3 C0∈C is the root class that includes all members of S, E ⊆ C×C is the set of edges representing parent-child relationships among classes in C. IC is the set of integrity constraints defined on the scheme (discussed further below). ■

D0 = OurProduct A0 = {prod_nm, type, brand, retail price, sale price } M0 = TRUE

D5 = High end digital camera A 5 = A3 U A 4 M5 = M3 ^ M4

Figure 3-1. A semantic classification scheme Example 3.1 A semantic classification schema is shown in figure 3-1. Class ‘Camera’ is further classified into classes ‘Film camera’ and ‘Digital camera’ based on the value of ‘sensor’ attribute. Each of the two subclasses has its own set of attributes in addition to the ones inherited from the parent. Class ‘High end camera’ is also a subclass of ‘camera’ categorized on the attribute ‘retail price’. Notice that the semantic definitions allows to clearly see that class ‘High end camera’ is orthogonal (different classification criteria) to the other two sibling classes. ‘High end digital camera’ is a subclass of both ‘Digital camera’ and ‘High end camera’. The attributes of both classes are inherited and the membership function is a conjunction of those of its parents. ■ The semantic definition of a class Ci identifies the set of products in the class. Whether to keep pointers to product instances for each class or whether to have instances have pointers to leaf classes are purely implementation issues. SCM allows the classification designer to focus on the semantic contents of a classification by abstracting such details from the model. The following example shows that SCM is capable of representing the current code-based classification schemes.

Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE

A classification schema is semantically sound if each topological relationship implies semantic relationship, and semantically complete if each semantic relationship implies topological relationship. ■

It also clearly shows the lack of semantics in these schemes. Example 3.2 The two classes ‘Office supplies’ and ‘Writing paper’ (shown in figure 2-1) of UNSPSC can be defined as follows;

00

COff supp = < D Off supp = “Office supplies”, A Off supp = Ø, M Off supp = (class_code = 4412*) >

11

Cwr paper = < D wr paper = “Writing paper”, A wr paper = Ø M wr paper = (class_code = 14111511) > In the current version of UNSPSC, there are no explicitly defined attributes or membership predicates. Implicitly, it is assumed that each product will have a ‘class code’ attribute to hold the identifier of the class it belongs to. So, we must rely on this attribute to build membership functions for classes as shown in this example. These codes cannot tell the system that there may be a superset-subset relationship between class 14111511 (‘Writing paper’) and class 4412 (‘Office supplies’). The only other clue for understanding the semantics of each class is its class name which, obviously, is inherently ambiguous. ■

3.2 Properties A Classification scheme can be represented as a graph where the nodes represent classes (C) and the edges represent parent-child relationships (E). We define two kinds of formal relationships in a semantic classification scheme Definition 3.2 Topological Relationship, ≥T Ci ≥T Cj if Ci is an ancestor of Cj in G Semantic Relationship, ≥S Ci ≥S Cj if Mj implies of Mi



Topological relationships are those explicitly expressed in a scheme in terms of paths between classes. They represent the explicit intention of the designer, similar to the class codes in code-based classification schemes. Semantic relationships, on the other hand, are subsumption relationships that hold between classes whether by designer’s intention or not. If the classification scheme were defined correctly, each pair of classes with topological relationship should also have semantic relationship. We define this notion formally. Definition 3.3 Soundness and Completeness

D0 = OurProduct A0 = {prod_nm, type, brand, retail price, sale price } M0 = TRUE

D2 = Film camera A2 = A1 U {filmsize, loading, filmspeed } M2 = M1 ^ sensor=‘film’

22

D1 = Camera A1 = A0 U { lens, dimension, sensor } M1 = M0 ^ type=‘camera’

D4 = Highend camera A4 = A1 M4 = M1 ^ retailprice>1000

attributes

33

44

D3 = Digital camera A3 = A1 U {resolution, digital zoom, memory, USB } M3 = M1 ^ sensor=‘digital’

No topological relationship specified, but semantic relationship still holds

55 D5 = Highend digital camera A5 = A4 U {resolution, digital zoom} M5 = M4 ^ sensor=‘digital’

Figure 3-2. A semantically sound but incomplete schema

Example 3-3 Let us use the classification scheme of example 3-1. But, this time, suppose the designer was not aware of class 3 (‘digital camera’) when defining class 5 (‘High end digital camera’). The new class is defined as a subclass of class 4 (‘High end camera’) with a condition on attribute ‘sensor’, as shown in figure 3-2. A quick inference shows that M5 implies of M3, and thus, C3≥SC5. Since ¬C3≥TC5, the scheme is not semantically complete. However, since all topological relationships imply semantic ones, it is sound. ■ Notice, in the above example, that A5 does not have attributes ‘memory’ and ‘USB’ which are present in A3. With SCM, the system can run consistency checks and notify the designer of such situations. The designer may then choose to fix the situation depending on the constraints she is enforcing on the scheme. SCM supports product identity independent of classification. The member function of a class specifies the property that members of the class must have. If, for some reason, the property of a product changes over time, the product may change its membership of the class over time. The identity of the product is preserved throughout this process. SCM allows the specification of differing views, as shown in example 3-1. The class definitions may constantly be revised to accommodate the changing views of an executive or certain regulations without reassigning codes and/or reorganizing the database. This is possible by abstracting the implementation details from the class

Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE

definition and also by separating the product identification problem from the classification problem. For certain applications, it is important for products to belong to exactly one leaf class. By specifying and enforcing appropriate integrity constraints this can be supported within SCM. How those constraints will be enforced is beyond the scope of this paper. Additionally, SCM is well suited for resolving ecatalog integration issues. Given two classification schemes CL1 and CL2 in this model, mapping between classes C1i of CL1 and C2j of CL2 can be specified precisely. If the two schemes share the same set of vocabulary (attribute names, values, etc.), the mapping becomes the process of finding common subexpressions in M1i and M2j. In practice, even when the schemes do not share the common vocabulary, the model can be used to further qualify the simple correspondences defined in a crosswalk table. An example is shown in figure 3-3. Classification Scheme CL1 01 0101

0102

0103

which is a process deciding existence of semantic relationship. Insert_CLASS (Cp, Cnew): When a class Cnew is inserted as a subclass of class Cp, membership predicate of Cp needs to be changed to include products of Cnew. Delete_CLASS (Cdel): When a class Cdel is deleted, it decides whether child class of Cdel is going to be deleted or maintained to reset topological relationship of Cdel’s parent class. It is not necessary to edit the membership predicate, because semantic relationship is not changed in either case. Split_CLASS (Csplit, CPs): It is slicing product of the class in two classes having different semantic based on exclusive classification predicate. If the class has child classes before the split, then split_CLASS is applied to them recursively. Merge_CLASS (Ci, Cj): Merging classes is uniting products of two groups having different semantic and belonging to two different classes, and it is accompanied. In other words, it is union of membership predicate of the merged two classes. However, membership predicate is not changed because merging two parent classes does not effect on semantic relationship of each child class.

Classification Scheme CL2 03 03-A

03-B

5. Conclusion

Class 0102 is mapped to (Class 03-A ^ Attr1=“a”) U (Class 03-B ^ Attr2=“b”)

Figure 3-3. Correspondence mapping with qualification

4. Operations insert and delete are employed to process the manipulation of SCM. Additionally there are more complicated operations, they are split and merge. Their role is analytic processing such as drill-down and roll-up from existing classes. However, differently from the general code change or graph manipulating operation, semantic relationship in classification scheme should be conserved for the soundness of classification scheme, and the consistency of membership predicate should be preserved for avoiding the damage of semantic of products in each operation process. Compare_CLASS (Ci, Cj): It is the process of finding common subexpressions in Mi and Mj. That is deciding a semantic relationship among cover, overlap, subsumed, disjoint between two classes,

The limitations of the current classification schemes have been presented as motivation to our work. We have analyzed the different aspects of a classification model and identified the requirements of a robust model. We have presented a semantic model that can be used as a foundation on which to build classification schemes. The model allows for multiple inheritance, formal semantic representation, and attribute definitions for classes. A set of properties that must be satisfied by all classification schemes have been defined as the basic integrity constraint. Individual schemes may define specific restrictions and conditions as additional integrity constraints. Because the model does not depend on codes or any other implementation dependant features, it is applicable in most environments. Classification schema manipulation operations have been defined with respect to the semantics of the model. Multiple views are easily supported since the classification criteria are explicitly represented formally. We believe this model is especially important in standardization efforts. With the precise semantic definitions for product classes, there would be little room for deviations due to individual interpretations of the standards. Classifications defined on this model will allow for better integration by enabling semantic mappings

Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE

between classification schemes. These two applications are extremely important in B2B integration and interoperability. The model may be a little complicated and will take time and effort to create a semantic classification based upon the model. However, the losses that can result from ad hoc models far outweigh this cost. The various guidelines and rules that are used in product classification organizations can be organized into the model in a coherent way. We have implemented a catalog management system based on our model [9]. Also, the model is being used as a guideline for standardizing industry specific product information in the B2B Pilot Project run by the Korean government. The Project has 40 consortiums representing their respective industry sectors and has been running for 4 years. Our model has been selected as the standard model for product information in October 2003, and we expect to see the first results by May of 2004. Extending the model to incorporate ontological concepts is an important and interesting future work. The Public Procurement Service of Korea has also adopted our model as the foundation for its Standard Product Ontology. We are currently analyzing ontological issues in ecatalogs and expect to define the requirements of a product ontology.

6. References [1] Anant Jhingran, “Moving up the food chain: Supporting ECommerce Applications on Databases.” SIGMOD Record 29(4), 2000 [2] Fensel, Omelayenko, Ding, Schulten, Botquin, Brown, Flett, “A Product Data Integration in B2B Electronic Commerce,” IEEE Intelligence System, 16(3), 2001

[3] Ellen Schulten, Hans Akkermans, Guy Botquin, Martin Dörr, Nicola Guarino, Nelson Lopes, Norman Sadeh, “The ECommerce Product Classification Challenge,” IEEE Intelligence System, 16(4), 2001 [4] Jörg Leukel, Volker Schmitz, Frank-Dieter Dorloff, “A Modeling Approach for Product Classification Systems,” proc. of DEXA Workshops, 2002 [5] Rakesh Agrawal, Ramakrishnan Srikant, “On integrating catalogs,” proc. of WWW, 2001 [6] Harmonized system committee, “Harmonized System Convention,” available at http://www.wcoomd.org/ie/En/Topics_Issues/HarmonizedSyste m/harmonizedsystem.htm, 2004 [7] UNDP, “United Nations Standard Products and Service Code, White papr,” available at http://www.unspsc.org/, 2001 [8] Cologne Institute for Business Research, “eCl@ss - New Standardized Material and Service Classification,” available at http://www.eclass-online.com/, 2004 [9] Dongkyu Kim, Sang-goo Lee, Jonghoon Chun, Sangwook Park, Jaeyoung Oh, “Catalog Management in E-Commerce Systems,” proc. of Computer Science & Technology, 2003 [10] Kiryoong Kim, Dongkyu Kim, Jeuk Kim, Sangwook Park, Ighoon Lee, Sang-goo Lee, Jong-hoon Chun, “An Evaluation of Dynamic Electronic Catalog Models in Relational Database Systems,” Managing E-Commerce and Mobile Computing Technologies, IRM press, 73-90, 2003 [11] Herve Gallaire, Jack Minker, Logic and Data Bases, Perseus Publishing, 1978

Proceedings of the IEEE International Conference on E-Commerce Technology 0-7695-2098-7/04 $20.00 © 2004 IEEE