XML and Databases - wseas

Data-Driven Web Sites

YURI BOREISHA, OKSANA MYRONOVYCH
Department of Information Systems
Sultan Qaboos University
PO Box 20, Postal Code 123, Al Khoud
OMAN

Abstract: Traditional Web sites are built from static, dynamic and interactive Web pages. An XML content management system supports the development of data-driven Web sites. These Web sites are built from components using client-side and/or server-side composition, which provides for the transformation, presentation and styling of XML documents. The paper discusses issues related to the development of data-driven Web sites based on the integration of database and XML technologies.

Key-Words: XML, content management, database, Web site development.

1 Introduction
Modern businesses span a variety of languages and cultures. E-transformation presents a tremendous opportunity to empower the entire business to deal directly with customers and suppliers via the Internet. The success of the Internet ultimately depends on open standards and interoperability, yet the pace of innovation is driven by proprietary developments. Standards are the right platform for long-term solutions. At the same time, new technologies should be leveraged when they demonstrate clear advantages [1, 2].

As the Internet becomes increasingly commercial, personalization has become the means to refine the end-user experience and deliver targeted information. Personalization is a key contributor to a better online customer experience. From a global perspective, however, localization needs attention and has far more impact on the user experience for the majority of users. Localization means translating content into the appropriate language and catering to local character sets and writing systems. It means taking perception and cultural considerations into account, and ensuring that the information provided reflects organizational differences, trading regulations, and pricing and legal differences. The Internet is breaking down geographical barriers, but it is still dominated by English-language content. For a global organization, localization is the biggest single opportunity to differentiate itself on the Internet [3].

The World Wide Web Consortium (W3C) developed the Extensible Markup Language (XML) to support the building of better information management infrastructures. XML has become an integral part of numerous information technologies. Three broad business applications are driving XML usage:

- enterprise application integration;
- extranet data interchange across platforms between companies;
- Web services that let applications, clients, and servers on different platforms communicate via the Internet.

The main categories of XML users are documentation professionals and researchers, Web developers, and database and object-oriented programming (OOP) developers [4]. XML’s growing adoption has raised several key concerns. First, because XML files describe a document’s content in considerable detail, they contain much more data than HTML files, which can burden a company’s network, processor, and storage infrastructure. Second, XML files not only describe a document’s contents extensively but also typically appear in plain text; together, these factors create security problems. Third, many developers may not be cooperating with the effort to transition the Web from HTML to XML via XHTML (Extensible HTML) [5]. The paper discusses issues related to the development of data-driven Web sites based on the integration of database and XML technologies [6-11].

2 Some Problems of Traditional Web Sites
Traditional Web site content management is based on SQL databases and personalization engines. For example, the Fuzzy Group recently launched Feedster (www.feedster.com), a search engine designed to monitor and index RSS feeds (RSS, Really Simple Syndication, is an XML-based format for Web content syndication). Where skill-set, ownership, and empowerment issues are concerned, interest in XML is growing. The clear separation of data maintenance from formatting makes it possible to match tasks to skills and to let different specialists participate in Web site development. Tremendous opportunities are available for creating scalable campaigns and exciting content that can be maintained quickly, easily, and cost-effectively. XML lays down standards that are tighter and clearer than anything previously seen on the Internet, yet it is, by definition, extensible. Although XML was explicitly developed to support innovation and freedom of expression, it also provides a natural way of establishing data standards across industries and markets without dictating look and feel. Translation from one form of XML to another is fast and simple. XML bridges the gap between applications and content: application data and traditional content can share a common vocabulary and a common technology base, namely XML.
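As a minimal sketch of translating one form of XML into another, the Python fragment below renames elements from a hypothetical supplier vocabulary (catalog/item/desc/cost) to a hypothetical internal one (products/product/name/price), using only the standard library's xml.etree.ElementTree. It assumes a simple one-to-one tag mapping; in practice this kind of mapping would more typically be written as an XSLT style sheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a supplier's element names to our internal vocabulary.
TAG_MAP = {"catalog": "products", "item": "product", "desc": "name", "cost": "price"}


def translate(elem: ET.Element, tag_map: dict) -> ET.Element:
    """Recursively copy an element tree, renaming any tag found in tag_map.

    Attributes, text, and tails are copied unchanged; unmapped tags keep their names.
    """
    out = ET.Element(tag_map.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(translate(child, tag_map))
    return out


source = ET.fromstring(
    "<catalog><item sku='A1'><desc>Modem</desc><cost>49</cost></item></catalog>"
)
result = translate(source, TAG_MAP)
print(ET.tostring(result, encoding="unicode"))
# <products><product sku="A1"><name>Modem</name><price>49</price></product></products>
```

The data itself never changes; only the vocabulary does, which is why such translations stay cheap.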

2.1 Page Paradigm Problem
When you browse a Web site, you perceive it as a set of pages. With dynamic HTML these pages might have application-like functionality built into them, but you still browse a sequence of screens or pages. In the conventional HTML paradigm, authors prepare pages or program ASP (Active Server Pages), JSP (JavaServer Pages) or CGI (Common Gateway Interface) scripts that map closely to pages, perhaps pulling in a set of data from a database on the fly. Databases may be used for repetitive tabular data on the site or to store data submitted via a form, but everything else is hard-coded into pages. The data owned by any one part of an organization can span many pages, and any one page may incorporate data from many groups. One of two things can happen: either the page structure ends up mirroring the internal structure of the organization rather than the information needs of users, or the organization has to build processes to handle the resulting matrix of ownership. In the second case, it is not always clear who owns a page, and pages may not be maintained properly. Information owners need to be able to maintain their data easily, without combing the site for every instance of data relating to their domain and without worrying about formatting. Site developers need to be able to set presentation standards that are applied consistently across the site without worrying about the data. Conflict, inefficiency, and duplication are common problems of the page-oriented publishing paradigm. In other words, the page paradigm has to be broken [4].

2.2 HTML Problem
HTML is great, but when it comes to running a large-scale Web site, it presents a number of problems. One aspect of the problem is that HTML is both too rigid and too flexible. HTML is too flexible in the sense that browsers only loosely enforce the SGML definition of HTML: they tolerate badly written documents and do their best to present them as intended. If the content of a page is to be treated as data with any integrity, this kind of looseness cannot be tolerated. On the other hand, HTML is too rigid in two ways. First, there is no formalized method for extending HTML for specific applications, and in a sense the language already has too many tags. The addition of custom tags by Netscape and Microsoft to achieve specific effects was widely condemned, while the abuse of existing tags to achieve tightly controlled formatting became the norm. Second, HTML makes it impossible to separate format and content in a meaningful way. It is true that with CSS (Cascading Style Sheets) you can radically alter the way page elements are presented, but a table is still a table, a list is still a list, and a page is still a page [4].
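The difference in strictness can be illustrated with Python's standard XML parser. In this sketch (the sample fragments are made up for illustration), markup that a browser would happily render is rejected as not well-formed, while custom, application-specific tags are accepted as long as they nest correctly.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Browsers render this HTML fragment even though the <li> elements are never closed.
loose_html = "<ul><li>First<li>Second</ul>"
# XML (and therefore XHTML) requires every opened element to be closed.
strict_xhtml = "<ul><li>First</li><li>Second</li></ul>"
# Application-specific tags are legal XML as long as they nest properly.
custom_tags = "<product><name>Modem</name><price currency='USD'>49</price></product>"

for label, markup in [("loose", loose_html), ("strict", strict_xhtml), ("custom", custom_tags)]:
    print(label, is_well_formed(markup))
# loose False
# strict True
# custom True
```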

2.3 SQL Problem
According to some content management suppliers, the answer to the HTML problems is to break the site into HTML and ASP/JSP templates with access to relational databases. That way you can isolate look and feel, manage the data easily, and publish far more data. Oracle and Microsoft database servers are fast, scalable and robust, and the data is held in a completely media-neutral format. If the majority of your site is highly consistent in format (for example, thousands of news articles, classified advertisements, or product specifications), and if you do not face the challenge of providing localized content to a variety of markets, SQL might be the right answer. In many cases, however, much secondary data ends up embedded in the templates, and localization and maintainability are lost. If you are determined, you can model the structure of most classes of Web content in SQL. However, the more complex the page type, the more tables, keys, and joins it requires, and the more performance suffers. In addition, the more difficult the initial design, the more difficult it is to alter later. If you want flexibility of design and ease of reuse, SQL rapidly shows its limitations, and XML shows its strength [4, 10].
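To illustrate why complex page types multiply tables and joins, here is a sketch using Python's built-in sqlite3 module. The three-table schema for a localized product page, and all of the data, are hypothetical. For comparison, the same page is also held as a single nested XML document whose names are tagged with the standard xml:lang attribute.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical schema: a single localized product page already needs three
# tables, because names (per language) and specifications are one-to-many.
SCHEMA = """
CREATE TABLE product (id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE product_text (product_id INTEGER, lang TEXT, name TEXT);
CREATE TABLE spec (product_id INTEGER, label TEXT, value TEXT);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO product VALUES (1, 'A1')")
    conn.executemany("INSERT INTO product_text VALUES (?, ?, ?)",
                     [(1, "en", "Network card"), (1, "fr", "Carte réseau")])
    conn.executemany("INSERT INTO spec VALUES (?, ?, ?)",
                     [(1, "Speed", "100 Mbit/s"), (1, "Ports", "1")])
    return conn


def product_page_rows(conn: sqlite3.Connection, sku: str, lang: str) -> list:
    """Fetch one localized product page; each additional facet means another join."""
    return conn.execute(
        """
        SELECT p.sku, t.name, s.label, s.value
        FROM product AS p
        JOIN product_text AS t ON t.product_id = p.id AND t.lang = ?
        JOIN spec AS s ON s.product_id = p.id
        WHERE p.sku = ?
        ORDER BY s.label
        """,
        (lang, sku),
    ).fetchall()


# The same page as one nested XML document: no joins, and the localized
# names sit next to the data they describe.
PAGE_XML = """<product sku="A1">
  <name xml:lang="en">Network card</name>
  <name xml:lang="fr">Carte réseau</name>
  <spec label="Speed">100 Mbit/s</spec>
  <spec label="Ports">1</spec>
</product>"""

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def localized_name(page: ET.Element, lang: str) -> Optional[str]:
    """Return the <name> text for the requested language, or None if absent."""
    for name in page.findall("name"):
        if name.get(XML_LANG) == lang:
            return name.text
    return None


print(product_page_rows(build_db(), "A1", "fr"))
print(localized_name(ET.fromstring(PAGE_XML), "fr"))
```

Adding another facet to the relational page (images, prices per market) would mean another table and another join, whereas the XML document simply gains another child element.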

3 Data-Driven Web Sites
Modern Web applications are typically three-tier distributed applications, consisting of a user interface, business logic and database access. The user interface (the client, responsible for data manipulation and display) in such an application is often created using XHTML, Dynamic HTML, and XML. The user interface can contain ActiveX controls, client-side scripts, and Java applets. XHTML is the preferred mechanism for representing the user interface in systems where portability is a concern: because most browsers support XHTML, a user interface designed to be accessed through a Web browser is largely portable across browser platforms. The user interface can communicate directly with the middle-tier business logic using the networking provided automatically by the browser. The middle tier (the middle-tier server, responsible for business logic, data integration and conversion to XML) can access the database to manipulate the data. The middle tier is where the intelligence of the application resides. Based on the business logic, the server communicates with back-end resources and acquires specific data. In multi-tier architectures, Web servers and application servers are increasingly used to build the middle tier. Back-end resources enhance applications by providing data that can be used to generate Web pages dynamically. The data may be acquired in two formats, XML and relational, and possibly from two types of repositories: XML data could come from either an object or a relational database, while relational data would typically be in the proprietary format of the source relational database. Having looked at the problems of HTML and SQL in the context of content management, let us look at how a Web site can be developed using XML and databases.
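A minimal sketch of the middle tier's conversion-to-XML step, assuming Python with sqlite3 standing in for the back-end relational database. The helper name rows_to_xml, the table, and the sample customers are hypothetical; column names are used directly as element names, which only works when they are valid XML names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn: sqlite3.Connection, query: str, root_tag: str, row_tag: str) -> ET.Element:
    """Run a query and wrap each result row in an XML element.

    Each column becomes a child element named after the column (assumes the
    column names are valid XML names); NULLs become empty elements.
    """
    cursor = conn.execute(query)
    columns = [d[0] for d in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = "" if value is None else str(value)
    return root


# Data tier: an in-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Al Noor Trading", "Muscat"), (2, "Gulf Supplies", "Sohar")])

# Middle tier: hand the presentation tier XML rather than database rows.
doc = rows_to_xml(conn, "SELECT id, name, city FROM customer ORDER BY id", "customers", "customer")
print(ET.tostring(doc, encoding="unicode"))
```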

3.1 XML and Databases
Extracting data and exchanging it with business partners is becoming big business for database professionals. Database developers have discovered that XML provides a way to extract data from a database table or group of tables into a structured text file, which can then be exchanged with other organizations that may use completely different types of database systems. Work in this field spans many levels. It includes Web sites that use XML to extract data from databases and display it using server-side processing scripts such as JSP, ASP, or Perl, to name just a few. It also includes protocol and messaging services that facilitate the exchange of information with business or research partners.
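On the receiving side of such an exchange, a partner can load the XML into whatever schema it happens to use. The sketch below assumes a hypothetical order message and a local table whose column names differ from the sender's element names; it uses only Python's standard library (sqlite3 and xml.etree.ElementTree).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML message received from a business partner.
MESSAGE = """<orders>
  <order number="1001"><item>Modem</item><qty>3</qty></order>
  <order number="1002"><item>Router</item><qty>1</qty></order>
</orders>"""


def load_orders(conn: sqlite3.Connection, xml_text: str) -> int:
    """Parse the partner's XML and insert it into our own schema.

    Our column names (ref, product, quantity) differ from the partner's
    element names; only the XML structure is shared between the two sides.
    """
    root = ET.fromstring(xml_text)
    rows = [
        (int(o.get("number")), o.findtext("item"), int(o.findtext("qty")))
        for o in root.iter("order")
    ]
    conn.executemany("INSERT INTO purchase (ref, product, quantity) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (ref INTEGER PRIMARY KEY, product TEXT, quantity INTEGER)")
print(load_orders(conn, MESSAGE))  # 2
```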

Databases have become a crucial part of distributed applications, that is, programs that divide work across multiple computer systems. There are many different relational database implementations (e.g., MySQL, Microsoft Access, Oracle). A software component called a driver lets programs access a database. Each database implementation requires its own driver, and each driver can have a different syntax. To simplify the use of multiple databases, an interface (e.g., ODBC, JDBC, OLE DB) provides uniform access to different databases. Relational database vendors have been among the most active in driving XML support into their systems. XML output from SQL queries offers a way to develop the application layer of an enterprise: SQL queries can return XML result sets, which are sent to the Web server and transformed into HTML or a wireless markup language using XSLT (Extensible Stylesheet Language Transformations).
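The final step, turning an XML result set into HTML, would normally be done with an XSLT style sheet. Because XSLT is not part of Python's standard library, this sketch swaps in a hand-written ElementTree transformation that does the same job for one assumed result-set shape (a rowset element containing row elements with one child per column); that shape is an illustration, not the output format of any particular database.

```python
import xml.etree.ElementTree as ET


def result_set_to_html(result: ET.Element) -> str:
    """Render an assumed <rowset><row><col>value</col>...</row></rowset> result
    set as an XHTML table, taking column headers from the first row's tags."""
    rows = result.findall("row")
    table = ET.Element("table")
    if rows:
        header = ET.SubElement(table, "tr")
        for cell in rows[0]:
            ET.SubElement(header, "th").text = cell.tag
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            ET.SubElement(tr, "td").text = cell.text
    return ET.tostring(table, encoding="unicode")


xml_result = ET.fromstring(
    "<rowset><row><sku>A1</sku><price>49</price></row>"
    "<row><sku>B2</sku><price>120</price></row></rowset>"
)
print(result_set_to_html(xml_result))
# <table><tr><th>sku</th><th>price</th></tr><tr><td>A1</td><td>49</td></tr><tr><td>B2</td><td>120</td></tr></table>
```

Keeping the transformation separate from the query means a second function (or style sheet) can target WML from the same result set without touching the database layer.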

3.2 XML Data and XML Documents
There has been a tremendous amount of debate about how XML documents will be formatted and browsed, and an equal amount of discussion about XML as data. These discussions have created the impression that there are two classes of XML applications: document applications, in which XML represents a page that just needs formatting, and data applications, in which XML describes a message exchanged behind the scenes [4]. The reality is that all XML documents are data, and all XML data messages are documents or document fragments. If you want to query documents for the information they contain, especially when developing interactive dynamic HTML applications, the strict markup of XML makes such queries both possible and meaningful in a way that HTML could not support. Application messages and data documents should therefore be treated identically: they can be merged into the same document, one can be transformed into the other, and so on.
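The point that documents and data messages are interchangeable can be sketched with ElementTree's limited XPath support. Both the article and the price-update message below are hypothetical; the code queries the document for every product it mentions and then merges the message's data into the matching element.

```python
import xml.etree.ElementTree as ET

# A "document" (an article) and a "data message" (a price update): both are XML.
article = ET.fromstring(
    "<article><title>New products</title>"
    "<para>Our <product sku='A1'>Modem</product> ships this month.</para>"
    "<para>See also the <product sku='B2'>Router</product>.</para></article>"
)
message = ET.fromstring("<priceUpdate sku='A1' currency='USD'>49</priceUpdate>")

# Query the document: which products does the article mention?
mentioned = [p.get("sku") for p in article.iter("product")]
print(mentioned)  # ['A1', 'B2']

# Merge the message into the document: annotate the matching product element.
for product in article.iter("product"):
    if product.get("sku") == message.get("sku"):
        product.set("price", message.text)
        product.set("currency", message.get("currency"))
```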

3.3 XML Approach to Web Site Design
When XML is applied properly to Web site design, the approach differs considerably from the traditional template-driven approach. Templates basically consist of fixed (or slowly changing) content with slots to be filled from a regularly updated database or even a live data source. XML/XSL puts control in the hands of the data author. The recursive processing of a data document by a style sheet means that the data structure, not a script or a template, is the primary driver of the document-assembly process. There is no fixed content at all. This is a data-driven site.

XML is not a replacement for HTML (or XHTML). HTML is the lingua franca of Internet browsers, and it provides a rich set of formatting possibilities. Wireless Markup Language (WML) is likely to establish itself as the HTML equivalent for phones and pagers. Both WML and HTML are, or are about to become, vocabularies for specific XML applications. Neither is made obsolete by XML; both are enabled by the underlying XML standards. Turning XML documents into documents that can be passed to a browser involves passing them through a style sheet, which acts as a template controlling how the final page is constructed. Breaking the page into a number of documents requires additional functionality, because the two common style sheet technologies (XSL and CSS) offer little support for collating data from multiple documents: CSS has none, and XSL provides it only through the XSLT document() function. Page composition and reuse is the way forward: pages are built from reusable components. For example, the design team owns the navigation and page structures, and product managers own the content for each of the products. If each product manager is issued a template in which to submit content in the required format, maintenance becomes fairly trivial. A useful spin-off is the ability to rework the same content for use on several pages, which improves consistency and reduces the amount of content that requires management.
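The recursive, data-driven assembly and component-based composition described above can be sketched in Python, with a dictionary of per-element template functions playing the role of XSL template rules. Everything here is illustrative: the component documents, the template functions, and compose_page are hypothetical names, and mixed content (text between child elements) is ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Component documents, each owned by a different team (hypothetical content).
NAV = "<nav><link href='/'>Home</link><link href='/products'>Products</link></nav>"
PRODUCT = ("<product><name>Modem</name>"
           "<feature>56k</feature><feature>Fax support</feature></product>")


def render(elem: ET.Element) -> ET.Element:
    """Dispatch on the element's tag, like an XSL template rule; fall back to default."""
    return TEMPLATES.get(elem.tag, default_template)(elem)


def default_template(elem: ET.Element) -> ET.Element:
    # Unknown data elements become <div class="tag">, and their children are
    # processed recursively, so the data structure drives the output structure.
    out = ET.Element("div", {"class": elem.tag})
    out.text = elem.text
    for child in elem:
        out.append(render(child))
    return out


def nav_template(elem: ET.Element) -> ET.Element:
    ul = ET.Element("ul", {"class": "nav"})
    for child in elem:
        ET.SubElement(ul, "li").append(render(child))
    return ul


def link_template(elem: ET.Element) -> ET.Element:
    a = ET.Element("a", {"href": elem.get("href", "#")})
    a.text = elem.text
    return a


def name_template(elem: ET.Element) -> ET.Element:
    h2 = ET.Element("h2")
    h2.text = elem.text
    return h2


TEMPLATES = {"nav": nav_template, "link": link_template, "name": name_template}


def compose_page(title: str, *components: str) -> ET.Element:
    """Collate several separately maintained XML documents into one XHTML page."""
    html = ET.Element("html")
    ET.SubElement(ET.SubElement(html, "head"), "title").text = title
    body = ET.SubElement(html, "body")
    for source in components:
        body.append(render(ET.fromstring(source)))
    return html


page = compose_page("Modem", NAV, PRODUCT)
print(ET.tostring(page, encoding="unicode"))
```

Because the product component is a separate document, the same PRODUCT source could be passed to compose_page for a second page (a comparison page, say) without duplicating its content, while the design team changes nav_template without touching any product data.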

3.4 Content Management with XML
An enterprise content management strategy should define content requirements for employees, business partners, and public Web sites. At one time, the principal content buyers were information professionals. Now content acquisition is a critical business strategy that necessarily involves many departments. The team responsible must develop a rigorous methodology for evaluating content quality, vendors and technology solutions, so that the tools and resources selected actually fulfill the promise of enabling distinct user groups to access and use the content they need. Back-end preparation and structuring of the data are clearly extremely important to the customer. The client needs to be able to sort through overwhelming amounts of information to find what is relevant, authoritative, timely, and actionable, either to perform his or her work or to share with a team of individuals working in related areas. The user does not particularly care about the source of the information as long as it is trustworthy and accurate, so some answers may come from internal content repositories and others from external services.

To help with obtaining answers and then navigating the answer sets, information users expect to see increased use of linking technology, taxonomies for indexing and categorizing content, and XML tagging of records for platform-independent delivery of content to users for on-site customization or personalization. Taxonomies must be carefully developed and maintained to accurately reflect the substance of content published in various languages to a global audience. Classification schemas or taxonomies will become essential for managing structured data as well as unstructured content if users are to search successfully across content repositories. XML tagging that describes both the content and the structure of a document has quickly become the standard for enabling data exchange between business systems. In fact, XML is now widely regarded as the common-denominator format for sending and storing data and information. XML promises greater flexibility in formatting documents that are requested and viewed in a Web browser, on platforms such as mobile devices, or by other enterprise applications. Published business and news items with XML tags can be transferred to various internal repositories and customized for display in portals, corporate intranets, or electronic collaboration workspaces. This gives buyers of such content much more value for their investment, and gives IT departments a better return on investment (ROI) in portals and other content distribution destinations. A modern content management system should be an integral part of a data-driven Web site and should enable: interaction between Web HTML and mobile WML clients; use of standard XSLT transformation mechanisms; easy integration with future systems in electronic marketplaces through XML; and validation and language mapping using XML DTD (Document Type Definition) and XML Schema tools.
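Serving the same content to both HTML and WML clients can be sketched as two renderers over one XML source. This is a minimal illustration in Python under stated assumptions: the product document, the function names, and the client-type strings are hypothetical, the WML output omits the XML declaration and DOCTYPE that a real WML deck would carry, and DTD or XML Schema validation (which needs a third-party library such as lxml) is not shown.

```python
import xml.etree.ElementTree as ET

# One hypothetical content source, maintained once.
PRODUCT = ET.fromstring(
    "<product sku='A1'><name>Modem</name><summary>56k external modem</summary></product>"
)


def to_html(product: ET.Element) -> str:
    """Render the product for desktop browsers as an XHTML fragment."""
    div = ET.Element("div", {"class": "product"})
    ET.SubElement(div, "h2").text = product.findtext("name")
    ET.SubElement(div, "p").text = product.findtext("summary")
    return ET.tostring(div, encoding="unicode")


def to_wml(product: ET.Element) -> str:
    """Render the same product for WAP phones as a one-card WML deck
    (XML declaration and WML DOCTYPE omitted for brevity)."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": product.get("sku"), "title": product.findtext("name")})
    ET.SubElement(card, "p").text = product.findtext("summary")
    return ET.tostring(wml, encoding="unicode")


def render_for(client: str, product: ET.Element) -> str:
    """Pick an output vocabulary from a (hypothetical) client type string."""
    renderers = {"html": to_html, "wml": to_wml}
    return renderers[client](product)


print(render_for("html", PRODUCT))
# <div class="product"><h2>Modem</h2><p>56k external modem</p></div>
print(render_for("wml", PRODUCT))
# <wml><card id="A1" title="Modem"><p>56k external modem</p></card></wml>
```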

4 Conclusion
One of the main wins that an XML content management system provides is the separation of form from content, i.e. the ability to separate the data from its final presentation. Without this clear separation it is difficult to support a large amount of content. Because all parts of the business are constantly changing, content needs to reflect these changes and be updated in a timely fashion. To keep the Web site consistent, it is preferable to use the services of a professional design team. This team handles the look and navigation and needs to keep in close touch with the various product groups and the marketing and legal departments, each of which wants input into the content of the site.

Traditional Web sites are built from static, dynamic and interactive Web pages. An XML content management system supports the development of data-driven Web sites. These Web sites are built from components using client-side and/or server-side composition, which provides for the transformation, presentation and styling of XML documents. XML is a great way to store data in a form your organization can digest and manage. HTML is the way to get that data to your conventional browser users in a compelling and highly visual way. Using the two in tandem can empower your data owners and free your designers to focus on making your site world class, rather than spending their time maintaining other people’s content. Many more integration techniques are likely to emerge as storage options are simplified, as clients demand access to multimedia formats, as bandwidth increases, and as users begin to see a positive impact on their productivity.

References:
[1] M. Stal, Web Services: Beyond Component-Based Computing, Communications of the ACM, Vol. 45, No. 10, 2002, pp. 71-76.
[2] O. Myronovych, N. Younis, Integrated E-Learning, Advances in Communications and Software Technologies, WSEAS Press, 2002, pp. 66-70.
[3] A. Al-Badi, Y. Boreisha, Web Site Globalization: Interface Design Considerations, Proceedings of the UKAIS International Conference, University of Warwick, UK, April 9-11, 2003.
[4] C. White, L. Quin, L. Burman, Mastering XML, SYBEX, 2001.
[5] S. J. Vaughan-Nichols, XML Raises Concerns as It Gains Prominence, IEEE Computer, Vol. 36, No. 5, 2003, pp. 14-16.
[6] Y. Boreisha, Database Integration Over the Web, Proceedings of the International Conference on Internet Computing (IC’02), Vol. III, Las Vegas, USA, June 24-27, 2002, CSREA Press, pp. 1088-1093.
[7] Y. Boreisha, O. Myronovych, Database Access, Integration and Sharing Over the Web, Proceedings of the 12th International Conference on Computer Theory and Applications (ICCTA’2002), Alexandria, Egypt, August 27-29.
[8] Y. Boreisha, Internet-Based Data Warehousing, Proceedings of SPIE, Vol. 4566, “Internet-Based Enterprise Integration and Management”, Boston Marriott Newton, Massachusetts, USA, October 31-November 1, 2001, pp. 102-108.
[9] Y. Boreisha, O. Myronovych, Component-Level Programming Behind the E-Commerce Implementation Process, Proceedings of the International Conference on Communication, Computer and Power, Muscat, Sultanate of Oman, February 12-14, 2001, pp. 125-129.
[10] Y. Boreisha, O. Myronovych, E-Commerce with XML/XSL and ASP, Contemporary Business Readings (ABA), 2001 Edition, USA, pp. 230-235.
[11] Y. Boreisha, O. Myronovych, Incremental Data Integration for Enterprise Computing, Contemporary Business Readings (ABA), 2000 Edition, USA, pp. 283-290.