Universal Access Architecture for Digital Libraries - CiteSeerX

9 downloads 25691 Views 389KB Size Report
ture of a mobile software client that benefits from these mechanisms is described ... with a data server that has not been designed to support client mobility via ...
Universal Access Architecture for Digital Libraries∗ Francisco Alvarez-Cavazos, Roberto Garcia-Sanchez, David Garza-Salazar, Juan C. Lavariega, Lorena G. Gomez, Martha Sordia Informatics Research Center, ITESM, Campus Monterrey Av. Eugenio Garza Sada 2501, Monterrey, NL, 64849, Mexico {franalvac,rgsipn}@gmail.com; {dgarza,lavariega,lgomez,msordia}@itesm.mx

Abstract In this paper we present a universal access architecture for digital libraries. Our architecture supports traditional fixed clients and mobile clients addressing the connection adaptation and limited resources challenges presented by mobile devices. We describe the requirements of universally available personal digital libraries and illustrate their applicability with a user scenario. These requirements are addressed by our universal access architecture, which targets to support multiple device access, including mobile devices. The main components of the architecture are the Client-Side Applications, the Data Server and the Mobile Communication Middleware (MCM). Our work has focused on the mobile connection support provided by the interaction of mobile clients with the MCM, obtaining a constant response rate in spite of variability of network conditions. The architecture of a mobile software client that benefits from these mechanisms is described and supplemented with implementation notes showing how—in spite of the limited computing resources of mobile devices—it can interact with a data server that has not been designed to support client mobility via adaptation techniques implemented in a middleware.

1

Introduction

With the advent of wireless communications, users can stay connected to the network even when they are moving from one place to another. This paradigm is called mobile or nomadic computing. Mobile computing is changing the way people interact, work and use technologies. This has brought ∗ This work has been funded by CONACyT (grant CONACYT-2002-C01-41372), CUDI and ITESM. c 2005 ITESM, Campus Monterrey. PermisCopyright sion to copy is hereby granted provided the original copyright notice is reproduced in copies made.

on myriad opportunities for exciting work in systems that currently envision users as being static, such as [1]: (a) workflow management systems, (b) digital libraries, (c) e-commerce and (d) even the WWW itself! The motivation for our research lies with digital libraries, which provide online access to distributed text and multimedia information sources in an integrated manner. Digital library services have traditionally been provided via a Web-based search interface. However, with the widespread-use of mobile technologies, massive digital library systems with tens of millions users will appear in the near future. The scale of such systems demands changes to the traditional architecture of digital libraries. To grant digital library services to the mobile user, we propose a universal access architecture which aims to provide users with the capability of accessing the digital library “from anyplace at anytime” [2]. By its very nature, universal access encourages a user-centered design of applications. Thus, we have extended our digital library motivation to realize the abstraction of personal libraries. Personal libraries provide traditional library services such as document submission, fulltext and metadata indexing and document search and retrieval; augmented with innovative services for the moment to moment information management needs of the individual user. These innovations include provisions to customize the classification of documents, interact with other digital libraries and support user-to-user exchange of generic digital content. Hence, the universal access architecture presented herein should be suitable for other document-based applications on which device-independent access to both personal and collective data is required. To realize universally available personal library

Connection Adaptation Low bandwidth Frequent disconnections High bandwidth variability Predictable disconnections Communication asymmetry Mobility

services we must cope with the technological challenges imposed by the implementation of digital library services [3]: (a) digital document creation, (b) classification and indexing, (c) search and retrieval, (d) distribution and content delivery, (e) administration and access control, (f) non-restrictive documents, (g) quality of documents, (h) variety of formats and (i) user interface and presentation; the mobile environment [1] and the specific requirements of personal digital libraries (see Section 2). The main challenges imposed by mobile environments, presented in Table 1, include:

Fast changing locations

Limited Resources Limited battery power Limited processing power Limited memory Small screen size Scalability Security Susceptible to malicious attacks Susceptible to damaging data due to theft and accidents

Table 1: Mobile Environment Challenges in the context of our research testbed: the Personal Digital Library System (PDLib for short) [4]. We address the (a) low-bandwidth, (b) frequent disconnections and (c) high bandwidth variability connection adaptation and the (a) limited processing power, (b) limited memory and (c) scalability limited resources challenges of the mobile environment with the integration of information storage and retrieval, database and mobile client technologies to provide universally available personal digital libraries. The rest of this paper is organized as follows. In Section 2 we describe the concept of universal access. We exemplify universal access in the context of a library application: PDLib. Based on this discussion, Section 3 describes the universal access architecture in terms of PDLib’s implementation. In Section 4, we compare our universal access architecture with other work in the field. Finally, we conclude in Section 5.

• Connection adaptation. Mobile applications require a connection adaptation strategy against low bandwidth availability, disconnections, high bandwidth variability and communication asymmetry (i.e. different bandwidth in the downstream and upstream directions); • Limited resources. Mobile systems could face a significant mismatch between: (a) the supply and demand of a resource such as energy, processing power, memory, small screen size, etc. in mobile units and (b) processing power or network bandwidth in the base stations, fixed hosts and database servers in order to support a high volume of concurrent access from the mobile units; • Mobility. The location of a mobile unit is of fundamental importance to mobile data management since it should be regarded as a data item whose value changes with the unit’s every move; and

2

• Security. Wireless mobile environments are even more susceptible to malicious attacks than wired distributed systems. In addition, the size of some of the portable units (e.g. handhelds) makes them prone to data loss due to theft and accidents.

Universally Available Digital Libraries

In order to provide universal access, a system must be designed for mobile environments. This means that the system must contemplate several client application types. Software clients can be classified according to their client-side architecture into: (a) thin clients, and (b) thick clients; and agreeing with their mobility into: (a) fixed clients, and (b) mobile clients (Figure 1). In thin clients, the application is delivered on a browser or microbrowser; while on thick clients both code and data reside on the device. Thick and thin clients can coexist in a single device. Mobile clients are those with wireless connections to the system’s network; while fixed clients have a reliable—wired connection—with the network. Figure 1 plots client-side architecture on the ver-

The mobile environment challenges must be addressed by a universally available digital library system. Perhaps the most relevant of these challenges, from the digital library system point of view, are connection adaptation and limited resources, since they directly affect the communication services that a digital library system must provide. In this paper, we present our universal access architecture and its proof-of-concept implementation 2

Figure 3: PDLib Web Client

Figure 1: Mobile Environment Clients

be provided as a mechanism for document classification and users should be provided with the ability to define the metadata set that will be used to describe the contents of each collection. (b) Digital document submission. The user should be able to add any digital document to a personal digital library. Submission from several device types should be supported. The personal digital library must be able to accommodate several document formats for the same document. (c) Search and retrieval. Search and retrieval mechanisms must adapt to the personalized classification schema defined by the users of the system. (d) Administration and access control. The owner of a personal digital library always has unrestricted access to her/his personal digital library content. In addition, the owner should be provided with the capability to provide other users with access to her/his personal digital library content. (e) Interoperability. Interoperability with other personal library users and with other search and retrieval systems using well-known interoperability protocols. These requirements define personal digital libraries as composed of collections. Collections contain, in turn, other collections and/or documents. Users can interact with personal digital libraries by creating and deleting collections and submitting, moving, copying or downloading documents. In addition, users can define the metadata set that will be used to describe the contents of each collection. These interactions allow the user to customize her/his personal library as desired. PDLib has been designed to provide its users with universal access to personal library content and, in consequence, targets mobile and fixed devices adapting library services to client device capabilities. Figure 2 shows a CLDC/MIDP J2ME

tical axis and client mobility on the horizontal axis. The quadrants contain stereotypical protocols and development platforms for each client category. Mobile thick clients are especially important to achieve universal access since mobile applications should center on mobile thick clients—a key difference with Web applications, which typically focus on fixed thin clients—since thick clients make offline operation possible. A technological migration trend that goes from Web applications to mobile applications is suggested. This trend reflects recent software application development tendencies and implies that the technologies available in an earlier stage are present in subsequent stages as well.

Personal Libraries To realize personal digital libraries, the following requirements were considered: (a) Flexible collection and metadata management. Collections must

Figure 2: PDLib Mobile Client: (a) Library Navigation, (b) Document Services. 3

implementation of a mobile thick client running in a Palm Tungsten C Simulator. Figure 3 shows a fixed thin client running on a desktop computer’s Web browser. To illustrate the concept of universally available personal digital libraries, we provide a hypothetical user scenario. The following scenario exemplifies user interaction with personal libraries, emphasizing the mobile connection support mechanisms already available in PDLib.

Sarah, Aidan bought himself a PDA. Having written the joint essay and stored a copy of both the finished essay and its related presentation in their PDLibs, Sarah and Aidan decide to retrieve the presentation using PDLib in preparation for their in-class lecture. Later, at the lecture’s Q&A section, both Sarah and Aidan were able to access their PDLibs to search, retrieve and display some of the documents quoted in their work from Aidan’s PDA. After being praised for the quality of their presentation, Sarah and Aidan were able to send the essay via PDLib to their professor and classmates. Shortly after learning about Sarah and Aidan’s success, Matthew acquired a PDA and has obtained a PDLib. Having spent some time last night creating and populating several collections in his PDLib from his home desktop computer, he uses his PDA to access his PDLib through the campus wireless network. While walking across campus as he experiments with his PDLib, he discovers that he can navigate smoothly through his library despite connection variability. Matthew is pleased to find out that PDLib answers his requests at an apparently constant rate. After a closer observation, he realizes that when he retraces his navigation steps, the responses from his PDLib come even faster! Matthew also noticies that while retrieving documents from his PDLib to his PDA, document transfers are successfully completed after losing connection with the network—or even after turning off his PDA—once the wireless connection is resumed.

Shaping and Sharing Personal Libraries with Universal Access. Sarah, a visual arts undergraduate student, has been working on a two-student writing assignment for her Comparative Mythology class. Sarah and Aidan, her team partner, have chosen the comparison of the Norse and Greek mythological systems as their assignment topic. Sarah is using her Personal Digital Library (PDLib) from her desktop PC to store and classify documents regarding the Norse myths. She has decided to arrange her documents under a collection named “Norse Myths” created for this purpose. Within this collection, she has created other collections to group documents under certain themes (e.g. “Theogenic Myths”, “Creation of the Universe”) or has placed the documents directly under “Norse Myths”. She has also created a “Vikings Online” collection to store useful Internet references. In order to simplify the classification of the “Vikings Online” collection and ease its population, Sarah has customized the metadata set definition of the “Vikings Online” collection so that it contains just a “Title” and “URL” fields. Finally, Sarah has set the permission rights of her “Norse Myths” collection so that Aidan has access to her documents. Aidan, currently working on campus at the Visual Arts Center and off campus as a graphic designer, is constantly moving from one location to another. He also changes his working computer frequently since he has been assigned a PC laptop at the Visual Arts Center and a desktop Macintosh at the design agency. During his precious free time at both works, Aidan is able to browse the references found by his teammate Sarah for their shared homework using his PDLib to access hers. As a complement to Sarah’s research, Aidan has to find useful information regarding Greek mythology and has been storing his findings at his PDLib under a collection called “Greek Myths”. On one occasion, he ran into his friend Matthew at the campus walkways. After briefing each other on their latest doings, they found out that Matthew—who is working on an essay for his Classical Mythology class—could use one of Aidan’s references. Since Matthew does not have a PDLib, Aidan uses the email document feature of his PDLib from his mobile phone to send Matthew a copy of the reference. A few days before the due date of his written assignment with

The convenience of universal access is portrayed in the scenario. Despite moving from one place to another or changing computing devices, Aidan was able to reach his documents wherever and whenever he needed to. This was possible since PDLib’s architecture can provide digital library services to most computing devices with an Internet connection by contemplating several client application types to address the host diversity of the mobile environment. Mobile thick client applications also adapt to the limitations of the mobile environment. Specialized connection middleware is used to support the adaptation measures undertaken by mobile thick clients. Client applications can be used to browse the contents of other digital library systems. Interoperability has been made possible by the inclusion of modules compliant with the Protocol for Metadata Harvesting of the Open Archives Initiative (OAI-PMH [5]) in our personal library system. The final part of the scenario describes the user experience provided by our personal digital library with universal access infrastructure by means of mobile thick clients. The infrastructure includes a 4

mobile connection middleware that interacts with mobile thick clients in order to provide users with: • Constant Navigation Rate. Provides the user with the maximum number of data records in a relatively constant, user-modifiable, response time. The maximum number of records is derived from an estimation of current network state. In practice, this feature favors response time out from a trade-off with data transfer throughput. • Connection Aware Document Transfers. The user can transfer documents between his mobile device and his PDLib regardless of disconnections and/or turning the device’s power off. Document transfers are connection aware in the sense that: (a) transfers are resumed when connectivity is established and (b) transfer rate is adjusted according to network state estimation. Since document transfers persist across disconnections, the user must be provided with a cancelation mechanism.

Figure 4: PDLib Overview 2. Server Tier. Shows the server system infrastructure that provide services to clients: (a) Data Server, (b) Mobile Connection Middlware (MCM) and (c) Web Front-end. 3. Interoperability Tier. Includes other (PDLib) data servers and OAI-PMH [5] compliant digital library systems.

• Navigation Speed Up via Caching. Allows the user to speedily retrace his navigation steps. The increase in back navigation speed is obtained by using navigation caching mechanisms.

3

The devices of the client tier communicate with the server tier to access PDLib digital library services. The access type of the client tier with the server tier varies according to the client device’s capabilities. The architecture of PDLib reflects the following access types:

Universal Access Architecture

• Middleware Access. Supports mobile devices, especially those with limited computing resources (e.g. HTTP-enabled mobile phones, PDA).

To provide universally available digital libraries we have developed an architecture and a prototype that provide library services to several device types (e.g. desktop, laptop, PDA and mobile phone) with multiple operating systems (e.g. Windows, Linux, Mac OS, Palm Os and Windows CE). The driving idea behind this architecture is to provide each device with the best way to interact with the services offered by the data server. In the remainder of this section, we elaborate in our universal access architecture in the context of PDLib with support of Figure 4—an overview of PDLib—and Figure 5—its detailed architecture. PDLib is composed of three layers:

• Web Access. Provides HTTP-access to any device that includes a Web browser. • Direct Access. Applications with very particular requirements can access the data server directly. In Section 3 we provide an example of such an application. The following sections describe the components of PDLib, their interaction and prototype implementations.

3.1

1. Client Tier. Includes the variety of devices with which a user can interact with PDLib;

Clients

As mentioned before, the software clients of a mobile environment can be classified according to 5

Figure 5: PDLib Architecture their client-side architecture and mobility. We elaborate on this classification with support of Figure 1, Figure 4 and Figure 5. It is possible to map the mobile client application types to the devices supported by PDLib as follows:

the abstraction of a personal digital library on a mobile device. A middleware is required to help mobile clients achieve their mobile connection adaptation goals. • Web clients (fixed and mobile thin clients). This category includes those devices with a browser or microbrowser capable of displaying HTML, as shown in Figure 3, or WML pages. Despite the fact that Web clients do not have all the features available in thick clients, they provide basic interaction support with PDLib. Web clients connect with PDLib’s Web Front-end.

• According to client-side architecture. Thin client applications connect to the Data Server through the Web Front-end while thick client applications communicate with the Data Server via the Mobile Connection Middleware. • According to mobility. Mobile applications connect to the server tier via unreliable wireless connections while fixed applications connect to the server tier via a reliable wired connections. Figure 4 shows wireless connections as dashed lines and wired connections as solid lines.

• Application clients (fixed thick clients). Application clients are intended to run in desktop and laptop devices with few computing resources constraints. Application clients are desirable because they provide a richer interface than Web clients. In addition, application clients can communicate directly with the data server, which leads to a more efficient communication of fixed clients.

With the objective of providing an access medium appropriate to the devices shown in Figure 4, the following client application types were defined (see Figure 5):

The previous client types were defined because each type has very particular characteristics. Application clients are intended to run on resource unconstrained devices such as desktop or laptop. On the other hand, a mobile user can interact with PDLib using a mobile client. Mobile clients allow

• Mobile clients (mobile thick clients). Mobile clients were designed to cope with the limitations of the mobile environment. This means that mobile clients redefine the functionality provided by PDLib’s fixed clients to provide 6

device. Mobile clients read results received from the MCM and display them on screen. The architecture of user-interaction support in the mobile client prototype includes (Figure 6): • Image Set. Images play an important application role, since representative images (e.g. folder or document thumbnails) enhance user experience. However, images do add an overhead to application size both in permanent storage and memory. In order to reduce runtime memory demands, images are retrieved from permanent storage on a per screen basis, as requested by user interaction. Once recalled from permanent storage, images will remain in memory, for later use.

Figure 6: Mobile Client Architecture the interaction with the personal library offering a subset of the functionality provided by fixed thick clients. Web clients, on the other hand, were designed to endow the user with access to their library without the prerequisite of a client application installed on a computing device. One of the main challenges is in mobile clients, since they have to deal with the limitations of the mobile environment. To accomplish their task, mobile clients perform the following functions:

• Library Services. Library services are made available through user-interface items (e.g. combo boxes, lists, menus and buttons). Similarly to image retrieval from permanent storage, interface items are also generated as demanded by user interaction. The services provided by the mobile client to the user are: (a) Glance through the contents of his/her personal digital library (Figure 2 (a)); (b) Retrieve and store a local copy of selected digital library documents on the mobile device; (c) Manage her/his personal digital library by: i. Moving/copying documents or collections and ii. Setting access rights to documents or collections (Figure 2 (b)); (d) Share their digital library content by: i. Sending documents via email and ii. Sending a document to another PDLib user (Figure 2 (b)); (e) Search for documents in her/his personal digital library (Figure 7 (c)); (f) View and edit document metadata (Figure 7 (d)); and (g) Browse the contents of other users’ personal digital libraries.

• Local Storage Mechanism. Store documents in the mobile device for offline viewing. • Connection Adaptation Mechanism. This mechanism provides a constant response time despite the connection variability of wireless connections, by interacting with the MCM. A constant response time can be achieved by calculating the size of the data-transfer window and predicting network state [6]. Since most connection adaptation work is performed in the MCM, more detail is provided in Section 3.3.

• Library Navigation. The mobile client eases the interaction between users and the digital library with a navigation interface similar to the interface provided by folder-based file systems to access library collections and documents (Figure 2 (a)).

• User Interaction Support. Graphical interface that allows the user to manage the personal digital library content from its mobile device. Mobile clients read results received from the MCM and display them on screen. Mobile clients provide a simple and friendly interface in order to let users to get an easy interaction through their navigation on this application. This graphical interface allows a user to manage the personal digital library content from its mobile

3.2

Web Front-end

The Web Front-end transforms personal digital library services into a web application. In order 7

• Connection Support. Required by mobile clients in order to perform adaptation to the bandwidth variability of the mobile environment and to cope with the frequent disconnections of mobile devices. • Process Delegation. Executes functions that would demand an excessive (and quite possible, unavailable) amount of computing resources to the mobile device.

Figure 7: PDLib Mobile Client: (d) Document Metadata.

• Mobility Support. Performs operations in order to speed up document retrieval from the data server to store them in cache servers that are closer to the user. When the user changes location, it would be necessary to support migration of the information between different instances of the MCM.

(c) Search,

to support fixed and mobile thin clients, the Web Front-end is capable of delivering WML or HTML according to the requesting device (i.e. browser or microbrowser). The Web Front-end provides the following functionality:

• Device Interaction Support. Performs adaptation of content according to the characteristics of the device on which it is desired to display the information.

• Session Handling. Session handling is used to maintain the interaction of a thin client with the Web Front-end beyond a single HTTP request.

Current work has focused on mobile connection support and has produced the architecture and prototype implementation of the following mechanisms (Figure 8):

• Browser Markup Language Support. Verifies the markup languages supported by the requesting browser and serves HTML or WML accordingly.

3.3

• Session Handling. Maintains knowledge of connection state in order to simulate a permanent communication and also keeps records of the mobile client’s device characteristics. Conceptually, this mechanism is analogous to the session handling mechanism of the Web Front-end, yet their implementation vastly differs. In the Web Front-end, session handling is realized with standard HTTP sessions while the MCM session mechanism is custom-made and both application and device-oriented (see below);

Mobile Connection Middleware

One of the main problems to solve in order to provide the data server services to mobile clients is the fact the data server was designed to support fixed devices (e.g. desktop computers). There is a remarkable difference in computing resources between mobile (e.g. mobile phones, PDAs) and fixed devices (e.g. desktop computers). This computing resource disparity makes difficult to adapt the data server to the capabilities of mobile devices and signals the need of a middleware component to mediate the interaction of mobile devices with the data server. Our design of such a middleware is called the Mobile Connection Middleware (MCM for short). The MCM is designed to provide following functionality:

• Pagination Mechanism. Provides the mobile client with a subset (page) of the original data server response for: (a) its display on the mobile device screen or (b) transfers of document segments; • Caching Mechanism. Stores query responses received from the data server in order to accelerate the responses it serves to the mobile client; and 8

back to the client. The session identifier is then stored in the session handling mechanism. Afterwards, the mobile client will use the session identifiers to interact with the MCM in the context of a single session. Session identifiers have a Timeto-Live (TTL) and can be renewed (effectively, by granting a new identifier) if required. Mobile devices have limited screen, processing, memory and communication resources. A segmentation of personal library objects may be desired to improve user experience. For example, a mobile user wants to visualize all the documents located in a personal library collection. The resulting request could generate a large amount of records that may, for example, fit in the mobile device screen. The limited computing resources of mobile devices may lead to the following cases:

Figure 8: MCM Architecture

• An answer that is too large to be handled due to limited communication capacity of the mobile device (e.g. overload of the incoming buffer).

• Connection Adaptation. Adjusts the rate of data exchange (e.g. data records, document segments) between the MCM and the mobile client according to connection state.

• The device can handle the reply and is able to show it, but its size can lead to an unpleasant user experience (e.g. when presenting 100 or more data records in a PDA screen).

Connection support is accomplished by the joint work of these mechanisms with mobile thick clients. The MCM prototype provides the following connection support services to mobile clients: (a) Session handling over the stateless XML-RPC protocol. Since our session mechanism is deviceoriented, currently we do not support changing devices in the middle of a session; (b) A constant collection navigation response time despite the connection variability of wireless connections; (c) Partition document transfers and disconnection support while transferring documents. In addition to providing connection adaptation, the MCM also stores query responses received from the data server in order to accelerate the responses it serves to the Mobile Client. In the remainder of this section, we elaborate in the implementation of the mobile connection support mechanisms of the MCM, with support of Figure 8. The MCM’s session handling mechanism preserves connection state and simulates a persistent connection over the stateless XML-RPC protocol. Session support begins with the mobile client making a service request to the MCM. The MCM, after verifying that there is no active session for the usermobile client combination requesting the service, generates a unique session identifier and sends it

• Mobile users are faced with variable connection quality. An uncertain wireless network state introduces varying response times for the same request. These technical challenges can be solved with a pagination mechanism. Where a page is a subset of a library object (e.g. a collection or document format) to be presented in the mobile client. The MCM pagination mechanism works as follows: First the MCM receives a mobile client request, processes it, and request the library object implied to the data server, if needed (i.e. if a cache miss occurs). The data server replies the petition to the MCM, which stores the library object in its cache, and divides the object into pages of a size dependent on network state which are then pulled by the client. In order to pull the pages of the library object, the client stores the indexes of the received pages so it can request the remaining pages. Thus, enabling the mobile client to: (a) reconstruct library objects after retrieving all their pages (e.g. document formats) and (b) present the individual pages of library objects to the user (e.g. collection contents). 9

• α, called gain, is the importance of the current network state, 0 < α ≤ 1.

A response time delay is introduced when the MCM forwards mobile client requests to the data server. Consequently, a caching mechanism was developed to reduce this delay. The MCM’s caching mechanism stores library objects received from the data server. Library objects are assigned a particular TTL within the cache. When a mobile client requests a digital object through a personal library service, the MCM verifies if a session has been previously established. If a session exists, the MCM’s cache is checked for a copy of the digital object. If the cache contains the object (cache hit), the object format is adapted for its transferral to the mobile client and handed over to the pagination mechanism, which interacts with the mobile client to transfer records or segments (i.e. pages) of the object adjusting the rate of data exchange according to the wireless network state. If the MCM’s cache does not contain the object (cache miss), the MCM forwards the request to the data server, caches the response, converts the digital object to the format expected by the mobile client and sends the requested object to the client at transfer rate dynamically adjusted after wireless network state. Although the MCM pagination and caching mechanisms improve the interaction between mobile clients and the data server, wireless network variability must be considered to enhance user experience. For this purpose, a connection adaptation mechanism to dynamically adjust the size of page requests according to connection bandwidth was implemented on the client side. The connection adaptation mechanism is largely based on the caching and pagination mechanisms. The purpose of connection adaptation is to vary the size of requested page according to network state to obtain an Expected Response Time (ERT). The result of this mechanism is a constant navigation rate and connection aware document retrievals. Connection adaptation is achieved by weighting current network state (i.e. the response time of the last page request) against a moving average of previous requests. To perform connection adaptation, we use the exponentially-weighted moving average (EWMA) [7] formula: τi+1 = αti + (1 − α) τi

• ti is the current page size observation. • τi is the prior page size estimate. In order to estimate mobile network state, it is necessary to define a unit of measurement for page size. The current prototype considers two page size units according to the library objects under pagination: (a) collection page size is measured in content records (i.e. references to child collections and/or documents); and (b) document format pages are measured in bytes. In order to maintain a constant response time, page size is estimated using Equation 2: Ri+1 =

ERT ∗ Ri ti

(2)

Where: • Ri+1 is the page size to be sent in request i+1. • ERT is the Expected Response Time. • Ri is the page size of request i. • ti is the time it took to receive a page of size Ri . The result of Equation 2 is used as the ti of Equation 1 in order to calculate the page size of the next page request of collection objects. In document object retrievals, Ri represents the amount of bytes that were transmitted in time ti , and Ri+1 is the number of characters that will be requested in the next petition. The knowledge of received page indexes allows the mobile client to resume the retrieval of document objects despite mobile client disconnections. To accomplish dynamic adjustment of document page size, the MCM issues a ping request to the data server and measures return time to estimate connection status. Once this sampling is done, the MCM requests the document to the data server, stores the document into its cache and lastly, the MCM sends the document to the mobile client through the pagination mechanism. Document retrieval is performed in a low priority thread. The performance of connection adaptation in our system was tested in the context of two experimental environments:

(1)

Where: • τi+1 is the newly estimated page size. 10

Figure 9: MCM Connection Adaptation 1. Controlled environment. The amount of mobile devices in the wireless network is limited and well known. The experiments in this environment were made in an office with five devices sharing the same communication channel simultaneously.

trolled environment (dashed lines), the page size is larger than in the uncontrolled environment (solid lines). Thus, reflecting that network capacity is being exploited by the connection adaptation mechanism, achieving a constant response time in spite of network conditions. Figure 9 shows that, in both the controlled and the uncontrolled environment, a similar constant response time is obtained (this particular experiment reported a 0.02 cumulative difference between the two environment response times). Therefore proving that a constant response time is effectively preserved by the connection adaptation mechanism in both environments. In addition, observed response times present a slight variation from the reference ERT.

2. Uncontrolled environment. The amount of mobile devices in the wireless network is uncontrolled. The experiments were made in a wireless campus network where most users are students, faculty, and staff using Wi-Fi connectivity from wireless laptops, handheld computers, and other wireless devices. Experimentation results are summarized in Figure 9, which shows that page size varies according to network state (topmost series) while preserving a constant response rate (lowermost series). Figure 9 is the result of an experiment which consisted of ten requests. Each request obtained the next page of the digital object. Experiments began with a 5 record page size and fixed values of ERT = 2 seconds and α = 0.5. The fixed value of the ERT allows the mechanism to estimate the page size which is required to preserve the ERT, using Equation 2. The value of α was chosen after experiments performed in [6] which show that a fixed value of α = 0.5 minimizes the time difference between request response time and the ERT. Figure 9 presents the behavior of the mechanism in both experimental environments. In the con-

3.4

Data Server

The data server provides the services of a personal digital library, stores personal digital library data and supports interoperability via the OAI-PMH [5]. The data server provides the following functionality: • Personal Library Services. The data server offers creation, retrieval, update and deletion operations over the library objects stored in the personal data storage (collections, documents and metadata). The data server also provides services to copy and move documents or collections, search for documents in 11

a personal library, send documents to the personal library of other users, email documents to any user with an email account, perform document format conversion, change collection metadata sets, edit document metadata and grant or revoke permission rights over personal library content.

platform independent technologies allowed us to concentrate development efforts in a single language prototype implementation. For this reason, we decided to use Java as the programming language and platform. There are Java virtual machines available for most desktop (e.g. Linux, Windows, Macintosh) and (e.g. Palm, Windows CE) mobile platforms. In addition, the availability of development APIs written in Java helped the implementation of the PDLib prototype. Consequently, PDLib mobile clients were developed using J2ME while the data server, MCM and application clients were developed using J2SE. The mobile client was developed using the CLDC/MIDP stack of the J2ME platform. In particular, we use IBM’s J9 virtual machine [8]. The J9 virtual machine is available for a great diversity of mobile platforms (Palm OS, Windows CE, Symbian, Linux) with which we hope to cover a majority of the devices that are on the market. The data server, the MCM and our application clients have been developed using J2SE and therefore can run in any operating system that has an implementation of the Java Virtual Machine. As of the writing of this paper, the following prototype implementations of PDLib components have been developed:

• Personal Data Storage. It provides structured storage with scalable information retrieval capabilities for personal digital libraries. An information retrieval (IR) tool (a.k.a. search engine) is used to index personal digital library content. A database is used to store the objects of the personal digital libraries since we require a structured model to represent personal library objects and the relations among them. The IR tool indices allow fast random access to words within the indexed content and are used to find information stored in the database. The usage of a text search engine and a database in the data server permit the combined use of text-based queries and SQL datatype-based queries to support the hierarchical classification of personal library content. The hierarchical classification can greatly improve query performance since it allows queries to be specified against documents contained in individual collections with or without the inclusion of their nested collections to the scope of the query.

• A mobile thick client application: PDLib Mobile Client. The Mobile Client application allows the user to: 1. Glance through the contents of his/her personal digital library (Figure 2 (a)); 2. Retrieve and store a local copy of selected digital library documents on the mobile device; 3. Manage her/his personal digital library by: i. Moving/copying documents or collections and ii. Setting access rights to documents or collections (Figure 2 (b)); 4. Share their digital library content by: i. Sending documents via email and ii. Sending a document to another PDLib user (Figure 2 (b)); 5. Search for documents in her/his personal digital library (Figure 2 (c)); 6. View and edit document metadata (Figure 2 (d)); and 7. Browse the contents of other users’ personal digital libraries. The prototype implementation of PDLib’s Mobile Client has been developed using CLDC/MIDP J2ME (Figure 2) The mobile client communicates with the MCM via the XML-RPC protocol libraries of the Java/XML Enhydra Application Server.

• Communication Backbone. Establishes a communication backbone with other data servers. • OAI-PMH Support. The data server exposes the metadata of the personal digital library documents via the OAI-PMH [5]. In addition, the data server harvests metadata of other OAI-PMH compliant library systems to provide users with a subset of the personal digital library services to interact with other (OAIPMH compliant) library systems.

3.5

Implementation

One of the most significant problems discussed while deciding which technologies were to be used in the implementation of PDLib, was deciding on using platform independent technologies against platform specific technologies. The selection of 12

• Web Front-end. (a.k.a. PDLib Web Client) The Web Front-end was developed using Java Servlets, Cascading Style Sheets (CSS), HTML and WML. The Web Front-end verifies the display capabilities of the requesting browser and answers accordingly (e.g. serving a WML page instead of HTML if the requesting browser is incapable of displaying HTML) (Figure 3). The Web Front-end runs on the Tomcat servlet container.

all personal digital library data [9]. We are currently working to incorporate the Lucene full-text search engine [10] to improve performance and reduce index storage requirements. • A fixed client application: The PDLib BeUp! Bin. The BeUp! Bin is a prototype client application that uploads files to a special collection where all unclassified documents are stored. The current implementation of this application supports the addition of files via “drag-and-drop” operations, the queueing of upload requests and direct file selection. The BeUp! Bin is an example of an application client with direct access to the Data Server (as explained earlier in this section). The application communicates directly with the PDLib Data Server using the Apache XMLRPC client library.

• MCM (Mobile Connection Middleware). The MCM interacts with the Mobile Client to provide the following connection support services: (a) Session handling over the stateless XML-RPC protocol; (b) A constant collection navigation response time despite the connection variability of wireless connections; (c) Back navigation speed up thanks to the caching of query responses received from the data server; and (d) Reliable document transfers over the mobile network through: 1. Connection aware document partitioning, 2. independent transfers of document segments and 3. session support for document transfers. In the current implementation, the MCM services are accessed via XML-RPC. The Apache XML-RPC implementation is used to provide the services.

4

Related Work

This section compares our universal access architecture for digital libraries with other architectural approaches used by search and retrieval systems. In addition, a comparison of our mobile connection support mechanisms with related work in the field is provided. Our universal access architecture for digital libraries can be roughly described in terms of its distribution architecture as a client/server architecture with two novel features: 1. Support for multiple device access in the client side; and 2. Inter-server data communication. According to their distribution architecture, present distribution approaches of search and retrieval systems could be classified as: A. Client-side Systems. Provide personalization services to the user’s interface (i.e. the client side) with the system. Personalization services in the client-side allow users to tailor their interaction with distributed search and retrieval systems and/or the WWW Some examples of client-side systems are found in [11, 12, 13, 14, 15, 16]. The implementation of client-side personalization services is not restricted to the user’s local system. B. Peerto-peer Systems. Search and retrieval services are provided by the client/server nodes of a distributed peer-to-peer network. All nodes run an identical software implementation. A Web-based peer-topeer digital library design and prototype implementation for desktop computers is provided in [17].

• Data Server. PDLib’s Data Server provides personal digital library content storage and retrieval services. The prototype implementation of the Data Server supports the creation, retrieval, update and deletion operations over the library objects stored in the personal data storage (collections, documents and metadata). The data server also provides services to copy and move documents or collections, search for documents in a personal library, send documents to the personal library of other users, email documents to any user with an email account, change collection metadata sets and edit document metadata. The data server implementation exposes the metadata of the personal digital library documents via the OAI-PMH [5] and harvests metadata of other OAI-PMH compliant library systems in order to allow PDLib clients to browse their contents. The prototype implementation of the data server has been created using J2SE. MySQL is used to store 13

A peer-to-peer digital library initiative that seeks to provide an alternative to centralized search and retrieval systems can be found in [18]. C. Middleware Systems. Under this approach, search, retrieval and storage services are provided to the user by a mediation environment. The actual data of the individual user comes from heterogeneous data sources registered at the mediation environment. The mediation environment maps the content of heterogeneous data sources to a user-defined schema [19]. D. Server-side Libraries. Search, retrieval and storage are provided by the server infrastructure of the system. The server-side approach is specially suited to support access from multiple device types. PDLib [4, 20], the testbed of this research, provides server-side personal digital libraries.

library via a Web browser. However, the thin client architecture of Web browsers is incapable of performing mobile connection adaptation as required by the mobile user.

5

Conclusions and Future Directions

We have presented a universal access architecture for digital libraries. We have implemented our approach to universal access in our testbed system: PDLib. The main components of the architecture are the Client-Side Applications, the Data Server and the Mobile Communication Middleware (MCM). The Client-Side Applications target mobile and fixed devices and depending on the device digital library services are adapted. The Data Server supports the services for data storage, indexing and retrieval of information from users. The MCM is an intermediary between mobile devices and the Data Server and provides functionality for mobile connection support, such as: (a) Session support over the stateless XML-RPC protocol, (b) pagination of query responses and document transfers, (c) query result caching and (d) automatic connection adaptation. Regarding automatic connection adaptation, we have obtained an effective constant response rate regardless of network conditions. A mobile software client that benefits from these mechanisms was described and supplemented with implementation notes showing how—despite the limited computing resources of mobile devices—it can interact with a data server that has not been designed to support client mobility via adaptation techniques implemented in a middleware. A proof-of-concept prototype implementation that exemplifies these concepts has been developed. Furthermore, we consider that our universal access architecture could be considered as a reference for other universal access applications where (a) distributed and efficient access to data, (b) device independence, (c) distributed search capabilities and (d) interoperability with other systems are required. PDLib is an ongoing research and development effort where interesting challenges related to digital libraries and mobile computing are presented and is being extended and improved towards an open-source release. We are currently working on the following research topics: (a) Scalability.

Next, we compare the mobile connection support provided by the Mobile Connection Middleware (MCM) of our universal access architecture with other work in mobility support: A. Transaction Support. Works such as CODA [21], Bayou [22], DAgora [23] and Rover [24] fall within this category. Both the MCM’s connection support mechanisms and the mentioned work all strive to preserve transaction/operation state while providing temporary storage for offline operation, the MCM’s connection support approach differs from previous transaction support approaches in the fact that it was designed for devices with limited resources and low memory availability (e.g. mobile phones, PDAs), whereas the mentioned work was intended for devices without stringent computing resources limitations (e.g. desktop computers). B. Adaptation Middleware. Works such as Odyssey [25] and iMASH [26] fall within this category. Odyssey presents a middleware for dynamic adaptation of communication between mobile devices and data servers. However, Odyssey’s communication adaptation is based in the fidelity of a digital file to perform connection adaptation while MCM’s connection adaptation mechanism follows a connection aware pagination strategy. iMASH was designed as a transaction manager for mobile devices. Conversely, our approach is to store incomplete transactions in the mobile client, which is responsible for resuming them once connectivity is regained. C. Data Access from Mobile Devices. Prior work in this area includes PoP’s [27]. PoP’s presents a model with which a mobile device is capable of accessing the information of a digital 14

A personal digital library system must cope with an increasing number of mobile users that will demand a large amount of distributed storage resources to hold an even greater quantity of personal data objects (e.g. documents); (b) Mobile performance and availability. We are conducting efforts to support digital library services during offline operation in mobile clients. In addition, caching and prefetching techniques that can improve performance of mobile devices accessing digital library objects are also topics of interest for our work; (c) Content adaptation. Content summarization and visualization methods of digital library information on limited screen size devices are being explored could provide an added value to mobile clients of our personal digital library system; and (d) Interoperability. In addition to the current functionality there is the opportunity for exploring further interoperability with the OAI-PMH [5] and other search and retrieval systems like web search engines.

mobile computing. David A. Garza-Salazar received his B.Sc. (Hons) in Computer Science from ITESM, Campus Monterrey. He received his Ph.D. degree in Computer Science from Colorado State University. Currently, he is a Full Professor and Director of Research and Graduate Program in Information and Electronic Technologies at ITESM, Campus Monterrey. His research interests include parallel computing, compilers, distributed systems, operating systems, digital libraries and mobile computing. He is a member of the ACM, the IEEE Computer Society and the Phi Kappa Phi Honor Society. Juan C. Lavariega received his Ph.D. degree in Computer Science from Arizona State University. Currently, he is an Associate Professor at the Informatics Research Center at ITESM, Campus Monterrey. His research interests include specification methods in software engineering, databases systems, object-oriented databases, web services and mobile computing. He is member of the Mexican National Researchers System, the ACM, the IEEE Computer Society and the Upsilon Pi Epsilon Honor Society.

Acknowledgements We would like to thank Adan Salinas, Luis Hurtado, Miguel Escoffie, Eugenio Flores, Miguel Arellano, Marcos Guevara, Manuel Deschamps, Eduardo Álvarez and Andrés Ornelas for their support in the elaboration of this paper. To all of them our most sincere gratitude.

Lorena G. Gómez received her Ph.D. degree in Computer Science from Arizona State University. She is an Assistant Professor at the Informatics Research Center at ITESM, Campus Monterrey. Her general areas of research are databases and software engineering. She is currently teaching courses in data warehousing, databases and objectoriented analysis and design.

About the Authors Francisco Álvarez Cavazos is currently a Ph.D. candidate studying at ITESM, Campus Monterrey. He holds a B.Eng. (Hons) in Systems Administration from FIME-UANL and an M.Sc. (Hons) in Information Technology from ITESM. His research interests include object-oriented software engineering, distributed databases and information retrieval. He is particularly interested in scalable distributed storage and efficient retrieval of personal library content.

Martha Sordia Salinas is an Assistant Research Professor and Industrial Consultant at the Informatics Research Center at ITESM, Campus Monterrey. Her research interests include information retrieval systems, digital libraries, data bases, software engineering and programming languages.

References

Roberto García Sánchez is currently an Analyst Sr. for GE Transportation as part of the Aircraft Engines RTS support team. He holds a B.Sc. in Computing from ESIME-IPN and an M.Sc. in Information Technology from ITESM. His main research interest is in distributed systems and

[1] S. K. Madria, M. Mohania, S. S. Bhowmick, and B. Bhargava, “Mobile data and transaction management,” Information Sciences—Informatics and Computer Science: An International Journal, vol. 141, no. 3–4, pp. 279–309, 2002.

15

[2] E. Pitoura and G. Samaras, Data Management for Mobile Computing, vol. 10. Kluwer Academic Publishers, 1998.

[15] L. N. Cassel and U. Wolz, “Client side personalization.,” in DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries, 2001.

[3] D. A. Garza-Salazar, J. C. Lavariega Jarquín, and M. Sordia-Salinas, “Information retrieval and administration of distributed documents in internet. the phronesis digital library project,” in Knowledge Based Information Retrieval and Filtering from the Web (W. Abramowicz, ed.), ch. 3, pp. 53–73, Boston, MA: Kluwer Academic Publishers, November 2003.

[16] C.-C. Hsu, F.-C. Tseng, J.-M. Chen, and C.-H. Chang, “Constructing personal digital library by multi-search and customized category,” in Proceedings of the Tenth IEEE International Conference on Tools with Artificial Intelligence, pp. 148–155, November 1998. [17] S.-U. Guan and X. Zhang, “Design and implementation of a web-based personal digital library,” Journal of the Institution of Engineers Singapore, vol. 44, no. 3, pp. 59–77, 2004.

[4] F. Alvarez-Cavazos, D. A. Garza-Salazar, J. C. LavariegaJarquin, L. G. Gomez, and M. Sordia, “Pdlib: Personal digital libraries with universal access,” in JCDL ’05: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, (Denver, CO, USA), pp. 365–365, ACM Press, New York, NY, USA, July 07–11, 2005.

[18] Old Dominion University Digital Library Group, “Freelib: Peer-to-peer digital library.” Website, March 2005. p2pdl.cs.odu.edu/.

[5] OAI, “The open archives initiative.” Website, October 2004. URL www.openarchives.org/, accessed October, 2004.

[19] D. O. Briukhov, L. A. Kalinichenko, and N. A. Skvortsov, “Personalization through specification refinement and composition,” in DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries, 2001.

[6] R. Garcia-Sanchez, “Connection adaptation techniques for mobile clients that access digital library services,” Master’s thesis, ITESM, Campus Monterrey, December 2004. Available in Spanish.

[20] PDLib, “The personal digital library project.” Website, August 2005. URL copernico.mty.itesm.mx/pdlib/, accessed August, 2005.

[7] M. Kim and B. Noble, “Mobile network estimation,” in Proceedings of the Seventh Annual Conference on Mobile Computing and Networking, pp. 298–309, July 2001.

[21] M. Satyanarayanan, “The evolution of coda,” ACM Transactions on Computer Systems, vol. 20, pp. 85–124, May 2002.

[8] K. Gilhooly, Using WebSpere Studio Device Developer to Build Embedded Java Applications. ibm.com/reedbooks, first ed., April 2004.

[22] A. Demers, K. Petersen, M. Spreitzer, D. Terry, M. Theimer, and D. Welch, “The Bayou Architecture: Support for Datra Sharing Among Mobile Users.,” In Proceedings of IEEE Workshop on Mobile Computing Systems and Applications, pp. 2–7, December 1994.

[9] MySQL, “The world’s most popular open source database.” Website, 2004. URL www.mysql.com/, accessed October, 2004. [10] Lucene, “Jakarta lucene is an open source project available for free download from apache jakarta.” Website, 2004. URL jakarta.apache.org/lucene/, accessed October, 2004.

[23] N. Perguiça, L. Martins, J. Domingos, and J. Simão, “Flexible data storage for mobile computing,” ACM SAC ’99, 1999.

[11] M. Theobald and C.-P. Klas, “BINGO! and Daffodil: Personalized exploration of digital libraries and web sources,” in In 7th International Conference on Computer-Assisted Information Retrieval (RIAO 2004), Test, 2004.

[24] A. Joseph, J. Tauber, and M. Kaashoek, “Mobile computing with the rover toolkit,” IEEE Transactions on Computers, February 1997.

[12] W. C. Janssen and K. Popat, “Uplib: a universal personal digital library system,” in Proceedings of the 2003 ACM symposium on Document engineering, pp. 234–242, ACM Press, 2003.

[25] B. D. Noble, M. Satyanarayanan, D. Narayanan, J. E. Tilton, J. Flinn, and K. R. Walker, “Agile applicationaware adaptation for mobility,” in Proceedings of the sixteenth ACM symposium on Operating systems principles, pp. 276–287, ACM Press, 1997.

[13] D. L. Hicks and K. Tochtermann, “Towards support for personalization in distributed digital library settings.,” in DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries, 2001.

[26] R. Bagrodia, T. Phan, and R. Guy, “A scalable, distribuited middleware service architecture to support mobile internet applications,” Wireless Networks, vol. 9, pp. 311–320, 2003.

[14] M. D. Giacomo, D. Mahoney, J. Bollen, A. MonroyHernandez, and C. M. R. Meraz, “Mylibrary, a personalization service for digital library environments,” in DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries, 2001.

[27] N. Castellanos and J. Sánchez, “Pops: mobile access to digital library resources,” Proceedings. 2003 Joint Conference on Digital Libraries, 2003.

16