
Hypertext Metrics Revisited: Navigational Metrics for Static and Adaptive Link Structures

Paul De Bra and Geert-Jan Houben
Department of Computing Science, Eindhoven University of Technology
PO Box 513, NL 5600 MB Eindhoven, The Netherlands
{debra, houben}@win.tue.nl

ABSTRACT

When users navigate through hyperdocuments they can easily get lost. Many factors influence this potential problem, including the structure of the hyperdocument, the navigation aids offered by the hypertext system, and the spatial memory and abilities of the user. Several formal hypertext models have been defined in the literature [HS90, SF90, L90, BHK92] but they do not define how to determine which hypertext structures are easy to navigate through and which are likely to cause disorientation. In [BRS92, V94] some metrics for hypertext structures have been defined along with suggestions for “reasonable” values for these metrics. We revisit these metrics and show that they cannot be used to measure the “ease” of browsing through a hyperdocument. We therefore propose navigational metrics that take into account not only the link structure of a hyperdocument but also the chosen starting point for a browsing session and the backtracking which virtually all hypertext systems offer. Differences in the mental state of users may require that the hypertext (link) structure be adapted to each individual’s abilities. The navigational metrics are easily generalized to the case of adaptive hypertext. For this purpose we introduce a simple yet powerful model for adaptive link structures. The adaptive metrics are defined in such a way that static and adaptive link structures can be compared to each other.

Paul De Bra is also affiliated with the University of Antwerp, Belgium, and with the “Centrum voor Wiskunde en Informatica” (CWI) in Amsterdam. Geert-Jan Houben is also affiliated with Origin B.V. in Eindhoven.

1 Introduction

The problem of getting “lost in hyperspace” is mentioned frequently in the hypertext literature and is often used as a motivation for the definition and use of formal hypertext reference models, as in [HS90, SF90, L90, BHK92], and for hypermedia design methods, as in [GPS93, SR95, ISB95]. These models and methods allow for structural analysis, as described for instance in [ABHK93]. Such analysis reveals potential navigational problem spots, such as circular (link) paths and dead ends. But it does not provide a usability measure or metric for a hypertext structure as a whole.

In [BRS92, V94] such metrics are defined, and the similarity between the values for a number of real and simulated hyperdocuments is used to suggest that a link structure with a very different value may be difficult to navigate through. Some experiments in [V94] confirm that browsing through some documents with metric values that are very different from the typical ones (observed in [BRS92]) was difficult. The experiments also included browsing through some documents with the typical values, and that proved to be easier. This of course does not imply that all link structures with the typical metric values are navigationally easy and all others difficult, but it does illustrate that it is worthwhile for authors to try to create link structures that approximate the typical metric values, or to investigate why a hyperdocument’s structure is atypical.

In this paper we will show that some metrics of [BRS92], namely compactness and stratum, can be improved to make them more reliable measures of the usability of link structures. These metrics only consider navigation by following links (forwards), and ignore that a hypertext system may provide other means for navigation, like backtracking (using the “Back” button in the interface or browser). We will redefine these metrics to account for backtracking, and show that by doing so the meaning of the metrics becomes more intuitively clear. We illustrate this with a compact graph that has a low and atypical “plain” compactness, and a higher, typical “navigational” compactness.
A growing body of recent literature has acknowledged that it may not be possible to find (interesting) link structures which all users manage to navigate through without getting lost. Browser features like backtracking, bookmarks and history lists are insufficient to eliminate these navigation problems completely. Only by adapting the link structure to the individual user can disorientation be eliminated. The possibility for “conditional” links was already present in some earlier hypertext and hypertext-like systems such as Storyspace and Hypercard. But widespread use of adaptivity in hypertext is a fairly recent phenomenon, which will undoubtedly become even more popular as the ubiquitous World Wide Web is augmented with more and more features that make creating adaptive applications easier. (Examples of technical features that can be used are cookies, client- and server-side scripting, Java applets, ActiveX objects, and more generally Dynamic HTML.) An overview of recent adaptive hypertext systems can be found in [B96].

Such systems can potentially solve the “lost in hyperspace” problem, but if complicated link structures and poorly selected adaptivity are combined they can also result in an even greater navigational nightmare. Distinguishing good from bad link structures is a lot more difficult in adaptive than in static hypertext, e.g. because the structure cannot be represented as a static graph. It is therefore imperative to have metrics and methods for analyzing adaptive link structures in order to determine potential navigation problems automatically. In this paper we present a generalization of the revisited metrics for the case of adaptive link structures, so that adaptive hyperdocuments can be analyzed as well.

This paper is structured as follows: Section 2 reviews some metrics (and examples) of [BRS92, V94]. Section 3 introduces and illustrates revised, “navigational” metrics. In Section 4 we generalize these metrics to adaptive hypertext structures.

2 Metrics for Link Structures

In [BRS92] several aspects of a hyperdocument’s link structure are studied:

When there are few links in a hyperdocument, pages (nodes) are difficult to reach by following links. Users may become disoriented because they need to go through many steps to get to some piece of information they want. Some parts of a hyperdocument may not even be connected by links at all. When there are very many links, the path to a specific node will (on average) be short, but the abundance of links implies that the links do not help the user in selecting interesting places to go. Compactness is a metric which tries to capture how well connected (by links) a hyperdocument is.

When looking at the hierarchy, by following links from a given starting point (node), along the shortest path down to each individual node, these paths can be short in the case of a shallow hierarchy or long in the case of a deep hierarchy. The term stratum, taken from the study of organizations, is used for a metric of how deep or linear the link structure is, and thus of how much of a reading (or navigation) order is imposed on the user.

Apart from the links that are used in determining the hierarchical link structure, a hyperdocument may also contain links that represent shortcuts to navigate between (arbitrary) nodes. These are called cross-reference links. The ratio of hierarchical vs. cross-reference links can be used as a metric, with similar intentions as compactness: if there are few cross-reference links it takes many steps to navigate between two arbitrary nodes, and if there are many cross-reference links the link structure becomes a dense mess in which links offer no guidance.

Given these metrics (which are defined more formally below) we would like to know the “optimal” values for each of them, indicating that a hyperdocument’s link structure is potentially usable. It is not possible to define one or more metrics and determine optimal values or ranges in such a way that documents within the desired range are guaranteed to be easy to navigate through and documents with values outside that range are guaranteed to be unusable. The interpretation of any simple metric is subjective, but comparing values to those of some documents that are known to be usable is a good first step in deciding whether or not a link structure stands a fair chance of being navigationally easy. In [BRS92, V94] metrics were calculated for some existing hyperdocuments (and in [V94] also for simulations). Because the purpose and “real” structure of these hyperdocuments was known, the measured metric values could be interpreted. We present these interpretations along with the definition of the metrics (from [BRS92]) below.

2.1 The (Converted) Distance Matrix

In order to calculate metrics on distances between nodes (measured in number of links to follow) the following matrix is used:

Definition 2.1 Given a hyperdocument H with N nodes, the distance matrix D is an $N \times N$ matrix such that for every pair of nodes i and j: $D[i,j]$ is the length of the shortest path (in number of forward links) from i to j, if such a path exists; $D[i,j]$ is $\infty$ otherwise.

Figure 1 (taken from an example in [BRS92]) shows a small link structure and the corresponding distance matrix. We cannot perform simple calculations like “take the average of the values in the distance matrix” because of the occurrence of $\infty$. In order to obtain meaningful (or at least finite) results, $\infty$ can be replaced by a large value, called the conversion constant. To avoid interference between this constant and the existing finite values in the distance matrix, the conversion constant needs to be larger than the maximum length of the shortest path between two nodes (measured in forward links). For this reason we usually take the number of nodes, N, as the conversion constant.

Definition 2.2 The converted distance matrix for a hyperdocument with N nodes is obtained by replacing each occurrence of $\infty$ in the distance matrix for that hyperdocument by N.

Some (hyper)documents contain “index” nodes, with many outgoing links, like the index of a book, and “reference” nodes, with many incoming links, like a list of bibliographic references in a book. These parts of the link structure are very different from the rest of the hyperdocument, and therefore need to be eliminated before calculating metrics. In this paper we assume that index and reference nodes (and links) have been removed (unless stated otherwise). The link structure of Figure 1 for instance contains no index or reference nodes.
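Definitions 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nodes are numbered from 0, links are given as (source, destination) pairs, and the function names and the 4-node example graph are hypothetical (they do not come from [BRS92]).

```python
from collections import deque
import math


def distance_matrix(n, links):
    """Shortest forward-link distances; math.inf where no path exists."""
    succ = [[] for _ in range(n)]
    for src, dst in links:
        succ[src].append(dst)
    dist = [[math.inf] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in succ[node]:
                if dist[start][nxt] == math.inf:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist


def converted_distance_matrix(dist):
    """Replace every infinite entry by the conversion constant N."""
    n = len(dist)
    return [[n if d == math.inf else d for d in row] for row in dist]


# Hypothetical example: 0 -> 1 -> 2 and 0 -> 3.
links = [(0, 1), (1, 2), (0, 3)]
D = distance_matrix(4, links)
C = converted_distance_matrix(D)
```

In this example node 2 has no outgoing links, so its row in D is infinite everywhere except on the diagonal, and the same row in C holds the conversion constant 4.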

Figure 1: Example link structure and distance matrix.

2.2 Compactness

The compactness metric tries to measure the average length of the minimal path between two arbitrary nodes, or in other words how well connected the link structure is. In order to obtain values that are comparable for small and large hyperdocuments the metric is defined in a normalized way (resulting in values between 0 and 1).

Definition 2.3 Given a hyperdocument H with N nodes and converted distance matrix D, the compactness of H is:

$$\frac{Max - \sum_{i=1}^{N} \sum_{j=1}^{N} D[i,j]}{Max - Min}$$

where:

$$Max = N^3 - N^2, \qquad Min = N^2 - N$$
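As a small illustration, the compactness of Definition 2.3 can be computed directly from a converted distance matrix. The sketch below assumes the conversion constant is N, so that a hyperdocument without links has N^2 - N off-diagonal entries equal to N (sum N^3 - N^2) and a complete graph has all off-diagonal entries equal to 1 (sum N^2 - N). The function name and the 3-node matrices are hypothetical examples.

```python
def compactness(converted):
    """Compactness of a hyperdocument given its converted distance matrix."""
    n = len(converted)
    total = sum(sum(row) for row in converted)
    max_sum = n**3 - n**2  # no links: every off-diagonal entry equals N
    min_sum = n**2 - n     # complete graph: every off-diagonal entry equals 1
    return (max_sum - total) / (max_sum - min_sum)


# Three hypothetical 3-node documents (conversion constant N = 3).
no_links = [[0, 3, 3], [3, 0, 3], [3, 3, 0]]
complete = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
chain = [[0, 1, 2], [3, 0, 1], [3, 3, 0]]  # links 0 -> 1 -> 2

c_none = compactness(no_links)
c_full = compactness(complete)
c_chain = compactness(chain)
```

The chain shows why forward-only distances are penalized so heavily: three of its six off-diagonal entries are the conversion constant, so its compactness stays below one half.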

In this definition, $Max = N^3 - N^2$ is the maximum value the sum of all elements of the converted distance matrix can assume, and is reached when the hyperdocument contains no links at all (each of the $N^2 - N$ off-diagonal entries then equals the conversion constant N). $Min = N^2 - N$ is the minimum possible value of that sum, and is reached when the links form a complete graph (i.e. there are links in both directions between each pair of nodes). A hyperdocument without links has a compactness of 0 and a complete graph has a compactness of 1.

In [BRS92] compactness was measured for three very different hyperdocuments. (Unfortunately this was done before removing the index and reference nodes.) The compactness of a description of the Computer Science Department (CMSC) at the University of Maryland was 0.53. That document had 106 nodes and 402 links, a ratio of about 1 to 4. The compactness of the book Hypertext Hands On! (HHO) [SK89] is 0.55. That document has 243 nodes and 1104 links, a ratio also close to 1 to 4. The highly interconnected Guide of Opportunity in Volunteer Archaeology (GOVA) has a compactness of 0.78, and has 222 nodes and 1609 links, a ratio of about 1 to 7. A higher compactness for a hypertext with a higher link to node ratio is expected since adding links means creating shortcuts, thus lowering the average length of (shortest) paths.

CMSC and HHO are mostly hierarchical documents, which offer sections and subsections, or topics and subtopics, just like many current Websites. In addition they have a fair number of cross-reference links (pointing to completely different parts of the hyperdocument). The compactness values of around 0.5 seem to suggest that this is a typical value for such documents. GOVA is somewhat atypical in that it has a structure like an encyclopedia, meaning that there is no global hierarchical structure, only a set of independent articles that refer to each other (probably based on the occurrence of “topical” keywords). CMSC and HHO are usable without their index (and reference) nodes, but in an encyclopedia without an (alphabetical) index it would be very difficult to navigate.

The small link structure of Figure 1 has a compactness of 0.18, and thus seems not as well connected as the large examples from [BRS92]. This low compactness is deceiving however. The existence of index and reference nodes in CMSC, HHO and GOVA results in compactness values that are much higher than what would be obtained after removing these nodes and associated links. In fact, Figure 2 shows that the graph of Figure 1 is indeed also a well connected hierarchy with some cross-reference links. To better compare the compactness values we could add one index node and one reference node to Figure 1. Adding just one index node increases the compactness to 0.31, and after also adding the reference node the compactness becomes 0.34. Comparing these values with the values for CMSC and HHO shows that the latter two hyperdocuments contain (relatively) more (cross-reference) links that are responsible for generating shorter paths between arbitrary pairs of nodes.

2.3 Stratum

Stratum measures how “linear” a hyperdocument is, or in other words, how much the user is forced to navigate in a certain (linear) way.

Definition 2.4 Let H be a hyperdocument with N nodes and distance matrix D. Let $D'$ be derived from D by replacing all occurrences of $\infty$ by 0. The stratum of H is

$$\frac{4 \sum_{i=1}^{N} \left| \sum_{j=1}^{N} D'[j,i] - \sum_{j=1}^{N} D'[i,j] \right|}{N^3}$$

if N is even and is

$$\frac{4 \sum_{i=1}^{N} \left| \sum_{j=1}^{N} D'[j,i] - \sum_{j=1}^{N} D'[i,j] \right|}{N^3 - N}$$

if N is odd.
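The stratum of Definition 2.4 can be sketched as follows. The code assumes the distance matrix uses `math.inf` for missing paths; the function name and the two small test matrices (a 4-node linear chain and a 3-node cycle) are hypothetical examples chosen to hit the two extreme values.

```python
import math


def stratum(dist):
    """Stratum of a hyperdocument given its (unconverted) distance matrix."""
    n = len(dist)
    # D': infinite distances are replaced by 0, i.e. ignored.
    d0 = [[0 if d == math.inf else d for d in row] for row in dist]
    absolute_prestige = 0
    for i in range(n):
        to_i = sum(d0[j][i] for j in range(n))    # column sum
        from_i = sum(d0[i][j] for j in range(n))  # row sum
        absolute_prestige += abs(to_i - from_i)
    # Linear absolute prestige (LAP): N^3/4 for even N, (N^3 - N)/4 for odd N.
    lap = n**3 / 4 if n % 2 == 0 else (n**3 - n) / 4
    return absolute_prestige / lap


inf = math.inf
linear = [[0, 1, 2, 3], [inf, 0, 1, 2], [inf, inf, 0, 1], [inf, inf, inf, 0]]
cycle = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]  # links 0 -> 1 -> 2 -> 0

s_linear = stratum(linear)
s_cycle = stratum(cycle)
```

The linear chain imposes a complete reading order and reaches the maximum, while in the cycle every node has equal row and column sums, so its absolute prestige is zero.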

In [BRS92] it is explained that this metric consists of sums of (absolute values of) sums of distances from a node to other nodes minus distances from other nodes to that node. These numbers are also called the absolute prestige of the nodes, and their sum is the absolute prestige of the whole hyperdocument. The normalization factor $N^3/4$ or $(N^3 - N)/4$ represents the absolute prestige of a linear document, also called the linear absolute prestige or LAP, which is maximal. (These formulas are proven in [BRS92].) The stratum of a hyperdocument is a value between 0 and 1. The value 0 is the stratum of a circular link structure, and 1 is the stratum of a linear structure.

The three example documents have very different stratum values: before removing index and reference nodes CMSC has a stratum of 0.13, HHO of 0.05 and GOVA of 0.02; after removing index and reference nodes these values become 0.24, 0.10 and 0.03 (all this according to [BRS92]). It is remarkable that while the compactness values of CMSC and HHO are virtually identical (0.53 vs. 0.55) the stratum values are very different. These values indicate that in HHO the reader has more navigational freedom than in CMSC. The stratum of the link structure of Figure 1 is 0.12. This fairly low value illustrates that there is also a lot of navigational freedom in this graph.

2.4 Hierarchical vs. Cross-Reference Links

Given a certain starting point (or node) the link structure can be “hung” from that node. Figure 2 shows the graph from Figure 1, hung from node 1. The structure looks like a hierarchy or dag (and almost a tree), with some exceptions. The arrows pointing down represent the hierarchical links and the other links are called cross-reference links.
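One way to “hang” a link structure from a root is sketched below. It rests on an assumption that is not stated in the text: a link is treated as hierarchical when it leads exactly one level deeper in a breadth-first layering from the root, and as a cross-reference link otherwise. The function name and the 4-node example are hypothetical.

```python
from collections import deque


def classify_links(links, root):
    """Split links into (hierarchical, cross_reference) relative to root."""
    succ = {}
    for src, dst in links:
        succ.setdefault(src, []).append(dst)
    depth = {root: 0}  # breadth-first level of every reachable node
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    hierarchical, cross_reference = [], []
    for src, dst in links:
        # Assumption: "pointing down" means going exactly one level deeper.
        if src in depth and depth[dst] == depth[src] + 1:
            hierarchical.append((src, dst))
        else:
            cross_reference.append((src, dst))
    return hierarchical, cross_reference


# Hypothetical example: node 3 has two hierarchical parents, and 3 -> 0 points back up.
links = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
hier, cross = classify_links(links, root=0)
```

Under this reading a node can have more than one incoming hierarchical link, which is why the number of hierarchical links may exceed N - 1.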


The number of hierarchical links is always near N. In a tree that number is $N - 1$, but since more than one hierarchical link may lead to the same node, like the two links leading to node 6 in Figure 2, the actual number may be slightly higher. Nonetheless we can estimate the ratio of hierarchical to cross-reference links for the three example documents of [BRS92]: 1 to 3 for CMSC and HHO and 1 to 6 for GOVA. If we assume that hyperdocuments like CMSC and HHO have a large index node pointing to almost every other node, and a large reference node, pointed to from almost every other node, the ratio for these documents after elimination of index and reference nodes should be about 1 to 1, meaning that there are about as many hierarchical as cross-reference links (see footnote 1). In [V94] a series of experiments with (artificial) link structures is described which try to find the optimal ratio between hierarchical and cross-reference links. These experiments also lead to the suggestion that a 1 to 1 ratio is easiest to navigate through.

3 Navigational Metrics

In this section we redefine compactness and stratum, taking into account that virtually all hypertext user interfaces offer the possibility to do backtracking, i.e. to follow a link backwards after it has been followed forwards. Also, we consider that a user starts navigating from a given root node r, and we are only interested in link structures such that all nodes can be reached from r by following links forwards. (Disconnected subgraphs cannot be reached and are therefore undesired structures. They are easy to identify because they have a distance from r of $\infty$ in the distance matrix.)

According to [BRS92] “Compactness indicates the intrinsic connectedness of the hypertext”. However, the definition of compactness, by only considering following links forwards, treats backtracking as a very expensive operation, or equivalently as a non-existing connection. (The “distance” between two nodes i and j such that there is no path from i to j is counted as N.) This is counterintuitive since in most hypertext systems backtracking is no more expensive than following links forwards; it is usually a lot cheaper (faster) because of caching.

Figure 2: Tree-like graph representation to show hierarchical structure.

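It is this odd aspect of compactness that explains why a reasonably well connected structure like the almost-tree of Figure 2 has a compactness of only 0.18. The problem is that for a tree with N nodes (which Figure 2 comes close to) there are only $O(N \log N)$ finite values in the distance matrix, compared to $O(N^2)$ occurrences of $\infty$.

Considering that in virtually all current hypertext systems one can do backtracking (i.e. follow a link backwards after following it forwards first), the path between two arbitrary nodes typically consists of a number of back steps (up the hierarchy towards the root) followed by a number of forward steps (down the hierarchy in the “right” direction). The presence of cross-reference links means that there may be shorter paths between nodes than the “hierarchical” path.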

1 Indeed, to point from an index node to all other nodes requires $N - 1$ links, and to point from every other node to a reference node requires $N - 1$ links. In CMSC and HHO, with four times as many links as nodes, there are thus about 1/4 hierarchical links, 1/4 index links, 1/4 reference links and 1/4 “real” cross-reference links.

Not only does backtracking make the actual distances between nodes shorter than what the distance matrix suggests; most hyperdocuments are also built in such a way that all nodes are reachable from the root node. This calls for some kind of distance matrix in which the value $\infty$ does not occur, thus eliminating the need for a “conversion constant”.

3.1 The Navigational Distance Matrix

The values in the distance matrix represent the shortest paths between nodes. When considering backtracking, the shortest path from an arbitrary node a to another arbitrary node b depends on the selected root node (where all navigation starts) and on the path the user followed from that root to node a. In the sequel we will talk about length, distance, time and cost as interchangeable units. Following a (single) link can involve a length, time, cost or more generically a “weight”, which may be different from 1. Most often the weight f for following a link forwards will be greater than the weight b for following that link backwards (i.e. for backtracking).

Theorem 3.1 Given a hyperdocument H with a root node r from which all nodes are reachable. Let i and j be arbitrary nodes of H. After following links (forwards) from r to i, the shortest path from i to j consists of zero or more backtracking steps from i back towards r, followed by zero or more forward links towards j. This remains true if we consider weights, like f for a forward link and b for a backtracking step.

Proof: If not, then the path from i to j contains some forward link from a node x to y followed by a backtracking step (from y back to x). Eliminating these two steps leads to a shorter path (shorter by two steps, or actually by $f + b$). The shortest path can thus not contain any backtracking step after a forward link. (QED)

In a (balanced) tree with N nodes for instance, the distance from the root to the leaves is $O(\log N)$, so the distance between two arbitrary nodes is also $O(\log N)$. In a linear document the distances are $O(N)$. Below we give a new definition of compactness, which takes into account that the path between two nodes consists of backtracking steps and forward links. The definition also accounts for the possibly different (shorter) time it takes to follow a link backwards, because the interface (browser) may have cached nodes.
Definition 3.2 Given a hyperdocument H with N nodes and a starting point (root node) r. Assume that the “length” of a forward link is f and the “length” of a backtracking step is b. (Usually $b \le f$ but that doesn’t matter.)

The length of the shortest path from a node i to j is b times the number of backtracking steps in this path plus f times the number of forward steps.

If there is a path from r through i to j the distance from i to j is the average over all paths from r to i, not containing loops, of the lengths of the shortest paths from i to j.

If there is no path from r through i to j then the distance from i to j is $\infty$ if $i \ne j$ and 0 if $i = j$.

The navigational distance matrix of H with respect to r is an $N \times N$ matrix D, such that for every pair of nodes i and j, $D[i,j]$ is the distance from i to j.
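Definition 3.2 can be turned into a brute-force computation for small graphs. The sketch below enumerates all loop-free paths from the root, and for each of them tries every number k of backtracking steps followed by the shortest chain of forward links (the shape of a shortest path according to Theorem 3.1). It assumes 0-based node numbers and f > 0; the function names and the 4-node example are hypothetical. The example mirrors the two ways of reaching a node discussed for Figure 1: directly from the root, or through a longer detour.

```python
from collections import deque
import math


def forward_distances(n, succ):
    """All-pairs shortest distances in number of forward links (BFS)."""
    dist = [[math.inf] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in succ[node]:
                if dist[start][nxt] == math.inf:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist


def loop_free_paths(succ, root, target, path=None):
    """Yield every path from root to target that visits no node twice."""
    path = [root] if path is None else path
    if path[-1] == target:
        yield list(path)
        return
    for nxt in succ[path[-1]]:
        if nxt not in path:
            path.append(nxt)
            yield from loop_free_paths(succ, root, target, path)
            path.pop()


def navigational_distance_matrix(n, links, root, f=1.0, b=1.0):
    succ = [[] for _ in range(n)]
    for src, dst in links:
        succ[src].append(dst)
    fwd = forward_distances(n, succ)
    nav = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        paths = list(loop_free_paths(succ, root, i))
        if not paths:
            continue  # i cannot be reached from the root: row stays infinite
        for j in range(n):
            if i == j:
                continue
            # For each way of reaching i: k back steps, then forward links.
            lengths = [
                min(k * b + f * fwd[path[-1 - k]][j] for k in range(len(path)))
                for path in paths
            ]
            nav[i][j] = sum(lengths) / len(lengths)
    return nav


# Hypothetical example: node 1 is reached via 0 -> 1 or via 0 -> 2 -> 3 -> 1.
links = [(0, 1), (0, 2), (2, 3), (3, 1)]
nav = navigational_distance_matrix(4, links, root=0, f=2.0, b=1.0)
```

Here going from node 1 back to the root takes 1 backtracking step on the short route and 3 on the long one, so the entry nav[1][0] is their average. Enumerating all loop-free paths is exponential in the worst case, so this sketch is only meant for small examples.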

Note that for a path from r through i to j to exist it is necessary and sufficient that a path from r to i and a path from r to j exist. (One can backtrack all the way from i back to r and then follow the path to j.) Thus, if every node is reachable from the root node r, the navigational distance matrix does not contain any occurrences of $\infty$. However, if no path from r to i exists, then $D[i,j]$ will be $\infty$, even though there may be short paths or even a direct link from i to j. The reason for this is that these paths cannot be used because there is no way to reach node i first.

Note also that the same path from i to j may be counted several times in $D[i,j]$, namely once for every path from r to i that ends with the backtracking part of the path from i to j. In order to avoid having to calculate the average of infinite numbers of paths we have added the requirement that paths must not contain loops.

3.2 Navigational Compactness

Using Definition 3.2 we can now redefine compactness. The new definition is no longer a function of the set of links alone, but depends on the choice of a root node r from which all navigation starts.

Definition 3.3 Given a hyperdocument H with N nodes and a navigational distance matrix D (for root node r) which does not contain occurrences of $\infty$. Let f be the length of forward links and b the length of backward links, and $b \le f$. The navigational compactness of H is:

$$\frac{Max - \sum_{i=1}^{N} \sum_{j=1}^{N} D[i,j]}{Max - Min}$$

where:

$$Max = \frac{1}{6}(f + b)(N^3 - N), \qquad Min = \frac{1}{2}(f + b)(N^2 - N)$$

Here Max represents the maximal sum of the elements in the navigational distance matrix, which is that for a linear (singly linked) document, and Min is the minimal sum, which is that of a complete graph (with links in both directions between each pair of nodes).

Theorem 3.4 Given Max and Min from Definition 3.3, Max is the maximum that the sum of the elements in a navigational distance matrix can be when the value $\infty$ does not occur, and Min is the minimum for this sum.

Proof: When there is no $\infty$ all nodes can be reached from root node r. It is easy to show that the maximum sum of distances is that for a linear document (a linear chain of forward links starting from r). Let the nodes be numbered 1 ... N, and let r be node 1. Figure 3 illustrates the navigational distance matrix for such a document (with 6 nodes). Row i contains first distances to previous nodes, which are all backward links, then zero (the distance from i to i) and then distances to the following nodes, which are all forward links.

       1    2    3    4    5    6
  1    0   1f   2f   3f   4f   5f
  2   1b    0   1f   2f   3f   4f
  3   2b   1b    0   1f   2f   3f
  4   3b   2b   1b    0   1f   2f
  5   4b   3b   2b   1b    0   1f
  6   5b   4b   3b   2b   1b    0

Figure 3: Navigational distance matrix for a linear document with 6 nodes.

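The forward distances in row $N + 1 - i$ add up to $f \sum_{j=1}^{i-1} j = f \, \frac{i^2 - i}{2}$. The sum over all the rows is thus $\frac{f}{2} \sum_{i=1}^{N} (i^2 - i) = \frac{f}{6}(N^3 - N)$. From the near symmetry in the matrix (see Figure 3) one sees that the backward distances amount to $\frac{b}{6}(N^3 - N)$. A simple addition leads to $Max = \frac{1}{6}(f + b)(N^3 - N)$.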

In a complete graph half of the shortest paths between two (distinct) nodes consist of just one forward link, and the other half consist of just one backward link:

Links from r to any other node take one forward link, and links from any node to r take one backward link.

To get from i to j where $i \ne r$ and $j \ne r$ takes one forward link if the path to i is $r \to i$ and one backward link if the path to i is $r \to j \to i$.

The average distance between two nodes is thus $(f + b)/2$. The sum of all the values in the distance matrix then becomes $Min = \frac{1}{2}(f + b)(N^2 - N)$. (The $-N$ is caused by the diagonal with zeros.) (QED)

Unlike “plain” compactness, navigational compactness is not defined for all link structures but only for link structures in which every node can be reached from the root node. Just like “plain” compactness, navigational compactness always has a value between 0 and 1.

Figure 4 shows the navigational distance matrix for the link structure of Figures 1 and 2, for f = 2 and b = 1. $D[2,1]$ for instance has value 2: if node 2 is reached by following the link $1 \to 2$ then going (back) from 2 to 1 takes 1 backtracking step; if node 2 is reached by following the links $1 \to 5$, $5 \to 9$ and $9 \to 2$ then going from 2 to 1 requires going back the same way, thus taking 3 backtracking steps. The average is thus 2. From the navigational distance matrix we can calculate that the navigational compactness of this example graph is

$$\frac{1092 - 510.33}{1092 - 234} = 0.68$$

        1     2     3     4     5     6     7     8     9    10    11    12    13
  1     0     2     2     2     2     4     4     4     4     4     4     4     4
  2     2     0     4     4   2.5     2     2     2   2.5   4.5   4.5     6     6
  3     1     3     0     3     3     2     5     5     5     5     5     2     5
  4     2     4     4     0   2.5     6     6     6   4.5   4.5     3     6     2
  5     1     3     3     3     0     5     5     5     2     2     2     5     5
  6  2.67     2  3.67  4.33  3.67     0     4     4  4.33  5.67  5.67  5.67  6.67
  7     3     1     5     5   3.5     3     0     3     4   5.5   5.5     7     7
  8     3     1     5     5   3.5     3     3     0     2   5.5   5.5     7     7
  9   2.5     2   4.5   4.5     3     4     4   2.5     0     5     5   6.5   6.5
 10     2     4     4     4     1     6     6     6     3     0     3     6     6
 11     2     4     4     2     1     6     6     6     3     3     0     6     4
 12     2     4     1     4     4     3     6     6     6     6     6     0     6
 13     4     5     5     1   3.5     7     7     7   5.5   5.5     4     7     0

Figure 4: Navigational distance matrix for Figure 1.

3.3 Navigational Stratum

The stratum metric also suffers from an anomaly, albeit a different one than compactness. With compactness the influence of pairs of nodes (i, j) such that there is no path from i to j outweighs that of the existing paths because of the large conversion constant. With stratum, these “infinite” distances are completely ignored in the calculation. So while paths that require backtracking (or that are impossible) dominate the calculation of compactness, such paths play no role in stratum.

By replacing the (plain) distance matrix by the navigational distance matrix we can redefine stratum to let backtracking come into play. It is easy to show that the LAP values of $N^3/4$ and $(N^3 - N)/4$ are multiplied by $f - b$. This leads to the following definition:

Definition 3.5 Let H be a hyperdocument with N nodes and navigational distance matrix D, with weight factors f and b such that $f \ne b$ and in which the value $\infty$ does not occur. The navigational stratum of H is

$$\frac{4 \sum_{i=1}^{N} \left| \sum_{j=1}^{N} D[j,i] - \sum_{j=1}^{N} D[i,j] \right|}{(f - b) N^3}$$

if N is even and is

$$\frac{4 \sum_{i=1}^{N} \left| \sum_{j=1}^{N} D[j,i] - \sum_{j=1}^{N} D[i,j] \right|}{(f - b)(N^3 - N)}$$

if N is odd.

The navigational stratum for the example graph of Figure 1, with weights f = 2 and b = 1, is 0.038, which indicates that very little reading order is imposed by this link structure. It is normal for navigational stratum values to be lower than “plain” stratum values because backtracking increases the navigational freedom. An interesting choice of f and b is to take f = 1 and b = 0, meaning that backtracking is free. The navigational stratum of Figure 1 then becomes 0.16.

Note that navigational stratum has no meaning if f = b, basically because if forward and backward links are considered equal, every node essentially has no (non-zero) “prestige”.
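Both navigational metrics can be computed from a navigational distance matrix without infinite entries. The sketch below is illustrative only: it assumes f is not equal to b and N is at least 3 (for N = 2 the maximum and minimum sums of navigational compactness coincide), and the function names are hypothetical. As a check it uses the linear document of Figure 3, which should have navigational compactness 0 and navigational stratum 1.

```python
def navigational_compactness(nav, f, b):
    n = len(nav)
    total = sum(sum(row) for row in nav)
    max_sum = (f + b) * (n**3 - n) / 6  # linear (singly linked) document
    min_sum = (f + b) * (n**2 - n) / 2  # complete graph
    return (max_sum - total) / (max_sum - min_sum)


def navigational_stratum(nav, f, b):
    n = len(nav)
    absolute_prestige = sum(
        abs(sum(nav[j][i] for j in range(n)) - sum(nav[i])) for i in range(n)
    )
    # The plain LAP is multiplied by (f - b).
    lap = (f - b) * (n**3 if n % 2 == 0 else n**3 - n) / 4
    return absolute_prestige / lap


def linear_navigational_matrix(n, f, b):
    """Matrix of Figure 3: forward links to later nodes, back steps to earlier ones."""
    return [
        [(j - i) * f if j > i else (i - j) * b for j in range(n)]
        for i in range(n)
    ]


chain = linear_navigational_matrix(6, f=2, b=1)
chain_sum = sum(sum(row) for row in chain)
chain_compactness = navigational_compactness(chain, f=2, b=1)
chain_stratum = navigational_stratum(chain, f=2, b=1)
```

For N = 13, f = 2 and b = 1 the same formulas give the maximum sum 1092 and the minimum sum 234 used for the example graph.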

4 Metrics for Adaptive Hypertext

Adaptive hypertext is introduced to help users navigate through a hyperdocument without getting lost and without being led to information nodes that are not appropriate at the time the user visits them. One may think that it is the responsibility of the hyperdocument’s author to ensure that links are chosen in such a way that whenever a user can follow a link to a certain node, that node will be appropriate for the user. However, since many paths may lead to the same node, it is nearly impossible to know exactly what information a user has read prior to reaching the source node of a link, and hence it is also impossible to determine whether following that link will lead to information that is appropriate at that time. Adaptive hypertext tries to solve this problem by means of a user model which is constructed while tracking (logging) which links a user follows, and thus by tracking which nodes the user reads. (Some adaptive hypertext systems also take into account the results of tests or the time a user spends reading each node.) Using the user model, certain links may be displayed or hidden, enabled or disabled, depending on whether the destination node of that link is considered appropriate for a user with the history (or knowledge) contained in that user’s user model.

Many different architectures of adaptive hypertext systems exist. The paper [B96] presents an overview of the different categories, with examples. We consider the following simple model:

When a user visits a node she acquires some knowledge. For a number of concepts she actually acquires a certain percentage of “knowledge about that concept”. With each node an expression is associated that determines which percentage of which concepts is needed or forbidden in order for this node to be appropriate. Links to a node are only enabled (or visible) when that expression evaluates to true. Backtracking to previously visited nodes is always allowed.

Many actual adaptive hypertext systems are built around some variation of this model. These include ELM-ART [BSW96a], Interbook [BSW96b], the Multimedia courseware of Da Silva et al. [PDHDO97] and the Hypertext courseware 2L670 [CDB97a, CDB97b] at the Eindhoven University of Technology. Some actual hypertext systems also allow for adaptive (dynamic) node content. In this paper however we are only concerned with the adaptive link structure. Some adaptive hypertext systems also allow for true “conditional links”, meaning that depending on some condition, individual links may become enabled or disabled. We only consider the case where all links to the same node become enabled or disabled at the same time. In the courseware for 2L670 for instance, individual links may be made conditional by turning their source anchor text into conditional text. This courseware also allows for unconditional links, meaning that even when the expression for a certain node is false there may be some (unconditional) links to that node. Such possibilities are not considered in this paper.

4.1 The Adaptive Distance Matrix

Like in the previous sections we will define a variation of compactness and stratum based on the distance matrix.

Definition 4.1 Let H be a hyperdocument with N nodes and a root node r. With H we associate a set of concepts C. With each concept we associate an integer “knowledge” variable which has an initial value of 0. With each node n we associate a set of pairs (c, v), meaning that when the user visits node n for the first time (!) the knowledge of concept c increases by v up to a maximum of 100 (and this for each pair (c, v) of that node). With each node n we associate a boolean expression $e_n$ containing logical combinations of comparisons between concepts and concepts or concepts and values. We say that the links to n are enabled when $e_n$ evaluates to true, and disabled when $e_n$ evaluates to false. A hyperdocument, augmented with concepts and boolean expressions as described above, is called an adaptive hyperdocument.

Navigating through adaptive hypertext appears to be more difficult at first, because after each (forward or backward) step, say leading to node i, the expressions $e_j$ for all nodes j for which there is a link i → j need to be evaluated to find out whether these links are enabled. However, these calculations are performed by the system, in order to show or hide links. When adaptivity is used wisely by the author, the reader of an adaptive hyperdocument need not even be aware that the link structure is not static.

Properties which trivially hold for static hyperdocuments may no longer hold for adaptive ones. In a static hyperdocument for instance, a shortest path between two nodes can never contain a forward link followed by a backward link, because the combination of these is a no-op. In adaptive hyperdocuments however it is possible that a shortest path contains such a forward and backward link, because the seemingly redundant visit to the destination of that link changes the values for some concepts (and thus augments the user’s knowledge), possibly resulting in some expressions becoming true or false. This makes finding a shortest path more difficult: Theorem 3.1 does not hold for adaptive hyperdocuments.

Definition 4.2 A root-path in H must start in the root node r and may only contain forward links that are enabled at the time they are followed as well as (arbitrary) backtracking steps. With a path from node i to j we always mean a tail of a root-path from r through i to j, starting with (some occurrence of) i. The length of a path is f times the number of forward links plus b times the number of backtracking steps.
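The model of Definitions 4.1 and 4.2 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class and method names are hypothetical, expressions are represented as Python functions over a knowledge dictionary, backtracking is modelled as popping a history stack, and the example structure is a hypothetical five-node tree in which node n generates 100% of concept n and requires full knowledge of concepts 1 to n − 1.

```python
class AdaptiveHyperdocument:
    """Minimal model: forward links, knowledge gains and enabling expressions."""

    def __init__(self, links, gains, enabled, root):
        self.links = links      # node -> set of destination nodes
        self.gains = gains      # node -> {concept: increase on first visit}
        self.enabled = enabled  # node -> function(knowledge) -> bool (the expression e_n)
        self.root = root

    def _visit(self, node, knowledge, visited):
        if node in visited:     # knowledge only grows on the first visit
            return
        visited.add(node)
        for concept, v in self.gains.get(node, {}).items():
            knowledge[concept] = min(100, knowledge.get(concept, 0) + v)

    def path_length(self, steps, f=2, b=1):
        """Follow a root-path and return f * #forward links + b * #backtracking steps.

        Each step is a destination node (forward link) or the string "back".
        Raises ValueError when a step is not allowed.
        """
        knowledge, visited, history = {}, set(), [self.root]
        self._visit(self.root, knowledge, visited)
        length = 0
        for step in steps:
            if step == "back":
                if len(history) == 1:
                    raise ValueError("cannot backtrack from the root")
                history.pop()
                length += b
            else:
                here = history[-1]
                if step not in self.links.get(here, ()):
                    raise ValueError(f"no link {here} -> {step}")
                if not self.enabled[step](knowledge):
                    raise ValueError(f"links to {step} are disabled")
                history.append(step)
                self._visit(step, knowledge, visited)
                length += f
        return length


def needs_all_before(n):
    """Expression for node n: full knowledge of concepts 1 .. n-1."""
    return lambda k: all(k.get(c, 0) >= 100 for c in range(1, n))


doc = AdaptiveHyperdocument(
    links={1: {2, 3}, 2: {4}, 3: {5}},
    gains={n: {n: 100} for n in range(1, 6)},
    enabled={n: needs_all_before(n) for n in range(1, 6)},
    root=1,
)
walk = [2, "back", 3, "back", 2, 4, "back", "back", 3, 5]
print(doc.path_length(walk))  # 16, i.e. 6f + 4b with f = 2 and b = 1
```

In this example the direct step from node 1 to node 3 is rejected, because the links to node 3 only become enabled after node 2 has been visited.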

Using this definition of what paths (and root-paths) are and what the length of a path is, we can define a new kind of distance matrix.

Definition 4.3 Given an adaptive hyperdocument H with N nodes and a root node r. Assume that the length of a forward link is f and the length of a backtracking step is b. If there is a root-path from r through i to j, the distance from i to j is the average over “all” paths from r to i of the lengths of the shortest paths from i to j, taking into account only paths from r to i that do not contain the same loop twice. If there is no root-path from r through i to j then the distance from i to j is ∞ if i ≠ j and 0 if i = j. The adaptive distance matrix of H with respect to r is an N × N matrix D, such that for every pair of nodes i and j, D[i, j] is the distance from i to j.
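Definition 4.3 measures the distance from i to j relative to how the user reached i. The following sketch shows one way to compute the length of a shortest continuation for a single given arrival at i; it does not perform the averaging over all root-paths. It is an illustration under our own assumptions: backtracking pops a history stack, knowledge gains are positive (so knowledge depends only on the set of visited nodes), the link graph is acyclic or the target is reachable, and all names as well as the small example tree are hypothetical.

```python
import heapq
from itertools import count


def knowledge_after(visited, gains):
    """Knowledge per concept after first visits to all nodes in `visited`."""
    know = {}
    for node in visited:
        for concept, v in gains.get(node, {}).items():
            know[concept] = min(100, know.get(concept, 0) + v)
    return know


def shortest_continuation(links, gains, enabled, history, visited, target, f=2, b=1):
    """Dijkstra search over (history stack, visited set) states.

    `history` is the stack of nodes leading to the current node and
    `visited` the set of nodes seen so far on the root-path.
    Returns the length of a shortest path to `target`, or None.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    heap = [(0, next(tie), (tuple(history), frozenset(visited)))]
    done = set()
    while heap:
        cost, _, state = heapq.heappop(heap)
        hist, seen = state
        if hist[-1] == target:
            return cost
        if state in done:
            continue
        done.add(state)
        know = knowledge_after(seen, gains)
        for dest in links.get(hist[-1], ()):
            if enabled[dest](know):  # only enabled forward links may be followed
                nxt = (hist + (dest,), seen | {dest})
                heapq.heappush(heap, (cost + f, next(tie), nxt))
        if len(hist) > 1:            # backtracking is always allowed
            heapq.heappush(heap, (cost + b, next(tie), (hist[:-1], seen)))
    return None


links = {1: [2, 3], 2: [4], 3: [5]}
gains = {n: {n: 100} for n in range(1, 6)}
enabled = {n: (lambda k, n=n: all(k.get(c, 0) >= 100 for c in range(1, n)))
           for n in range(1, 6)}

# Arriving at node 2 directly from the root: node 3 must still be visited first.
print(shortest_continuation(links, gains, enabled, [1, 2], {1, 2}, 4))     # 8 = 3f + 2b
# Arriving at node 2 after a detour through node 3: the link to 4 is already enabled.
print(shortest_continuation(links, gains, enabled, [1, 2], {1, 2, 3}, 4))  # 2 = f
```

The two results differ although both searches start in node 2, which is why the definition averages over the paths from r to i.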

Note that although this definition looks identical to that of a navigational distance matrix, the adaptive distance matrix is actually different since paths may be longer because they may only contain enabled links.

A second difference in the definition is that for the navigational distance matrix paths were not allowed to contain loops, while in adaptive hyperdocuments loops may be necessary (because they augment the knowledge). However, going through the same loop more than once does not augment the knowledge values. Multiple passes through loops are forbidden to prevent having to average over an infinite number of paths.

Like for navigational metrics we require that the adaptive distance matrix contains no occurrences of ∞. However, unlike for static hyperdocuments, this is not a sufficient condition for ensuring that there exists a root-path (from r) that goes through all the nodes of H. Indeed, for nodes i, j and k the fact that there exist paths between each pair of nodes does not imply that there exists a root-path from r through both i and j to k. A separate test whether at least one, or preferably all, paths can be extended to a path that visits all the nodes of H is still needed to assure that users can reach all the information in a hyperdocument.

4.2 Adaptive Compactness

Compactness is defined as usual (but now based on the adaptive distance matrix):

Definition 4.4 Given an adaptive hyperdocument H with N nodes and an adaptive distance matrix D for root node r which does not contain occurrences of ∞. Let f be the length of forward links and b the length of backward links, and b ≤ f. The adaptive compactness of H is:

$$\frac{Max - \sum_{i=1}^{N} \sum_{j=1}^{N} D[i,j]}{Max - Min}$$

where:

$$Max = \frac{1}{6}(f + b)(N^3 - N), \qquad Min = \frac{1}{2}(f + b)(N^2 - N)$$

It can be easily shown that allowing loops does not alter the fact that the minimum possible sum of the elements in D is $\frac{1}{2}(f + b)(N^2 - N)$, just like with navigational compactness. This implies that compactness cannot be larger than 1. The value for Max however is not the maximum possible sum: the maximum distances are achieved not by a linear structure but by a graph like Figure 5. Each node i generates (100% of) concept i, and requires (100%) knowledge about all concepts 1 … i − 1. In order to get from 1 to 5 for instance, a long path is needed:

$$1 \to 2 \to 1 \to 3 \to 1 \to 2 \to 4 \to 2 \to 1 \to 3 \to 5 \qquad (1)$$

[Figure 5: graph with nodes 1 to 7 and enabling conditions (1&2), (1&2&3), (1&2&3&4), (1&2&3&4&5) and (1&2&3&4&5&6), shown with the adaptive distance matrix D[i, j] for nodes 1 to 6 (row i, column j):]

$$
\begin{array}{c|cccccc}
  & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
1 & 0 & f & 2f+b & 4f+2b & 6f+4b & 9f+6b \\
2 & b & 0 & f+b & 3f+2b & 5f+4b & 8f+6b \\
3 & b & f+b & 0 & 2f+b & 4f+3b & 7f+5b \\
4 & 2b & b & f+2b & 0 & 2f+2b & 5f+4b \\
5 & 2b & f+2b & b & 2f+2b & 0 & 3f+2b \\
6 & 3b & 2b & f+3b & b & 2f+3b & 0
\end{array}
$$

Figure 5: Adaptive link structure with longest paths.

The adaptive distance matrix in Figure 5 has two triangular parts which are very different. Indeed, going from i to j when i < j requires many steps because knowledge needs to be gained along the way, while when j < i the shortest (linear) way can be followed, resulting in much lower values. The total number of forward steps in the upper triangle can be shown to be

$$\sum_{i=0}^{(N-1)/2} \; \sum_{j=1}^{N-1-2i} j\,(N - j),$$

while the total number of forward steps in the lower triangle is

$$2 \sum_{i=1}^{N/2} \sum_{j=1}^{i-1} j = \sum_{i=1}^{N/2} (i^2 - i) = \frac{(N/2)^3 - (N/2)}{3}$$

when N is even, and when N is odd it is

$$\sum_{j=1}^{(N-1)/2} j = \frac{((N-1)/2)^2 + (N-1)/2}{2}$$

higher than that. In any case, these numbers are close to (but somewhat lower than) $N^4/12$. The numbers of backtracking steps are similar (albeit a bit lower). Because the total of the values in the adaptive distance matrix can thus be of the order $O(N^4)$, it can exceed the value $Max = \frac{1}{6}(f + b)(N^3 - N)$.

Concluding: the adaptive compactness of a hyperdocument lies within the range −∞ … 1. It is intuitively clear that, because we explicitly made navigational and adaptive compactness comparable, hyperdocuments with a negative adaptive compactness are very badly connected and require that users make many seemingly unnecessary navigation steps, frequently revisiting nodes they have seen before.
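To make the negative range concrete, the sketch below evaluates the compactness formula of Definition 4.4 for the six-node matrix of Figure 5. The coefficients are our reading of that matrix (row i, column j, each entry stored as a pair of forward and backward step counts), so they are an assumption rather than data taken verbatim from the figure; the function name is hypothetical and the weights are assumed to be integers.

```python
from fractions import Fraction

# D[i][j] = nf * f + nb * b, stored as (nf, nb); our reading of Figure 5.
FIG5 = [
    [(0, 0), (1, 0), (2, 1), (4, 2), (6, 4), (9, 6)],
    [(0, 1), (0, 0), (1, 1), (3, 2), (5, 4), (8, 6)],
    [(0, 1), (1, 1), (0, 0), (2, 1), (4, 3), (7, 5)],
    [(0, 2), (0, 1), (1, 2), (0, 0), (2, 2), (5, 4)],
    [(0, 2), (1, 2), (0, 1), (2, 2), (0, 0), (3, 2)],
    [(0, 3), (0, 2), (1, 3), (0, 1), (2, 3), (0, 0)],
]


def adaptive_compactness(coeffs, f, b):
    """(Max - sum of D) / (Max - Min) for integer weights f and b."""
    n = len(coeffs)
    total = sum(nf * f + nb * b for row in coeffs for nf, nb in row)
    max_sum = Fraction((f + b) * (n**3 - n), 6)
    min_sum = Fraction((f + b) * (n**2 - n), 2)
    return (max_sum - total) / (max_sum - min_sum)


print(adaptive_compactness(FIG5, f=2, b=1))  # -7/4
```

Under this reading the matrix sums to 70f + 70b, while Max is 35(f + b) and Min is 15(f + b), so the compactness is −7/4 for any choice of weights.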

4.3 Adaptive Stratum

While adaptive compactness measures how short the (shortest) paths are between nodes in an adaptive hyperdocument, adaptive stratum tries to measure how much (or little) navigational freedom the user has in such a document.

Stratum (plain as well as navigational) is based on the concept of linear absolute prestige of a document, because the absolute prestige of a linear document is maximal. In an adaptive context this is no longer true. A linear structure with 6 nodes for instance has a linear absolute prestige of 54, which translates to 54f − 54b in a navigational context. The absolute prestige from Figure 5 however is 80f + 30b, which is much higher. However, just like with compactness we base adaptive stratum on the same upper and lower bounds to keep navigational and adaptive values comparable.

Definition 4.5 Let H be an adaptive hyperdocument with N nodes and adaptive distance matrix D, with weight factors f and b such that f ≠ b and such that the value ∞ does not occur. The adaptive stratum of H is

$$\frac{4 \sum_{i=1}^{N} \left| \sum_{j=1}^{N} D[j,i] - \sum_{j=1}^{N} D[i,j] \right|}{(f - b)\,N^3}$$

if N is even, and

$$\frac{4 \sum_{i=1}^{N} \left| \sum_{j=1}^{N} D[j,i] - \sum_{j=1}^{N} D[i,j] \right|}{(f - b)(N^3 - N)}$$

if N is odd.

It is easy to show that all kinds of stratum of a circular document (without conditional links) have a value of 0. However, the absolute prestige of an adaptive hyperdocument can exceed LAP (by a lot), so the range for adaptive stratum is 0 … ∞. By basing both navigational and adaptive stratum on the same LAP value, the two metrics are comparable. Although it is possible to create adaptive hyperdocuments with an adaptive stratum value that exceeds 1, we already know (from observing the navigational stratum values for reasonable structures such as Figure 1) that such high values mean that these hyperdocuments impose a very strict reading order upon the user, which is something we do not consider usable in hypertext applications.

4.4 Comparing Static and Adaptive Hyperdocuments

Metrics for static link structures have existed for a long time (since 1992 [BRS92]). These metrics make it possible to compare hyperdocuments of different size and with a completely different information content, and tell how well they are connected and how much navigational freedom they offer. For adaptive link structures this paper provides the first such metrics. Until now, usability issues for adaptive hypertext could never be resolved in a satisfactory way. Authors of adaptive hypertext systems (including us) have often claimed that adaptive hyperdocuments are potentially easier to use than static ones, but also admit that bad adaptive structures can be worse than the worst static link structures. We have deliberately defined adaptive metrics in such a way that their values can be compared to those for static hyperdocuments. The range of our adaptive metrics shows that adaptive link structures can indeed be less connected and/or more navigationally restrictive than the worst static structures.

When writing a static hyperdocument, the author must carefully select from which nodes to create links to which other nodes. This selection not only depends on the information contents of the nodes (and the relationship between them), but also on assumptions of which nodes the user may have visited prior to arriving at the source for a link. Adaptive hypertext enables authors to base this choice on the knowledge of the user. The hypertext system keeps track of the evolution of this knowledge, in order to decide which links to enable or disable.

In an adaptive hyperdocument the author can create many links to a given node, and these links become enabled or disabled whenever the system decides that these links are “interesting”. Since creating links “at will” often results in too many links, and thus in a very high compactness and low stratum, it helps that the conditional links keep the navigational freedom within reasonable bounds without the author having to carefully decide for each link whether it is really interesting enough to include it. Creating an adaptive hyperdocument with the same compactness and stratum as a static hyperdocument containing the same nodes (and information) should thus in general be easier. The fact that all links to a given node are enabled or disabled simultaneously is a limiting factor for the author. By disabling links to a node after that node has been read it becomes difficult for the reader to revisit that node. Several adaptive hypertext systems therefore allow for the creation of unconditional links (which are always enabled). Our simple adaptive hypertext model can be easily extended to allow for unconditional links. The corresponding adaptive metrics remain compatible (and thus comparable) with the navigational metrics for static hyperdocuments.

5 Conclusions

In this paper we have recalled metrics for hypertext link structures, and revised them to account for the possibility of backtracking, a feature offered by virtually all hypertext systems (including World Wide Web browsers). The revised versions of two well-known metrics are called navigational compactness and navigational stratum. Not only do these metrics better express properties of the navigation through hyperdocuments, we have generalized these metrics to the case of adaptive hypertext. Because usability of adaptive hypertext cannot be determined by simply looking at the graph structure formed by the links it is all the more important to be able to compare different structures based on some metrics. Adaptive compactness and adaptive stratum not only provide this basis for comparison, they are also defined in such a way that they are comparable with the navigational metrics for static hyperdocuments. It thus becomes possible to compare navigation through static link structures with that for adaptive link structures. This is especially important when the “same hyperdocument” (actually just the same information) is being produced and offered in both a static and an adaptive version.

REFERENCES

ABHK93. W. van der Aalst, P. De Bra, G. J. Houben, Y. Kornatzky. Browsing Semantics in the Tower Model. Computing Science Note 93-47, Eindhoven Univ. of Technology, 1993. (Also presented at the CSN-94 conference, Utrecht, 1994.)

BRS92. R. A. Botafogo, E. Rivlin, B. Shneiderman. Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics. ACM Transactions on Information Systems, 10:2, pp. 142–180, 1992.

B96. P. Brusilovsky. Methods and Techniques of Adaptive Hypermedia. User Modeling and User-Adapted Interaction, Vol. 6, pp. 87–129, Kluwer academic publishers, 1996.

BSW96a. P. Brusilovsky, E. Schwarz and G. Weber. ELM-ART: An intelligent tutoring system on World Wide Web. Third International Conference on Intelligent Tutoring Systems, ITS-96, Montreal, LNCS Vol. 1086, pp. 261–269, 1996.

BSW96b. P. Brusilovsky, E. Schwarz and G. Weber. A Tool for Developing Adaptive Electronic Textbooks on WWW. Proc. WebNet’96 Conference, pp. 64–69, San Francisco, 1996.

CDB97a. L. Calvi, P. De Bra. Improving the Usability of Hypertext Courseware through Adaptive Linking. Proc. 8th ACM Conference on Hypertext, Southampton, pp. 224–225, 1997.

CDB97b. L. Calvi, P. De Bra. Using dynamic hypertext to create multi-purpose textbooks. Proc. ED-MEDIA’97, pp. 130–135, Calgary, 1997.

BHK92. P. De Bra, G. J. Houben, Y. Kornatzky. An Extensible Data Model for Hyperdocuments. Proc. 4th ACM Conference on Hypertext, Milan, pp. 222–231, 1992.

SF90. R. Furuta and P. D. Stotts. The Trellis Hypertext Reference Model. In Proc. NIST Hypertext Standardization Workshop, pp. 83–93, 1990.

GPS93. F. Garzotto, P. Paolini, D. Schwabe. HDM - A model-based approach to hypermedia application design. ACM Transactions on Information Systems, 11:1, pp. 1–23, 1993.

HS90. F. Halasz and M. Schwartz. The Dexter Reference Model. In Proc. NIST Hypertext Standardization Workshop, pp. 95–133, 1990.

ISB95. T. Isakowitz, E. A. Stohr, P. Balasubramanian. RMM: a methodology for structured hypermedia design. Communications of the ACM, 38:8, pp. 34–44, 1995.

L90. D. Lange. A Formal Model of Hypertext. In Proc. NIST Hypertext Standardization Workshop, pp. 145–166, 1990.

PDHDO97. D. Pilar da Silva, R. Van Durm, K. Hendrikx, E. Duval, H. Olivié. A Simple Model for Adaptive Courseware Navigation. WebNet’97 Conference, Toronto, 1997.

SR95. D. Schwabe, G. Rossi. The Object-Oriented Hypermedia Design Model. Communications of the ACM, 38:8, pp. 45–46, 1995.

SK89. B. Shneiderman and G. Kearsley. Hypertext Hands-On!: An Introduction to a New Way of Organizing and Accessing Information. Addison Wesley, 1989.

V94. J. De Vocht. Experiments for the Characterization of Hypertext Structures. Masters Thesis, Eindhoven Univ. of Technology, 1994.