The Magic Science of Visualization

David M. W. Powers ([email protected])
School of Informatics & Engineering, The Flinders University of South Australia, GPO Box 2100, Adelaide 5001, Australia

Darius Pfitzner ([email protected])
School of Informatics & Engineering, The Flinders University of South Australia, GPO Box 2100, Adelaide 5001, Australia

Abstract

The visualization of information is in many respects back in the stone age, or rather the text age. WIMP GUIs just place more text on the screen, and search engines use their graphic bandwidth to display advertisements, but don't help us to assimilate information faster. The snazzy engines tend to present 1D clusters in text (e.g. vivisimo.com) or add expensive 3D representations to display the traditional ten hits without their summaries (e.g. kartoo.com). In this paper we review a variety of information visualization interfaces and examine the discrepancy between the techniques used and our cognitive abilities to process visual information. We introduce a number of psychophysiologically motivated dimensions and present our analysis of how using them can speed our ability to assimilate and sift information.

Introduction

Today, in an age of cheap and commonplace high-resolution colour displays, popular search engines all use text to display the information they return. The standard is to return a 10-document list selected from the hundreds of thousands of hits that arise from a typical query. The visualization of information is in many respects back in the stone age. There are, however, a few little-known search engines that cluster information so that whole segments of the search space can be eliminated or ignored, but these still remain text interfaces (e.g. vivisimo.com). Others add expensive 3D representations without adding anything to actually facilitate or speed our assimilation and sifting of the information, simply displaying the traditional ten hits as fancy 3D globes, but without summaries (e.g. kartoo.com).

The rich information that we have access to has many dimensions – the kinds of dimensions we use to organize a library. In Information Retrieval (IR), some commonly used and directly available dimensions are size, age, popularity, location/URL and file type, along with the ubiquitous queryword or keyword frequency signature. Compression of signatures is possible, with Singular Value Decomposition (SVD) of keyword frequencies leading to the well-known Latent Semantic Indexing (LSI) that reduces the number of dimensions from lexicon size to a much smaller number of topical components. Even so, there are too many dimensions to display in a simple 2D graphic, and 3D representations produce little improvement and can actually reduce performance (Sebrechts et al., 1999). This failure of the 3D representation is not surprising if the 3D effects distract from critical information, making it less salient. The ambiguity of 3D coordinates represented in 2D means that extra 3D cues are needed to allow accurate 3D location, so if size, colour or texture are used this way they are lost as dimensions in their own right.

We are investigating interfaces that match the natural dimensions of our display to the dimensions that the human perceptual interface is attuned to. We are exploring how to minimize cognitive load and maximize the information assimilated subconsciously. Surprisingly, there has been relatively little in the way of formal evaluation of visualization interfaces, and most presentations of interfaces offer no comparative evaluation and no cognitive justification for the choices made.
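As a concrete illustration of the kind of signature compression referred to above, the following minimal sketch (not drawn from this paper; the tiny corpus and parameter choices are invented purely for illustration) reduces a term-document count matrix to a handful of latent topical dimensions via truncated SVD, which is the core of LSI.

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
# (Values are invented purely for illustration.)
A = np.array([
    [3, 0, 1, 0],   # "visualization"
    [2, 0, 2, 0],   # "interface"
    [0, 4, 0, 1],   # "retrieval"
    [0, 3, 0, 2],   # "precision"
], dtype=float)

# Full SVD, then keep only the k strongest singular values (LSI-style truncation).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                        # number of latent "topical" dimensions
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T     # each document as a k-dimensional point

print(doc_coords)   # 4 documents, each now described by 2 coordinates instead of 4 terms
```

Documents that are close in this reduced space can then be mapped onto the two or three dimensions that a display can actually offer.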

The Magical Number Seven

Critical in our research is the reality of cognitive limitations as exemplified in Miller's magical number seven, plus or minus two (chunks of information). This gives a potential 2-3 bits of information available along a single dimension, but simultaneously varied dimensions cannot convey as much information, so we cannot simply sum this information and assume that we can handle the corresponding multiplication of possibilities. Miller (1967) estimates that 60 chunks of information are available by combining half a dozen or so (7±2) independent dimensions, and 150 chunks of information are available by using dynamic animation rather than static display. Simon (1969; Larkin and Simon, 1987) has also considered the role of such cognitive limitations in relation to the effectiveness of visualization, but it is a rare consideration in discussions of modern graphical interfaces. Given that each of 150 concepts using around seven dimensions will require around seven words to describe, there are about 1000 words of information that we should be able to absorb at a glance in an animated presentation.

We propose to maximize the information that can be subconsciously assimilated by use of a number of simple heuristics:
1. by matching display and data dimensions (size with size);
2. by redundant combination (polygon sides with size);
3. by encouraging chunking (natural clustering);
4. by animating presentations (draw/move slowly);
5. by using user-controlled views (mouse controlled).

This paper focuses on the effective use of 'display dimensions', but this is just one aspect amongst many that we have brought together into a unified taxonomy in Pfitzner, Hobbs and Powers (2002).
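The arithmetic behind these figures can be made explicit. The following lines are an illustrative back-of-the-envelope reading of the claims above, not code from the paper; the variable names and the naive-multiplication comparison are ours.

```python
import math

per_dim_levels = 7                            # Miller's 7 +/- 2 distinctions on one dimension
per_dim_bits = math.log2(per_dim_levels)      # ~2.8 bits, i.e. the "2-3 bits" above

dims = 7                                      # half a dozen or so independent dimensions
naive_chunks = per_dim_levels ** dims         # ~800,000: what naive multiplication would suggest
miller_static = 60                            # Miller's empirical estimate for combined dimensions
miller_animated = 150                         # estimate when animation adds time as a dimension

words_per_concept = 7
print(miller_animated * words_per_concept)    # ~1050: the "about 1000 words at a glance"
```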


Although our exploration and evaluation of display dimensions is at an early stage, it leads us to propose several things a developer should keep in mind when dealing with the output dimensions. We elaborate the above heuristics to give the following examples of the factors we need to consider, based on our basic perceptual limitations:
• Attribute Resolution: For a representation involving a single output dimension, only six or seven distinctions can be handled without conscious processing.
• Number of Attributes: It seems pointless to visualize more than six or seven features or attributes to distinguish 'data facts', and even then the resolution that can be subconsciously processed and recalled may be limited to only two or three distinctions per feature.
• Explicit and Implicit Grouping: It is useful to represent data facts in such a way as to allow the user to subconsciously group and recode. Whilst clustering techniques can be used to explicitly recode and limit the amount of detail, visualizations showing natural clusters can convey the same information implicitly, as long as appropriate dimensions are displayed.
• Natural Interactivity: A user should be able to interact with the display in a way that leads to intuitively reasonable modifications of the display (e.g. new views showing different perspectives or levels of detail).
• Views and Cues: When changing views we should provide cues to help the user 'clear out' the old information in the dimensions that are being reused. In addition, it is helpful to have cues that clarify the relationships and continuity between views. Various animation techniques can serve one or both of these purposes. A common approach is to retain all data as context but keep some in a higher-resolution focus.
• Sequential and Parallel Presentation: Distinctions that may not be salient in a simultaneous presentation may become salient when it is animated, so that time becomes an additional dimension available to contrast data objects or to present or reinforce a specific attribute.
Unfortunately, many modern visualization tools take users beyond their everyday understanding of physical objects (e.g. requiring them to operate in n-dimensional space or expecting them to perceive multidimensional relationships). Such unintuitive interfaces can be counterproductive and negate the benefits that might otherwise be derived from visualization.

Evaluation Measures

We can find little general literature on the evaluation of IR visualization interfaces, so we propose a number of factors that need to be taken into account, starting with the standard IR measures.
1. Recall: a measure of how well the relevant results are represented in the data returned, being the ratio of the number of relevant retrievals to the total number of relevant documents in the collection.
2. Precision: a measure of how much of what is returned is relevant, being the ratio of the number of relevant retrievals to the total number of retrieved documents.

Unfortunately, recall and precision suffer from a number of limitations, and they assume the return of a single unranked set of results for a single query. When used with ranking, the problem becomes when to cut off the returns, so there is a trade-off between returning more results in the hope of increasing recall and fewer results in the hope of increasing precision. Often an arbitrary number of returns is selected, with 10 and 100 being common figures, or figures are given for multiple return sets – R10, P100, etc. When clustering is carried out there is the more complicated problem of assessing the utility of clusters, and here multiple manually developed classes may be compared with automatically determined clusters. Furthermore, a visualization interface involves providing multiple viewpoints and allowing users to cull the results interactively. Clearly other factors must be taken into account before we can sensibly apply and interpret recall and precision. Indeed, there are many problems with these as accuracy measures, and we have developed the Bookmaker measure to directly assess the extent to which results are due to correct decisions (Powers, 2003).

We therefore introduce the following additional factors that should be taken into account in evaluating an interface (a small worked sketch of the basic accuracy measures follows the list):
3. Bookmaker accuracy: to what extent are the results due to correct or incorrect use of information rather than random guessing?
4. Time: given a single retrieval task, what time is taken, including system and user time, to achieve that task?
5. Number of interface interactions: given a single retrieval task, how many times does the user interact (e.g. click, drag, etc.) with the graphical interface?
6. Number of refinements: how many times has the query been refined?
7. User opinion: the opinion of users should be solicited under controlled conditions to help capture factors such as the intuitiveness and friendliness of the interface.
8. Cognitive load: a user's mental load in using the interface to achieve a specific search/retrieval task. Cognitive load is directly influenced by the design of the user interface and will typically be measured by assessing how effectively a user can use the interface concurrently with other tasks or distractions. Here we are not necessarily seeking to minimize cognitive load. Rather, the ideal is to maintain a level of cognitive load such that the task is not trivial or boring, yet does not overload the user to the point where the tool is difficult to use or error rates increase.
9. Number of errors: how many mistakes did the user make when using the interface?
10. Learning curve: how quickly can a novice user learn to become proficient in using the interface?
11. Effective use of screen real estate: does the visualization effectively use the maximum amount of screen area available?
12. Number of results displayable: how many results can be effectively displayed to the user in a given area of the screen?
13. Mode of use: what task is the interface being used for? e.g. searching for answers to a specific question, a specific document or a specific reference.
14. Multi-session support: does the interface support use of usage history or feedback, usable adaptively in other searches by the same or another user?

15. Significance: have we enough data/subjects/trials for our results to be statistically significant, and what is the probability that our results are due to chance?
16. Bandwidth: what is the trade-off between server load, client load, and network load? A search engine may have a Java visualization tool running as an application or as an applet on the user's machine (client side), resulting in the search engine computational load and the network bandwidth being similar to a normal text interface. Conversely, some interfaces require new results and a complex graphical view to be calculated by the server and transmitted to the client – a heavy server and network load that leaves the client-side processor idling.
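To make the basic accuracy measures concrete, here is a minimal sketch of recall, precision and a bookmaker-style chance-corrected score computed from a binary contingency table. The bookmaker formula shown (recall plus inverse recall minus one) is a common two-class reading of Powers (2003) and is an assumption of this sketch rather than a definition taken from that paper; the example counts are invented.

```python
def ir_measures(tp, fp, fn, tn):
    """Accuracy measures from a binary retrieval contingency table.

    tp: relevant documents retrieved      fp: irrelevant documents retrieved
    fn: relevant documents missed         tn: irrelevant documents correctly ignored
    """
    recall = tp / (tp + fn)            # fraction of all relevant documents returned
    precision = tp / (tp + fp)         # fraction of returned documents that are relevant
    inverse_recall = tn / (tn + fp)    # fraction of irrelevant documents correctly rejected
    # Chance-corrected "bookmaker"-style score (assumed two-class form):
    bookmaker = recall + inverse_recall - 1
    return recall, precision, bookmaker

# Example: 100 documents returned, 40 of them relevant, out of 60 relevant in the collection.
print(ir_measures(tp=40, fp=60, fn=20, tn=880))
```

A random guesser scores close to zero on the bookmaker measure even when its recall or precision looks respectable, which is exactly the distinction criterion 3 above is after.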

All these criteria have been used to aid us in evaluating existing interfaces in the literature and to guide our design of a more effective visualization interface – using information retrieval and web search as the tasks in focus.

Figure 1. Vivisimo.com clustering metasearch engine shows a conventional listing of results (right frame) plus a list of automatically generated clusters (left frame) in an explorer tree (click on + to open a cluster and – to close it; click on the name to display the corresponding subset of results).

Figure 3. TileBars (Hearst, 1995) provides a clustering interface but in addition provides a colour-coded visualization of the significance of specific querywords in a document.

Figure 2. Scatter/Gather (Cutting, Karger and Pedersen, 1993) provides 5 clusters with a list of characteristic keywords and a view of the contents in scrollable frames. It allows interesting clusters to be retained and uninteresting clusters to be discarded; the documents in the retained clusters are then regathered and reclustered.

From Text to Graphic

We now proceed to examine a representative sequence of IR interfaces, starting with the pure textual interfaces. Note that figures are full-resolution screenshots and may be expanded to their original format in the electronic version of the paper. The pure linear text interface is represented by well-known internet search engines such as Google.com. These interfaces usually return 10 results by default, although often this may be increased to 100 by 'advanced users'.

The first interface we will illustrate is the metasearch engine Vivisimo.com, which augments this standard linear display with a set of clusters (Figure 1). This is a very effective interface, although how effective it is depends on how sensible the clusters and their titles are. Vivisimo's explorer interface allows complete control over which subclusters are viewed and hidden, so the user is unlikely to get lost as full context is maintained. The explorer interface is a good candidate for displaying textual information associated with a graphical representation. Vivisimo also scores well in that opening and closing of clusters and display of cluster summaries are achieved efficiently without further recourse to the server.

The pure clustering text interface is well represented by Scatter/Gather (Cutting, Karger and Pedersen, 1993; Hearst and Pedersen, 1996), which provides windows into 5 clusters in scrollable frames (Figure 2). The unique feature of Scatter/Gather is the way in which it allows clusters to be selected, recombined (gathered) and reclustered (scattered). However, context is lost during this reorganization, and there is the danger that good hits may be discarded in clusters where irrelevant ones were seen. A minimal sketch of this cycle is given below.
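The following sketch outlines the gather/scatter cycle in the most generic terms. The clustering step is delegated to a placeholder function, the choice of five clusters per round follows the description above, and everything else (names, counts, the stand-in for the user's choices) is invented for illustration; it is not Scatter/Gather's actual implementation.

```python
def cluster(docs, k=5):
    """Placeholder: any document clustering method (e.g. k-means over term vectors)."""
    return [docs[i::k] for i in range(k)]   # naive round-robin split, for illustration only

def scatter_gather(docs, choose_clusters, rounds=3):
    """Repeatedly cluster (scatter), let the user keep some clusters, and pool (gather)."""
    for _ in range(rounds):
        clusters = cluster(docs)                 # scatter: partition the current pool
        kept = choose_clusters(clusters)         # user keeps the clusters that look relevant
        docs = [d for c in kept for d in c]      # gather: pool the surviving documents
        if len(docs) <= 10:                      # small enough to read directly
            break
    return docs

# Example: always keep the first two clusters (a stand-in for the user's judgement).
result = scatter_gather(list(range(100)), lambda cs: cs[:2])
print(len(result), "documents remain")
```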

If the selected clusters were uniformly good, there would be little point in reclustering. Rather, the user has to decide which clusters have a relevant theme based on the keywords and titles seen.

TileBars (Hearst, 1995) goes a step further toward providing a true visualization of the results by representing clusters as bitmaps showing the relative weight of colour-coded keywords or sets of related terms in the associated documents (Figure 3). Tkinq does something similar without clustering, using the length of a bar to indicate the strength of a term in an individual document (Figure 4).
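As a rough illustration of the kind of display TileBars and Tkinq build on, the sketch below computes a term-by-segment weight grid for one document; rendering each cell as a colour-coded tile (TileBars) or each row total as a bar (Tkinq-style) is then a purely graphical step. The segmentation scheme and the sample text are invented for illustration and are not taken from either system.

```python
def term_segment_grid(text, terms, segments=4):
    """Split a document into equal segments and count each query term per segment."""
    words = text.lower().split()
    size = max(1, len(words) // segments)
    grid = []
    for t in terms:
        row = [words[i:i + size].count(t) for i in range(0, size * segments, size)]
        grid.append(row)                     # one row of tile weights per query term
    return grid

doc = ("visualization helps search interfaces "
       "search engines return text lists "
       "clusters group results for search "
       "visualization of clusters aids users")
for term, row in zip(["search", "visualization", "clusters"],
                     term_segment_grid(doc, ["search", "visualization", "clusters"])):
    print(f"{term:14s}", row)    # darker tiles would correspond to larger counts
```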

Figure 4. Tkinq shows the composition of its hits in terms of a set of keyword weights displayed as bars of varying length.

Figure 5. Cat-a-Cone (Hearst and Karadi, 1997) displays a cluster hierarchy in 3D along with an iconographic 3D representation incorporating the metaphors of a bookcase, books and pages, as well as providing control keys within the visualization.

Graphical Visualizations

Figure 5 illustrates one of the most spectacular graphical visualizations, Cat-a-Cone (Hearst and Karadi, 1997). At the heart of this representation is a 'cone tree' showing a hierarchical clustering of the data with documents at its leaves. Once the aesthetic impression is past, however, we realize that there is considerable wasted space, that much of the cone tree is obscured, and that the 3D iconification has come at the cost of reducing the area available to display useful information. Envision (Fox et al., 1993; Nowell et al., 1996), by contrast, is an early and much less polished IR visualization, but it makes much richer and more meaningful use of visually salient attributes and display dimensions.

Figure 6. Envision (Fox et al., 1993) is a visualization interface designed to facilitate search of a digital library of computer science literature. It is highly customizable and the dimensions of the display are well matched to the dimensions of the domain. The display highlights natural clusters and displays documents in terms of selected attribute dimensions using a specified coding in terms of icon shape, size and colour.

In Figure 6 we can see that Envision uses colour in a natural and effective way (red and white indicate useful and useless nodes already rated by the user, grey indicates a grouping/cluster, and brown, green and cyan indicate high to low estimated relevance). The dimensions to be displayed on the axes are selectable, with date of publication shown along the bottom and title down the left side in Figure 6. In addition, there are options to use icon size and shape to display information. Whilst there is some blank space on the screen, there is considerably more area devoted to discriminatory information than in Cat-a-Cone. In addition, the use of a 2D representation is clearer and less cluttered than any 3D representation we've seen displaying this quantity of information. Overall Envision rates extremely well in our preliminary evaluation, yet in fact it is not a web search interface but an interface to a digital library. It would be very nice to see it in action in a web search context. The aesthetics could easily be improved without negatively impacting performance by using 3D icons and providing a beige background for the graphic display so it is easier on the eyes than pure white. A major advantage of Envision is its customizability – the settings/mappings for all the major attributes of the icons are controllable: size, shape and colour.
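The customizable attribute-to-channel mapping that makes Envision effective can be expressed very compactly. The sketch below is a generic illustration of the idea; the attribute names, palette and document are invented, and this is not Envision's actual implementation.

```python
# Each document attribute is assigned to exactly one visual channel, and the
# mapping can be reassigned by the user without touching the underlying data.
mapping = {
    "x": "year",            # horizontal axis
    "y": "title",           # vertical axis (alphabetical)
    "size": "citations",    # icon size
    "colour": "relevance",  # icon colour (e.g. brown/green/cyan for high/medium/low)
    "shape": "doc_type",    # icon shape
}

doc = {"year": 1996, "title": "Visualizing search results", "citations": 120,
       "relevance": "high", "doc_type": "article"}

# Resolve the visual description of one document under the current mapping.
icon = {channel: doc[attribute] for channel, attribute in mapping.items()}
print(icon)   # swapping, say, "size": "year" changes the view, not the data
```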

The most extensive evaluation of 3D versus 2D and text visualizations was carried out for the US National Institute of Standards and Technology (NIST) on variations of their NIRVE interface to their PRISE search engine (Sebrechts et al., 1999). Unfortunately this was only a 'between subjects' evaluation, with 15 subjects spread across three interfaces and a limit of 100 documents to sift through – in particular, only 5 subjects (2 IT professionals and 3 students) evaluated any one of the three interfaces. NIST also reviewed existing interfaces extensively (Cugini, Laskowski and Sebrechts, 2000). Generally, the evaluations found that there was no advantage of either the 2D or 3D systems over text-based systems – rather, subjects took longer to complete their tasks. However, with successive sessions the time taken became very similar for all three interfaces, notwithstanding the increased difficulty of the later tasks. In addition, the performance of the professionals seemed to be similar to the students for the text baseline, relatively better for the 2D model, and much better for the 3D spaceball+mouse interface. These results suggest that there is a general experience factor that helps us make better use of more sophisticated interfaces. In fact the two 'experts' using the 2D interface were on average not quite as fast as those using the text and 3D interfaces, but with so few subjects none of these comparisons reach significance.

Figure 7. Nirve 3D Globe (Sebrechts et al., 1999) displays bar graphs of cluster keyword signatures on the surface of a sphere that shows their relative similarity. The further north a cluster sits, the more concepts are represented in it. A spaceball and a mouse are used in concert to navigate.

Figure 8. Kartoo.com is a publicly available web search engine that identifies topics that are at the intersection of multiple documents.

Nirve3D has a number of interesting features, most of which are preserved in Nirve2D, which is the same visualization projected onto two dimensions and thus lacks the aesthetic appeal of 3D. The high dimensionality problem is alleviated by grouping together related keywords as user-defined concepts. These concept weights are displayed for each cluster, and clustering is based on a binary grid technique in which presence or absence of a concept is the only basis for distinguishing clusters. Clusters which have many concepts represented are positioned higher, with the north pole representing documents with all concepts and the south pole representing documents with no concepts. Each cluster is represented by a box, whose size represents the number of documents in the cluster, and a bar chart, which shows the relative weight of the concepts represented. Clusters that differ by a single concept are joined by an arc in the concept's colour.

Nirve also illustrates visualizations that use different representations at different levels in a simultaneous display. The usual dendrographic tree representations of hierarchical clusters are intrinsically recursive (cf. Cat-a-Cone) and Nirve avoids using text by using colour-coded representations of the defining concepts. Users found the colour-coding the most useful feature in all three interfaces – and one reason text did so well is that it was based on the same underlying search engine and sought to display the same information in the same way, so that the distinction between the interfaces was purely that of text versus 2D versus 3D. The biggest mistake made by the interfaces we have looked at earlier is trying to fit text into the visualization – an independent key is far more powerful, and even Nirve faces the problem that it cannot cope with more than seven different concepts before the descriptions get too small to read. This may or may not be sufficient, as suggested by the magic number seven, but in general it seems appropriate to separate the text into a key and provide scroll bars if real estate is an issue.

Nirve avoids the problem of large bushy hierarchical clustering by limiting the number of concepts and using the binary grid classification – there are thus at most 2⁶ = 64 clusters if it is based on 6 concepts (a minimal sketch of this grid classification follows). But in other interfaces, it is usual to limit what is shown at one time, and allow users to explode or 'enter' clusters, or to discard useless clusters and present a new view.
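The binary grid classification used by Nirve is easy to sketch: each document gets a bit vector recording which user-defined concepts it mentions, and documents sharing a signature fall into the same cluster, so k concepts give at most 2^k clusters. The concept lists and documents below are invented for illustration; this is not Nirve's actual code.

```python
from collections import defaultdict

def binary_grid_clusters(docs, concepts):
    """Cluster documents by which concepts (sets of related keywords) they contain."""
    clusters = defaultdict(list)
    for title, text in docs.items():
        words = set(text.lower().split())
        # One bit per concept: does the document mention any keyword of that concept?
        signature = tuple(bool(words & set(kw)) for kw in concepts.values())
        clusters[signature].append(title)
    return clusters        # at most 2**len(concepts) distinct signatures

concepts = {"search": ["search", "query"], "display": ["colour", "icon"]}
docs = {"doc1": "query refinement and search", "doc2": "icon colour coding",
        "doc3": "search results shown with colour icons"}
for sig, members in binary_grid_clusters(docs, concepts).items():
    print(sig, members)   # e.g. (True, True) is the 'all concepts' pole of the Nirve globe
```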

The most impressive-looking web search engine is Kartoo.com, which shows 3D balls representing the usual 10 hits, along with 2½D topic wells and relationships. The original version, which showed relationships rather than contours, did not add much to the information available in the traditional 10-hit textual listing. The new version with contours displays the information better, but it is necessary to open a site to see if it is relevant. Kartoo is not nearly as useful as it is beautiful (Figure 8). Essentially, the query is refined by enforcing inclusion or exclusion of a keyword from the map, or of a topic (collocation or keyword) from the left frame. Using the buttons on the map triggers a new search, at the cost of a minute or so's delay, whilst using the buttons in the text frame adds the term to the query but awaits a 'go' command from the user. This server-side activity makes it impractical for serious use. Moreover, it is rare that you would want to exclude a keyword – which means not accepting any document that happens to mention the word in any context with any meaning. The biggest problem is that serious users do not want to see just the domains for just ten hits.

Conclusions

In the interfaces we have examined, a common technique is to use a user's querywords to select the hits as well as the major dimensions of the display – TileBars, Tkinq and Nirve all originally encoded the relative frequency of a queryword using basic visual analogs (colour or length); however, later versions of Nirve colour-coded using 'concepts' that are user-defined aggregations of querywords. Another approach is to display query-independent attributes of the documents, with the fundamental dimension for characterizing a document being a function of term frequency that characterizes documents independently of the query. Words that naturally occur together can define latent semantic dimensions, collocations, concepts or topics that may be used as dimensions rather than simple keywords. Kartoo and Vivisimo discover 'useful' words and collocations, with Vivisimo using them to label a hierarchy of clusters.

Clustering may be performed in many different ways, with the major choice being whether to display a small number of clusters, as with Scatter/Gather and Nirve, or to navigate a large dendrographic or explorer-style hierarchy, as with Cat-a-Cone and Vivisimo. A key decision is whether the clustered representation is allocated a small area with the bulk allocated to the contents of a selected node (Vivisimo), or whether the display is devoted to displaying the relationships between clusters, and a new overlaying view is required to see document details (Nirve and Cat-a-Cone, although the latter makes relatively poor use of the space available).

Generally 3D graphical interfaces have proven to be ineffective, and even 2D visualizations have not been properly demonstrated to give an advantage. In terms of the 'magical number seven', Envision has the cleanest, richest match of document attributes to display attributes, maximizing the number of documents and clusters that can be assimilated at a glance – we believe that this should be the major goal of a visualization. The ability to control the allocation of attributes is an important feature, and Envision also permits the redundant allocation of features to multiple display dimensions for increased ease of recognition.

Acknowledgements

This paper draws heavily on work done by Vaughan Hobbs, including a more comprehensive but unpublished draft critique of IR visualization and clustering interfaces (Hobbs, Pfitzner and Powers, 2002). We much appreciate his contribution to this project.

References

Cugini, J., Laskowski, S. and Sebrechts, M. (2000). Design of 3D Visualization of Search Results: Evolution and Evaluation. http://www.itl.nist.gov/vvrg/cugini/uicd/nirvehome.html

Fox, E. A., Hix, D., Nowell, L. T., Brueni, D. J., Wake, W. C., Heath, L. S. and Rao, D. (1993). Users, user interfaces, and objects: Envision, a digital library. Journal of the American Society for Information Science 44:480-491.

Hearst, M. A. (1995). TileBars: Visualization of term distribution information in full text information access. Proceedings of CHI '95, Denver, Colorado, May 7-11, 59-66.

Hearst, M. A. and Karadi, C. (1997). Cat-a-Cone: An interactive interface for specifying searches and viewing retrieval results using a large category hierarchy. Proceedings of SIGIR '97.

Hearst, M. A. and Pedersen, J. O. (1996). Reexamining the cluster hypothesis: Scatter/Gather on retrieval results. Proceedings of SIGIR '96, 76-94.

Hobbs, V., Pfitzner, D. and Powers, D. (2002). A Survey of Information Retrieval Interfaces and their Evaluation. http://www.cs.flinders.edu.au/Research/AI/papers/200204-drft-IRS.pdf

Larkin, J. H. and Simon, H. A. (1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science 11:65-99. Reprinted in Glasgow, J., Narayanan, N. H. and Chandrasekaran, B. (Eds) (1995), Diagrammatic Reasoning: Cognitive and Computational Perspectives, AAAI Press / The MIT Press.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63:81-97. Reprinted in Miller, G. A. (1967), The Psychology of Communication, Bobbs-Merrill.

Nowell, L. T., France, R. K., Hix, D., Heath, L. S. and Fox, E. A. (1996). Visualizing search results: Some alternatives to query-document similarity. Proceedings of the 19th Annual International ACM SIGIR Conference, ACM, New York, 67-75.

Pfitzner, D., Hobbs, V. and Powers, D. (2002). A unified taxonomic framework for information visualization. Proceedings of the Australian Symposium on Information Visualisation, Adelaide, February 2003.

Powers, D. M. W. (1997). Unsupervised learning of linguistic structure: An empirical evaluation. International Journal of Corpus Linguistics 2:91-131.

Powers, D. M. W. (2003). Recall and Precision vs the Bookmaker. Proceedings of the International Cognitive Science Conference, July 2003 (this volume).

Sebrechts, M., Cugini, J., Laskowski, S., Vasilakis, J. and Miller, M. (1999). Visualization of Search Results: A Comparative Evaluation of Text, 2D, and 3D Interfaces. In Research and Development in Information Retrieval.

Simon, H. A. (1969). The Sciences of the Artificial. MIT Press. Referenced by Tweedie (1997).

Tweedie, L. (1997). Characterizing Interactive Externalisations. Proceedings of the Conference on Human Factors in Computing Systems, 375-382.