TGD: Visual data exploration of temporal graph data

Michael Farrugia(a) and Aaron Quigley(b)
(a) University College Dublin, Dublin, Ireland; (b) University College Dublin, Dublin, Ireland

ABSTRACT

This paper describes the social networking questions, analysis, design and approach taken in the realisation of an interactive solution for the VAST 2008 challenge. The solution presented, which won the phone call mini challenge award, is a case study in this approach. The problem scenario of the competition is used as a case study to explain the approach and our experience with the developed tool. Design considerations and observations on the process are drawn out, and suggestions for further research in the area of temporal graph data are made.
Keywords: Visual analytics, Dynamic social networks, Information visualisation, Case study, Data Analysis

1. INTRODUCTION
The continued growth in the collection and management of data has driven many of the research developments in Information Visualisation and Visual Analytics. More specialised conferences, journals and events have emerged, but the general problem of how to compare and contrast different visualisation algorithms, methods, techniques, patterns and tools remains.1–4 For example, the pattern approach5 calls for the documentation of a set of common problems, solutions and their interconnections in the form of a recipe book for effective interaction.3 Task driven methods, by contrast, often adopt “grand challenge” approaches, where a conference series provides a dataset and supporting documentation with the intention of motivating the community to tackle one shared problem, as opposed to many different problems. The IEEE Symposium on Visual Analytics Science and Technology (IEEE VAST) has organised a competition since 2007 to encourage research in data visualisation. In order to stimulate more participation, the 2008 competition was divided into 4 separate mini challenges that together formed a grand challenge. In practice, the mini challenges could be approached independently, or taken as a whole for the grand challenge. The data sets for the individual challenges provided a rich data source for analysis, with clear user tasks that each analyst team needed to answer. While this approach is not at the level of a benchmark with performance metrics, it does go some way towards allowing comparative study and consideration of the myriad approaches which can be taken in data visualisation. The datasets and results further provide case studies and solutions, and even feed into documented “design patterns” for subsequent study.3, 5 In 2008 one of the mini challenges consisted of a data set of phone calls between the families of people in control of a controversial religious organisation living on an island.
Details on how the data set was created (e.g. whether it was real, synthetic, or synthetic but based on real data patterns) were not provided. Importantly, however, the phone calls retrieved from the island’s phone company provided enough data to extract the social network of the families on the island. In addition, each phone record had the time of the call, the duration of the call, and the location of the cell tower from which the call was made. The time of each call was recorded in minutes, while the duration of calls was recorded in seconds. The data spanned a 10 day period and contained 400 unique cell phones. There were a total of 9834 call records, resulting in 1562 distinct edges. The context of the data and the problem scenario were important in formulating the hypotheses that were uncovered from exploring the data. This form of dataset provides a realistic scale for what is studied in both research and industrial settings. The first task of the analyst was to identify key people in the organisation, mainly the leader and subordinates. The second task was to identify how the social structure of the families changed over the 10 day period. The change in social structure involved identifying any change of roles of people in the network, replacement of personnel, or change in the organisation structure. Apart from identifying the change in the social structure, reasons for the change had to be hypothesised in the context of the problem scenario. In order to support these hypotheses, other data, such as the geographic location of the calls, had to be used in the analysis. This set of tasks provided a shared motivation for the research community to tackle this problem and gave the challenge committee a basis for comparison.

This paper reports on the research approach taken to address these tasks in an attempt to “win” the challenge. We describe the methods developed to support comparative data visualisation, including methods based on social network analysis, in our case study approach. The rest of the paper is organised as follows: Section 2 describes typical user tasks and measures used when analysing social networks. Section 3 gives an overview of the tool developed to assist in this analysis. Section 4 discusses the case study approach and describes how the tool was used to analyse and solve the problem. Section 5 reviews related work, and suggestions for future research directions in this area are drawn in Section 6.

Further author information: E-mail: [email protected], [email protected]
Visualization and Data Analysis 2009, edited by Katy Börner, Jinah Park, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7243, 724309 · © 2009 SPIE-IS&T · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.814921

Syntactical attributes | Semantic attributes
Actor attributes: centrality, prestige / prominence | Actor attributes, e.g. size of an organization, age of a person
Structural partitions: cohesive subgroups, structurally equivalent actors, role equivalent actors | Attribute partitions, e.g. organizational subunits, legal form of an organization, attitudes toward policy issues
Network structure: size, density, centralization, cohesiveness | Network attributes, e.g. period of data gathering, reliability
Structural positions: bridge, broker | Selected attributes, e.g. distinct institutional role

Table 1. Syntactical and Semantic mapping
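As a rough illustration of how the dataset summary in Section 1 (unique phones, call records, distinct edges) can be derived from a raw call log, the sketch below counts unique phones and collapses calls into weighted undirected edges. It is a minimal sketch under stated assumptions: the `CallRecord` field names are hypothetical, and the paper does not say whether its edge count is directed or undirected; the sketch assumes undirected edges because most of the analysis used the undirected network.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class CallRecord(NamedTuple):
    """One call-log row; the field names are hypothetical, not the challenge schema."""
    caller: int
    callee: int
    minute: int       # time of the call, recorded in minutes
    duration_s: int   # call duration, recorded in seconds
    tower: int        # cell tower the call was made from


def summarise_calls(calls: Iterable[CallRecord]) -> Tuple[int, int, Counter]:
    """Return (unique phones, number of call records, undirected edge weights)."""
    phones = set()
    weights: Counter = Counter()
    n_calls = 0
    for call in calls:
        n_calls += 1
        phones.update((call.caller, call.callee))
        # Sort the endpoints so that A->B and B->A collapse onto one undirected edge,
        # whose weight is the number of calls between the two phones.
        edge = (min(call.caller, call.callee), max(call.caller, call.callee))
        weights[edge] += 1
    return len(phones), n_calls, weights


calls = [
    CallRecord(1, 2, 0, 60, 5),
    CallRecord(2, 1, 30, 45, 5),
    CallRecord(1, 3, 90, 10, 7),
    CallRecord(3, 4, 95, 20, 7),
]
n_phones, n_calls, weights = summarise_calls(calls)
print(n_phones, n_calls, len(weights), weights[(1, 2)])  # 4 4 3 2
```

The edge weight here is simply the call frequency, which matches how edges are weighted in the matrix view; duration or tower data would need separate aggregation.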

2. SOCIAL NETWORK ANALYSIS
A social network consists of a list of actors and a list of ties between them. Typical examples of actors are persons, organizations and countries, while ties can represent social relationships, behavioural interactions and physical connections. In a network, actors can either be of the same type, in which case the network is called a one mode network, or of mixed types, in which case it is called a two mode network. An example of a two mode network is one where some of the actors are authors and some are papers, and the links between the two represent authorship; two authors linked to the same paper are co-authors. From this definition it is clear that social networks are a type of graph, with the actors being the nodes of the graph and the ties being its edges. A graph is the mathematical construct used to model aspects of a social network. However, the graph is attempting to represent a real world sociological scenario. The graph, in this case, is a syntactical representation of a semantic problem. This distinction between syntax and semantics is very important and sometimes overlooked. The mathematical techniques (e.g. degree or betweenness centrality) used to calculate measures on the graph mean very little without the social context of the problem being analysed. Each measurement and analysis technique, both visual and numeric, has to be given a valid sociological interpretation. Table 1 (drawn from6) maps the syntactic attributes from graph drawings7 to the semantic attributes in social network analysis. The syntactic attributes are graph theoretic concepts such as centrality or network size, while the semantic attributes relate to the social scenario being analysed, such as the size of an organisation. While this mapping is implicit for certain aspects of analysis, it is important to note that the graph features and measures are representations and not the final goals of the study. It is not the answer or value from these calculations that forms the solution but the interpretation of these calculations.8

Clearly, understanding the questions the user of the visualisation is going to ask is paramount to the development of an effective solution.9 This principle applies in all areas of software development, and information visualisation is no exception. Researchers in information visualisation have attempted to group user tasks into taxonomies according to the type of data in question.1 Amar et al10 and Plaisant et al4 have extended the general study by Shneiderman to include attribute data and graph data respectively. Given the relationship between graphs and social networks, we now consider how a graph task taxonomy applies to a social network. Three elements are typically measured in a social network: the individual actor, a group of actors, and the whole network. Analysts try to identify the key actors, commonalities between actors that form groups, and characteristics of the whole network, such as how dense (well connected) it is. At the heart of individual actor analysis is centrality, which can be divided into 3 main measures: betweenness centrality, degree centrality and closeness centrality. Group discovery should consider components, cliques and k-cores.11 Table 2 attempts to explain these measures by formulating them as informal questions that the analyst asks about a network. The questions are chosen to allow an analyst to consider the significance of the chosen metric, what it means, and how it can be applied in a sociological context. This is a small set of common measures used to analyse networks; for a thorough review of such measures see.12

Element | Measure | Informal questions
Actor | Degree centrality | Who are the connectors / hubs (the people with the most connections)?
Actor | Betweenness centrality | Who are the brokers / gatekeepers / bridges between two different groups?
Actor | Closeness centrality | Who are the people who can access the other people in the network in the least number of steps?
Actor | Roles and structural equivalence | Which actors perform the same role in the network (equivalent network class)?
Group | Identify groups: subgroups / k-core / n-clique; components (strong / weak / cyclic) | How are people within a group connected together? Is everybody connected to everybody else?
Network | General characteristics; centralisation | What are the characteristics of the network? Does it have very few central nodes (like search engines on the internet)? Is it a dense network (highly connected) or a sparse network (less connected)?

Table 2. Social Networking questions

All the measures in Table 2, and indeed all graph theoretic metrics, can be computed without requiring a visualisation. It can be argued that a visual representation isn’t required, as the metrics can provide all the insights needed. For example, it is straightforward to calculate an ordered list of the most central nodes in the graph, or to divide the graph into clusters and list the clusters. Since an end-user can easily access this information by measuring the network, what benefit does visualising the network have? Visualisations of social networks can either be used to explain the network, assisting in communicating results and ideas, or be used to explore the network to uncover hidden patterns. In the seminal work on social network visualisation, Moreno describes the creation of a sociogram13 as “A process of charting has been devised by the sociometrists, the sociogram, which is more than merely a method of presentation. It is first of all a method of exploration.” This illustrates that it was realised from an early stage that visualisations are an exploratory tool and not simply a means for presentation. In any exploratory system, the interactive controls of a visualisation are important to enable the user to view different drawings of the same diagram, and to explore alternatives quickly and easily.
A comprehensive survey of such graph based interaction and navigation techniques is given by Herman et al.14 The quantitative analysis of social networks is a well studied and developed area; visualisation is not going to replace these methods,12 and indeed this is not the aim of visualisation.15 It is clear, however, that visualisation can help the analyst to complement the measures obtained from metrics and to explore paths that might not have been identifiable by numeric analysis alone. An ideal tool combines the two analytical approaches to support all the analyst’s needs.
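To make the actor-level measures in Table 2 concrete, the sketch below computes degree, closeness and betweenness centrality for a small unweighted, undirected graph held as a dict of neighbour sets. It is a self-contained illustration, not the computation used in the paper (which relied on SNA tools such as Pajek); betweenness uses Brandes' algorithm, and the values are unnormalised. Libraries such as NetworkX provide normalised versions of the same measures.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # node -> set of neighbours (undirected)


def degree_centrality(adj: Graph) -> Dict[int, int]:
    """Connectors / hubs: number of distinct neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj: Graph) -> Dict[int, float]:
    """(reachable nodes - 1) / sum of BFS distances; 0.0 for isolated nodes."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj: Graph) -> Dict[int, float]:
    """Brokers / bridges: Brandes' algorithm for unweighted, undirected graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


path = {1: {2}, 2: {1, 3}, 3: {2}}
print(degree_centrality(path))       # {1: 1, 2: 2, 3: 1}
print(betweenness_centrality(path))  # {1: 0.0, 2: 1.0, 3: 0.0}
```

On the path 1-2-3, node 2 is both the hub and the only broker, which is the kind of reading Table 2 asks the analyst to translate back into the social scenario.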

3. APPROACH
In this research the visualisation is designed to support the analysis of changes in network structure over a 10 day period, extracted from a list of phone calls made between actors in the dataset. The first approach was to adopt standard tools such as Pajek16 for Social Network Analysis (SNA) and Excel for analysing the attribute data. Any study involving the visualisation of social networks should start with common tools and standard visualisation methods. The initial analysis of the data was not restricted to visual exploration techniques. A statistical overview of the dataset, complemented by social network analysis measures, was completed in an attempt to identify change. While this analysis indicated the most connected actors and the overall structure of the network, it was impossible to determine changes occurring over time using only these techniques. The network of actors can be considered on an epoch-by-epoch basis (minute, hour, day, 10 days) or across the entire data set. An attempt was made to visualise parts of the network using freely available social network visualisation tools; however, processing the data and visualising it in these generic ways was at best very time consuming and laborious. In order to overcome these restrictions, and to support the exploration of a large changing network, a tool (TGD) specifically designed to support the analysis of this problem was implemented. TGD was developed using the Processing language,17 which enabled the exploration of different interactive features and the rapid prototyping of different visualisation approaches. TGD supported two different yet coupled views of the network data: a matrix representation and a node link representation. The matrix representation was used to visualise the overall change in network activity, for instance to highlight an increase in activity of a particular group of actors.
The node link representation was used to focus on parts of the network, and to explore in detail the relationships between key actors previously identified from the matrix view. This hybrid approach matches visualisation method to sub-task, while supporting coupled interaction to facilitate movement between views. To aid interface discoverability, the interactive features of the matrix view and the node link visualisation are similar. The development of the interaction is guided by Shneiderman’s principles of overview first, zoom and filter, and details on demand.1 The interactive features supported by both visualisations include:

• View animation
• Decay functions
• Reordering of nodes based on attribute data (matrix view only)
• Zoom on a section of the graph, or on individual nodes

[Figure 1: frame of the animated matrix view, annotated with (1) an increase of calls (red dots) in the 300 range and (2) calls of the previous coordinating nodes starting to fade by turning blue.]
Figure 1. Matrix network display during animation
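Two of these features, animation along a timeline and reordering of matrix rows by a node attribute, can be sketched together: a frame of the matrix view at timeline position `t_minute` contains only the calls made up to that time, with rows and columns sorted by a location label. This is a minimal sketch, not TGD's Processing code; the `(caller, callee, minute)` record layout and the single location label per node (for example, a node's most frequent cell tower) are hypothetical simplifications.

```python
from typing import Dict, List, Tuple

Call = Tuple[int, int, int]  # (caller, callee, minute): hypothetical record layout


def matrix_frame(calls: List[Call], t_minute: int, location: Dict[int, str],
                 cell_px: int = 2) -> Dict[Tuple[int, int], int]:
    """Pixel origin (x, y) -> call count for every matrix cell active by t_minute.

    Rows and columns are ordered by each node's location attribute, then by id,
    so nodes sharing a location occupy one contiguous block of the matrix.
    """
    order = sorted(location, key=lambda node: (location[node], node))
    index = {node: i for i, node in enumerate(order)}
    cells: Dict[Tuple[int, int], int] = {}
    for caller, callee, minute in calls:
        if minute > t_minute:
            continue  # the animation timeline has not reached this call yet
        i, j = index[caller], index[callee]
        # Undirected view: a call lights both symmetric cells (row i/col j and row j/col i).
        for row, col in {(i, j), (j, i)}:
            origin = (col * cell_px, row * cell_px)
            cells[origin] = cells.get(origin, 0) + 1
    return cells


calls = [(1, 2, 10), (2, 3, 20), (1, 3, 99)]
location = {1: "north", 2: "harbour", 3: "harbour"}
frame = matrix_frame(calls, t_minute=60, location=location)  # timeline at hour 1
print(sorted(frame))  # [(0, 2), (0, 4), (2, 0), (4, 0)]
```

With 2x2 pixel cells, a 400 node matrix needs only 800x800 pixels; setting `cell_px=1` corresponds to the single-pixel variant discussed in Section 3.1.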

3.1 Matrix View
The Matrix View was developed to support the visualisation of the entire network. Social network analysis packages such as Pajek support the display of matrices, but are limited to displaying a small number of nodes. These packages could not handle even the 400 nodes in this data set. To overcome this problem, each edge in the matrix is represented as a 2x2 pixel square. The greatest challenge with matrix views of social networks is the difficulty in determining paths between related actors.18 For the task here this is not a major concern: the aim of the matrix view is simply to identify actor activity over time, rather than the relationships and links between the actors. The matrix view was used as an overview, from which interesting nodes can be selected and displayed in the node link view. To support the analysis of temporal data, animation was added to the matrix view. A timeline with hourly granularity for each day was used to show the progress of the animation, and also to allow the analyst to view the network at a certain point in time. This interactive approach made it straightforward to visualise the data at a particular time during analysis. When the matrix view is animated, edges are added to the matrix according to the position in the timeline. Figure 1 shows a frame from the matrix view animation on day 9 at 4 am. The first group of circled edges (1) only appeared after day 8; these nodes were easy to notice during the animation because of the high level of new activity. On the other hand, the second group of edges (2) turned blue towards day 8 after being bright red up until then. Edges in the network are weighted according to the frequency of calls between two actors. Since the network was extracted from phone calls, where information between two actors can flow in both directions, most of the analysis was made on an undirected version of the network. The option to view the directed network, based on the actor who initiated the call, was added to support specific hypothesis testing. For instance, after a potentially important event was identified, the direction of the calls was considered important, as it indicates which actors were the first to be aware of the event. Each cell in the matrix representing an edge has a colour that depends on the weight of the edge, i.e. the activity of the actors based on the number of calls made or received. Very active edges turn various intensities of red, based on the number of times the edge is observed in the time window. As the activity of an edge decreases, its colour turns dark blue on a black background. The dark blue, while still visible, blends into the background and makes the red pixels more prominent. The time it takes for an edge to start fading is controlled by the decay function of the animation. In the visualisation the different fading functions are defined in the code; however, these functions could instead be specified in the interface by allowing the user to draw the shape of the function curve. Different fading functions could be selected during the animation, according to the requirements of the analyst. In our data analysis 3 different fade functions were adopted: a function that decayed quickly, a function that decayed slowly, and a function that changed the colour of decaying edges into a third colour. To determine the speed of the decay functions, the average time between calls of each actor was calculated. Parameter settings below the average, close to the average, and above the average were tested, to explore the effect of the decay function parameter on the visualisation. The decay parameter affected how quickly the second group of edges (2) in Figure 1 turned blue.
When this parameter was set too low, most of the edges turned blue too quickly. Conversely, when it was set too high, most of the edges remained red for longer than was effective. The results of a study showed that the best setting for the dataset was approximately the average time between an actor’s calls. The third decay function tested highlighted decaying edges by turning them yellow: edges in the process of decaying first lost their red intensity, then turned yellow, then started turning blue. This approach, while visually appealing, added little and was not useful in highlighting any new patterns. Finding patterns in matrix displays can be improved by reordering the nodes in the matrix. To facilitate this pattern discovery, the option to reorder nodes using a data attribute was added. In the dataset, one useful data attribute available was geographic location. This reordering allowed the analysts to determine whether people in the same location called each other at any point in time. In the context of this scenario, this pattern would have suggested that people in one location are either only calling people in their own geographic area, or, more probably, only interacting with people outside their geographic area. The latter case might have suggested that people in the same location are meeting in person rather than calling each other. A matrix representation of a network gives the densest possible visualisation of a network without edge and node overlap. For the competition data set with 400 nodes, a 2x2 pixel square was sufficient to represent each edge. Each edge could also be represented by a single pixel, which would support graphs with as many nodes as today’s largest screens have pixels along their shorter axis. Multigraph multi-level approaches have explored clustering or aggregation to display clustered data views of million node graphs.
The exploration of such multi-level views with temporal graph data (where the clusters may change over time) remains an open research question. Future work will explore such dense visualisations and analyse the challenges of interacting with them. As the aim of the matrix view is to give an overview of the whole network, it was important to allow the analyst to zoom in on edges that warrant further investigation. For this purpose a facility was added to highlight edges and zoom in on them in the node link view: the analyst could select the edges by clicking on them and then select the zoom in feature to display them in the node link visualisation.
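The decay behaviour described in this section can be sketched as follows: each edge's activity is a decayed count of its past calls, the decay parameter is set from the average time between an actor's calls, and the activity is mapped to red (active) or dark blue (faded). The paper does not give its exact fading curves, so the exponential half-life form, the thresholds and the RGB values below are assumptions chosen only to illustrate the mechanism.

```python
from statistics import mean
from typing import List, Tuple


def average_gap(call_minutes: List[int]) -> float:
    """Mean time in minutes between consecutive calls of one actor."""
    ts = sorted(call_minutes)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return mean(gaps) if gaps else float("inf")


def edge_activity(call_minutes: List[int], now: int, half_life: float) -> float:
    """Assumed exponential decay: each past call contributes 0.5**(age / half_life)."""
    return sum(0.5 ** ((now - t) / half_life) for t in call_minutes if t <= now)


def edge_colour(activity: float, saturate_at: float = 4.0,
                fade_below: float = 0.5) -> Tuple[int, int, int]:
    """Map decayed activity to an RGB colour on a black background (thresholds assumed)."""
    if activity < fade_below:
        return (0, 0, 96)  # dark blue: still visible but recedes into the background
    level = min(activity / saturate_at, 1.0)
    return (int(100 + 155 * level), 0, 0)  # more recent activity -> brighter red


half_life = average_gap([0, 10, 30])                    # 15 minutes, used as the decay parameter
print(edge_colour(edge_activity([0], 15, half_life)))  # (119, 0, 0): one half-life old
print(edge_colour(edge_activity([0], 30, half_life)))  # (0, 0, 96): two half-lives old
```

Halving or doubling `half_life` reproduces the trade-off reported above: too small and edges turn blue almost immediately, too large and they stay red long after activity stops. A third-colour variant would add an intermediate band (for example yellow) between the two thresholds.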

3.2 Node link view
The node link view allows the analyst to explore sub-graphs of the network at any point in time. To explore the network, the analyst could expand or contract a node’s neighbours and remove nodes by using the context menu of each node. Nodes are coloured to suggest which of them might warrant further exploration. For the analysis task, 3 node colouring options were used: node (undirected) degree, node type (based on a classification of all nodes in the network), and node geographic location. Additionally, a node could either have a fixed position, or be allowed to move freely under either the Fruchterman and Reingold19 spring embedded layout or the more scalable FADE layout algorithm.20 In order to support the visual exploration of a large network, the node link view was not used to display the entire network. Instead the user, guided by the matrix representation, focuses on sections of the graph that are of interest in the problem being analysed. The node link representation is only used to display smaller graphs, after the nodes are filtered using the matrix overview. This reasoning provided the rationale for using a simple, fast algorithm such as Fruchterman and Reingold for laying out the graph. Once part of the network is extracted using the network exploration features or the matrix zoom in view, the network is animated. At the start of the animation all edges are removed, and edges start appearing as the animation progresses. Originally during the visualisation design, a faint line (edge) was displayed between connected nodes prior to starting the animation. However, during the analysis stage the initial edge was distracting, so an option to start the animation without showing existing edges was provided. In the visualisation nodes are either allowed to move freely, or fixed on the canvas to prevent movement. Each time a new edge is added between two nodes, and at least one of the nodes does not have a fixed position, the nodes move closer together to emphasise their relationship. Repeated observations of the same edge cause the width of the edge to increase up to a certain threshold, after which the edge colour starts darkening. If a long time passes between new observations of an edge, the edge starts fading. The time it takes for the edge to fade can be adjusted in real time by changing the decay parameter.
When the edge decays it either decreases in prominence or is removed altogether, depending on the animation settings. Allowing the user to fix the position of certain nodes is very important for obtaining results by visual inspection. When all the nodes are allowed to move freely during animation, it is not always possible to observe patterns in the data due to excessive movement. On the other hand, when certain nodes are fixed, especially nodes not central to the analysis task, it becomes easier to identify patterns. Preliminary user studies on the effect of fixing the movement of certain nodes suggest that it helps in discovering important patterns in the network. The dataset for the mini challenge also contains the location of the cell tower from which each call was made. As part of the analysis task, an analyst wants to understand whether a group of connected people is moving in unison to the same location. Such behaviour may indicate a potential meeting, or activity taking place at the location the people move to. To support this information requirement, the island’s map was underlaid on the graph drawing canvas, and a set of attracting forces was added to the spring layout at key locations on the map. During the animation, each time an actor changes location, the actor is drawn towards the force at that location, effectively clustering the nodes by geography. The animation speed of this visualisation can be controlled to give a node sufficient time to stabilise near its location. This is especially important if nodes are changing location frequently over a short span of time.
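The layout behaviour described above (spring forces, fixed nodes, and attraction towards map locations) can be sketched as a single Fruchterman-Reingold-style iteration with an extra pull towards each node's current location anchor. The repulsion and attraction terms follow the standard Fruchterman and Reingold form; the linear anchor pull, the constants `k`, `max_step` and `anchor_strength`, and the function name are assumptions for illustration, since the paper does not specify how its location forces were implemented.

```python
import math
from typing import Dict, Optional, Set, Tuple

Vec = Tuple[float, float]


def layout_step(pos: Dict[int, Vec], edges: Set[Tuple[int, int]], fixed: Set[int],
                anchor_of: Optional[Dict[int, Vec]] = None, k: float = 30.0,
                max_step: float = 5.0, anchor_strength: float = 0.1) -> Dict[int, Vec]:
    """One Fruchterman-Reingold-style iteration with map anchors and fixed nodes.

    Repulsion k^2/d acts between every pair, attraction d^2/k along edges, and a
    node with an entry in anchor_of (its current cell-tower location on the map)
    is also pulled linearly towards that point. Displacements are capped at
    max_step; nodes in `fixed` never move.
    """
    anchor_of = anchor_of or {}
    disp = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)

    def push(u: int, v: int, force: float) -> None:
        # Positive force pushes u and v apart, negative pulls them together.
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        disp[u][0] += dx / d * force
        disp[u][1] += dy / d * force
        disp[v][0] -= dx / d * force
        disp[v][1] -= dy / d * force

    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            u, v = nodes[a], nodes[b]
            push(u, v, k * k / (math.dist(pos[u], pos[v]) or 1e-9))
    for u, v in edges:
        push(u, v, -(math.dist(pos[u], pos[v]) ** 2) / k)

    new_pos = {}
    for v, (x, y) in pos.items():
        if v in fixed:
            new_pos[v] = (x, y)
            continue
        dx, dy = disp[v]
        if v in anchor_of:
            ax, ay = anchor_of[v]
            dx += anchor_strength * (ax - x)
            dy += anchor_strength * (ay - y)
        length = math.hypot(dx, dy)
        if length > max_step:
            dx, dy = dx / length * max_step, dy / length * max_step
        new_pos[v] = (x + dx, y + dy)
    return new_pos


pos = {0: (0.0, 0.0), 1: (200.0, 0.0)}
print(layout_step(pos, {(0, 1)}, fixed=set()))  # {0: (5.0, 0.0), 1: (195.0, 0.0)}
```

Capping each move at `max_step` is one way to keep the animation readable, and fixing peripheral nodes removes their motion entirely, which is the effect the preliminary user studies point to.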

3.3 Interaction with Excel
Social networks don’t exist in isolation, and social network data (nodes/edges) is typically supplemented by other non-relational data attributes. To aid the analyst in analysing a scenario, a dynamic report for each actor in the network is available in Excel. This node profile report can be accessed from the view information option of a node’s context menu. This feature is especially useful in a twin monitor setup, with the actor’s detailed report on one screen and the graph on the other. All the call records made by the queried node are also displayed in the report. This approach is effective in filtering large data sets during the exploration process.
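The paper does not describe how TGD passed data to Excel, so the following is only an assumed stand-in for the node profile report: it filters the call records made by a queried node (optionally also those it received) and writes them as CSV text, which a spreadsheet can open, followed by a one-line summary. The record layout and function name are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple

# (caller, callee, minute, duration_s, tower): a hypothetical record layout.
Call = Tuple[int, int, int, int, int]


def node_profile_csv(calls: Iterable[Call], node: int,
                     include_received: bool = False) -> str:
    """CSV text listing the calls made by `node` (optionally also those it received),
    followed by a one-line summary, ready to open in a spreadsheet."""
    rows: List[Call] = [c for c in calls
                        if c[0] == node or (include_received and c[1] == node)]
    partners = {c[1] if c[0] == node else c[0] for c in rows}
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["caller", "callee", "minute", "duration_s", "tower"])
    writer.writerows(rows)
    writer.writerow([])  # blank separator line before the summary
    writer.writerow(["calls", len(rows), "distinct partners", len(partners)])
    return buf.getvalue()


calls = [(200, 1, 0, 60, 5), (2, 200, 10, 30, 5), (1, 2, 20, 10, 6)]
print(node_profile_csv(calls, 200))
```

Filtering once per query keeps the report small, which is what makes it useful for narrowing a large call log during exploration.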

4. CASE STUDY
The first visualisation component of TGD is the node link graph visualisation, whose purpose is to facilitate the exploration of the network. In the scenario description, the organisers specified that there was medium confidence that node 200 in the dataset was the leader of the religious organisation. This information was used to explore node 200’s neighbours using the node link graph visualisation in TGD. The network formed by the leader, his neighbours and their neighbours is already of considerable size. When this network is animated, no patterns of change can be readily identified. This was partly because of the excess clutter of having too many nodes, and partly because the call decay functions were initially reducing the edge size too slowly. To overcome the scalability problem of the node link diagram, a new way of displaying the whole network was explored, which took the form of the animated matrix view. When the matrix view is animated, a high concentration of calls, shown as red dots, is immediately apparent on day 8 (label 1 in Figure 1). These new nodes were investigated further by zooming in from the matrix view to the node-link view. The node-link visualisation was then animated, and the new nodes were allowed to move freely towards the people they were calling. The new coordinating nodes moved towards the original coordinating nodes, which had previously been involved in the social network operations but ceased their activities when the new nodes took over. Figure 2 shows two frames from the subgraph of the 2 groups of coordinators. In the first frame, nodes 1, 2, 3 and 5 are communicating with node 200, who is known to be the boss. In the second frame, taken from day 10, a new set of dark coloured nodes, previously positioned at the top right corner, are now positioned near the original coordinators. These new nodes communicate with the same pattern as the old nodes, but to a new boss, node 300. This is a key discovery highlighted by the network animations. With the help of this visualisation, an exact 1-1 mapping between the old and new coordinators was discovered, and this correspondence was confirmed using structural equivalence measures. Another observation from this animation is that the original coordinators stopped communicating when the new people took over. This suggested either capture, a coup, or simply a change of mobile phones.

[Figure 2: two animation frames of the coordinator subgraph. Day 5: the bottom nodes are the original coordinators. Day 10 @ 23:00: the new replacement nodes move towards the bottom nodes they replace.]
Figure 2. Frames from animation of network on days 5 and 10

At different stages of the analysis, different node attributes were used to colour the nodes.
Originally, various shades of a single colour were used, varying intensity to encode different ranges of degree centrality. Degree centrality was an obvious starting point, due to its general relevance and ease of calculation using SNA tools. As new observations were made from the data, the colour encoding of the nodes was changed to include this new information. This iterative analysis and development was facilitated by the Processing language. Towards the end of the analysis, after using several measures to explore the network, a number of different visual and colour encodings were explored to represent different nodes in the network. Apart from node colour, node shape and size were used to highlight important nodes. In the end each node was classified into one of 7 categories. Table 3 gives a breakdown of the different categories and the way they are represented. Once a good grasp of the facts in the data was gained, possible explanations in the context of the problem scenario were sought. For example, the old coordinators in the network stopped communicating with anyone after day 8, which led to the hypothesis that they might have been captured or killed. However, when this hypothesis was tested by focusing on these actors, it emerged that they communicated with some new people on day 10. These new people were not previously in the spotlight, so their behaviour had to be analysed in more detail.
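The paper does not say which structural equivalence measure confirmed the 1-1 mapping between old and new coordinators. As one simple proxy, the sketch below compares neighbourhoods with Jaccard overlap and greedily pairs each old node with its most similar new node. The `alias` argument is an assumption added so that ties to the old boss (node 200) and to the new boss (node 300) count as the same role; the toy neighbourhoods are invented for illustration, not taken from the challenge data.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Share of neighbours two nodes have in common (1.0 = identical neighbourhoods)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_roles(adj: Dict[int, Set[int]], old: Iterable[int], new: Iterable[int],
                alias: Optional[Dict[int, int]] = None) -> List[Tuple[int, int, float]]:
    """Greedily pair old and new nodes with the most similar neighbourhoods.

    `alias` rewrites neighbour ids before comparing, e.g. {200: 300} treats
    "calls the old boss" and "calls the new boss" as the same tie.
    """
    alias = alias or {}
    old, new = list(old), list(new)

    def neighbourhood(v: int) -> Set[int]:
        return {alias.get(w, w) for w in adj[v]}

    candidates = sorted(((jaccard(neighbourhood(o), neighbourhood(n)), o, n)
                         for o in old for n in new), reverse=True)
    used_old: Set[int] = set()
    used_new: Set[int] = set()
    pairs = []
    for score, o, n in candidates:  # highest similarity first
        if o not in used_old and n not in used_new:
            used_old.add(o)
            used_new.add(n)
            pairs.append((o, n, score))
    return pairs


adj = {1: {200, 10}, 2: {200, 20}, 301: {300, 10}, 302: {300, 20}}
print(match_roles(adj, [1, 2], [301, 302], alias={200: 300}))
# [(2, 302, 1.0), (1, 301, 1.0)]
```

A greedy pass is enough when the matches are as clean as the paper reports; an optimal assignment (for example the Hungarian algorithm) would be the safer choice for noisier data.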


| Node type | Description | Visual representation |
| --- | --- | --- |
| Old set of coordinating nodes | The activity coordinators before day 8 | Brightly coloured nodes (blue) with a heavier circle edge to emphasise importance |
| New set of coordinating nodes | The new coordinators who took over after day 8 | Brightly coloured nodes (red) with a heavier circle edge to emphasise importance |
| Potentially important nodes in the scenario | Nodes with particular connections that warranted closer inspection in the context of the problem. These were considered suspicious, for example nodes that only communicated with the old leader after he was isolated | Represented with a square. Since these were very important nodes to watch out for, a different shape makes them stand out more |
| High proximity nodes | Nodes that do not fall into the above categories but have a high measure of proximity | Represented with a dark colour, contrasting with the pastel colours of the other nodes |
| Central nodes | Nodes that do not fall into the above categories and have a high degree centrality | Secondary nodes, represented with a normal-sized circle in a light green pastel colour |
| Lower degree nodes | Nodes with a lower degree centrality | A smaller circle coloured in low-contrast yellow |
| Lowest degree nodes | Nodes with the lowest degree centrality | Very small circles coloured in light yellow |

Table 3. Visual encodings of different node categories
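Table 3 implies an order of precedence: a node falls into one of the degree-based categories only if it belongs to none of the earlier ones. The Python sketch below encodes that ordering together with a simple normalised degree centrality (distinct contacts divided by n - 1, the usual SNA definition). The input sets, the proximity scores and the three cut-off values are hypothetical placeholders; the paper does not report which proximity measure or thresholds were used, and the original tool was written in Processing rather than Python.

```python
from enum import Enum


class NodeType(Enum):
    """The seven node categories of Table 3, in order of precedence."""
    OLD_COORDINATOR = 1
    NEW_COORDINATOR = 2
    SUSPICIOUS = 3  # "potentially important nodes in the scenario"
    HIGH_PROXIMITY = 4
    CENTRAL = 5
    LOWER_DEGREE = 6
    LOWEST_DEGREE = 7


# Visual encodings transcribed from Table 3; None means the table gives no value.
STYLE = {
    NodeType.OLD_COORDINATOR: {"shape": "circle", "colour": "blue", "heavy_edge": True},
    NodeType.NEW_COORDINATOR: {"shape": "circle", "colour": "red", "heavy_edge": True},
    NodeType.SUSPICIOUS: {"shape": "square", "colour": None, "heavy_edge": False},
    NodeType.HIGH_PROXIMITY: {"shape": "circle", "colour": "dark", "heavy_edge": False},
    NodeType.CENTRAL: {"shape": "circle", "colour": "light green", "heavy_edge": False},
    NodeType.LOWER_DEGREE: {"shape": "small circle", "colour": "low-contrast yellow", "heavy_edge": False},
    NodeType.LOWEST_DEGREE: {"shape": "very small circle", "colour": "light yellow", "heavy_edge": False},
}


def degree_centrality(calls):
    """Normalised degree centrality: distinct contacts / (n - 1)."""
    contacts = {}
    for a, b in calls:
        if a != b:  # ignore self-calls
            contacts.setdefault(a, set()).add(b)
            contacts.setdefault(b, set()).add(a)
    n = len(contacts)  # any recorded edge adds two nodes, so n >= 2 here
    return {v: len(c) / (n - 1) for v, c in contacts.items()}


def classify(node, old, new, suspicious, proximity, degree,
             proximity_cut=0.5, high_degree=0.5, low_degree=0.1):
    """Assign a node to the first Table 3 category it satisfies.

    The three cut-off values are hypothetical placeholders.
    """
    if node in old:
        return NodeType.OLD_COORDINATOR
    if node in new:
        return NodeType.NEW_COORDINATOR
    if node in suspicious:
        return NodeType.SUSPICIOUS
    if proximity.get(node, 0.0) >= proximity_cut:
        return NodeType.HIGH_PROXIMITY
    d = degree.get(node, 0.0)
    if d >= high_degree:
        return NodeType.CENTRAL
    if d >= low_degree:
        return NodeType.LOWER_DEGREE
    return NodeType.LOWEST_DEGREE


deg = degree_centrality([(1, 2), (1, 3), (1, 4), (5, 1)])
print(classify(1, set(), set(), set(), {}, deg))  # NodeType.CENTRAL
```

Checking the categories in a fixed order keeps the classification unambiguous: a coordinator with a high degree is still drawn as a coordinator, which matches the "nodes that don't fall in the above categories" wording of the table.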

This stage of hypothesis testing and exploration was iterative, and it guided the development of the visualisation tool. For example, the geographic animation of the nodes was added during this stage to check whether people moving towards the same location at the same time were still calling each other. To examine the same scenario in the matrix view, the ability to order the nodes by geographic location was added to the matrix view. Some of the implemented features, however, did not provide insight. For example, as a first attempt at examining location change across all nodes, an animated grid view of the nodes was developed, with each node changing colour each time it changed location. This was of very limited use because patterns over time were hard to detect: one had to mentally group nodes and memorise their colours over time. Instead, all cell towers were grouped into 5 regions, and a time-series stacked line graph was created to visualise this data. This graph showed periods of call activity, including a significant decrease in calls from the sea during weekends, which suggests that people may earn their livelihood at sea. Many of the facts discovered in the data were contradictory. This made a single clear-cut solution impossible, which was the aim of the challenge: to explore possible answers in the face of incomplete and contradictory data. To evaluate all the facts and hypotheses, a chart is provided listing every scenario considered, each weighed by the evidence supporting and contradicting it. This structured approach to the analysis guided the development of the hypothesis that most closely matched the facts uncovered. This exploratory case-study method, moving from hypothesis to feature development, provides a testing framework that pushes the boundaries of both interaction and visual display.
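The regional stacked line graph reduces to two steps: count calls per region per day, then accumulate the regions so that each band sits on top of the ones below it. This is a minimal Python sketch, assuming each call record carries a day and a cell-tower identifier and that a hand-made tower-to-region table exists; the tower names, region names and function names are hypothetical, and the actual assignment of towers to the 5 regions is not reproduced here.

```python
from collections import Counter


def daily_region_counts(calls, tower_region, regions, days):
    """Count calls per region per day.

    calls: iterable of (day, tower_id) pairs.
    tower_region: dict mapping each tower_id to a region name.
    Returns {region: [count for each day in days]}.
    """
    counts = Counter((day, tower_region[tower]) for day, tower in calls)
    # Counter returns 0 for (day, region) pairs with no calls.
    return {r: [counts[(d, r)] for d in days] for r in regions}


def stack(series, regions):
    """Cumulative upper boundary of each band in a stacked line graph.

    The first region forms the bottom band; each later band sits on top
    of the running total of the regions below it.
    """
    tops, running = {}, None
    for r in regions:
        if running is None:
            running = list(series[r])
        else:
            running = [below + own for below, own in zip(running, series[r])]
        tops[r] = running
    return tops


# Hypothetical towers and regions, for illustration only.
tower_region = {"t1": "north", "t2": "north", "t3": "sea"}
calls = [(1, "t1"), (1, "t2"), (1, "t3"), (2, "t3"), (2, "t3")]
series = daily_region_counts(calls, tower_region, ["north", "sea"], [1, 2])
print(series)                           # {'north': [2, 0], 'sea': [1, 2]}
print(stack(series, ["north", "sea"]))  # {'north': [2, 0], 'sea': [3, 2]}
```

Plotting each list in `stack(...)` as a line (or filling between consecutive lines) gives the stacked graph; a weekly dip in the "sea" band, like the weekend decrease described above, would show up as a narrowing of the top band.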
Each hypothesis tested raises questions that have to be answered with the visualisations. If a visualisation could not answer a question, a new visualisation or interaction feature had to be explored. Being guided by very specific questions helped focus the development effort on visualisations that provide important insight. Since each visualisation had to answer a question, the quality of the answer gave an indirect indication of how effective the visualisation was.

5. RELATED WORK

Support for visualising network change over time remains very limited in traditional social network analysis tools. Commonly used SNA tools such as Pajek21 have limited support for visualising time-based networks: in Pajek, the user has to manually click through a sequence of images, as in a slideshow, to animate a dynamic network. The biggest contribution to date to the visualisation of dynamic network data was made by Skye Bender-deMoll and Daniel McFarland,22 who developed an open source tool called SoNIA (Social Network Image Animator) to visualise dynamic networks. The main focus of the tool is to provide a platform for testing and comparing different graph layout techniques using dynamic instead of static network data. The authors define a framework for representing time-based network events. They categorise the different time sequences found in typical network data sources, and suggest ways in which time can be represented in the source data for use in network visualisations. They introduce the concept of “slicing” event data as a metaphor for describing a network at a point in time. A slice can be either thin or thick. A thin slice extracts the network at an exact point in time; thin slices are suited to querying network data whose events have a duration. A thick slice considers all the events in a particular time window; this suits network events that have no duration, or only a short one, since the window groups multiple events together. The framework used for defining time-based events in network data is scalable to various data sources. The network animation in SoNIA is created by joining together a sequence of images of the network, with the node layout coordinates interpolated to create a gradual transition between images. This approach to animation is in line with the goals of SoNIA; however, it does not allow interaction with the animation in real time.
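Thin and thick slicing, and the interpolation of layouts between key frames, can be illustrated with a short Python sketch. The `Event` record, the function names, the half-open window and the straight-line interpolation are assumptions made for illustration; they are not SoNIA's API, and SoNIA's own interpolation scheme may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A timed tie; start == end for instantaneous events such as a call."""
    source: int
    target: int
    start: float
    end: float


def thin_slice(events, t):
    """Edges of events active at the exact instant t (start <= t <= end)."""
    return {(e.source, e.target) for e in events if e.start <= t <= e.end}


def thick_slice(events, t0, t1):
    """Edges of all events overlapping the half-open window [t0, t1)."""
    return {(e.source, e.target) for e in events if e.start < t1 and e.end >= t0}


def interpolate(layout_a, layout_b, alpha):
    """Linearly blend node positions between two key-frame layouts.

    alpha = 0 gives layout_a and alpha = 1 gives layout_b; nodes missing
    from either layout are left out.
    """
    return {
        n: ((1 - alpha) * xa + alpha * layout_b[n][0],
            (1 - alpha) * ya + alpha * layout_b[n][1])
        for n, (xa, ya) in layout_a.items()
        if n in layout_b
    }


events = [Event(1, 2, 0, 0), Event(2, 3, 1.5, 1.5), Event(1, 3, 0, 5)]
print(thin_slice(events, 1.0))                       # {(1, 3)}
print(sorted(thick_slice(events, 0, 1)))             # [(1, 2), (1, 3)]
print(interpolate({1: (0, 0)}, {1: (10, 20)}, 0.5))  # {1: (5.0, 10.0)}
```

The example shows why thick slices suit instantaneous data such as call records: the call between nodes 1 and 2 at time 0 is invisible to a thin slice at t = 1.0, but a window covering [0, 1) picks it up.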
The aim of SoNIA is to explore the visualisation parameters applied to the animation of a network. Rather than being a tool for the data analyst, SoNIA is geared towards the information visualisation research community. It provides a framework for researchers to analyse different network layouts and their suitability for animation, and to explore layout parameters that improve animation. It might be beneficial to use SoNIA to explore alternative layouts for animated networks and to include them in the tools data analysts use to explore networks. Nathalie Henry and Jean-Daniel Fekete have worked extensively on combining matrix visualisations and node-link diagrams. Their most recent efforts include MatrixExplorer23 and MatLink.24 Our approach of combining a matrix view with a node-link diagram was partly inspired by MatrixExplorer, which allows users to explore networks using matrix and node-link views in parallel. In more recent work, MatLink, Henry and Fekete extended matrix-based displays to include path links between nodes, an interesting approach that tries to overcome the problem of representing paths in matrix views.

6. CONCLUSIONS

Following the encouraging results obtained to date with coupled node-link and matrix visualisation, we aim to scale the application of these techniques to larger, more dynamic datasets containing more attributes. Dense pixel displays could also be explored further as a way to provide a network overview for large datasets, and the matrix view and the node-link view could be integrated further into a single display. There is further scope for improving the animation features of the visualisations. Currently the user can manually control aspects of the visualisation by setting parameters; there is an argument for determining these parameters automatically, to generate displays that are more likely to be of value to the user. To understand the utility of such a visualisation system, we engaged colleagues in a preliminary user study and provided them with summary findings. As their interest in the problem increased, each colleague asked different questions about the social interaction of the actors. The tool allows a user to answer ad hoc questions, and we were satisfied that the majority of these questions could be answered using the tool.


REFERENCES

[1] Shneiderman, B., “The eyes have it: a task by data type taxonomy for information visualizations,” in [Visual Languages, 1996. Proceedings., IEEE Symposium on], 336–343 (1996).
[2] Purchase, H., “Which Aesthetic has the Greatest Effect on Human Understanding?,” Lecture Notes in Computer Science, 248–261 (1997).
[3] Tidwell, J., [Designing Interfaces: Patterns for Effective Interaction Design], O’Reilly Media, Inc. (November 2005).
[4] Lee, B., Plaisant, C., Parr, C. S., Fekete, J.-D., and Henry, N., “Task taxonomy for graph visualization,” in [BELIV ’06: Proceedings of the 2006 AVI workshop on BEyond time and errors], 1–5, ACM, New York, NY, USA (2006).
[5] Alexander, C., Ishikawa, S., and Silverstein, M., [A Pattern Language: Towns, Buildings, Construction (Center for Environmental Structure Series)], Oxford University Press (August 1977).
[6] Brandes, U., Layout of Graph Visualizations, PhD thesis, Universität Konstanz, Fakultät für Mathematik und Informatik (1999).
[7] Di Battista, G., Eades, P., Tamassia, R., and Tollis, I., [Graph Drawing: Algorithms for the Visualization of Graphs], Prentice Hall (1999).
[8] Scott, J., [Social Network Analysis: A Handbook], SAGE Publications, second ed. (2000).
[9] Ware, C., [Information Visualization: Perception for Design], Morgan Kaufmann (2004).
[10] Amar, R., Eagan, J., and Stasko, J., “Low-Level Components of Analytic Activity in Information Visualization,” Proceedings of the IEEE Symposium on Information Visualization, 111–117 (2005).
[11] Batagelj, V. and Ferligoj, A., “Clustering relational data,” Data Analysis: Scientific Modeling and Practical Application, 3–15 (2000).
[12] Wasserman, S. and Faust, K., [Social Network Analysis], Cambridge University Press (1994).
[13] Moreno, J., [Who Shall Survive?: Foundations of Sociometry], Beacon House (1953).
[14] Herman, I., Melançon, G., and Marshall, M. S., “Graph visualization and navigation in information visualization: A survey,” IEEE Transactions on Visualization and Computer Graphics 6(1), 24–43 (2000).
[15] Freeman, L. C., “Visualizing social networks,” Journal of Social Structure 1 (Feb. 2000).
[16] Pajek. http://pajek.imfm.si/doku.php.
[17] Processing. http://processing.org.
[18] Ghoniem, M., Fekete, J., Castagliola, P., and de Nantes, E., “A Comparison of the Readability of Graphs Using Node-Link and Matrix-Based Representations,” in [Information Visualization, 2004. INFOVIS 2004. IEEE Symposium on], 17–24 (2004).
[19] Fruchterman, T. and Reingold, E., “Graph Drawing by Force-directed Placement,” Software: Practice and Experience 21(11), 1129–1164 (1991).
[20] Quigley, A. J., “Experience with fade for the visualization and abstraction of software views,” in [IWPC ’02: Proceedings of the 10th International Workshop on Program Comprehension], 11, IEEE Computer Society, Washington, DC, USA (June 2002).
[21] Batagelj, V. and Mrvar, A., “Pajek - Program for Large Network Analysis,” Connections 21(2), 47–57 (1998).
[22] Bender-deMoll, S. and McFarland, D., “The art and science of dynamic network visualization,” Journal of Social Structure 7(2) (2006).
[23] Henry, N. and Fekete, J., “MatrixExplorer: a Dual-Representation System to Explore Social Networks,” IEEE Transactions on Visualization and Computer Graphics, 677–684 (2006).
[24] Henry, N. and Fekete, J., “MatLink: Enhanced Matrix Visualization for Analyzing Social Networks,” Lecture Notes in Computer Science 4663, 288–302 (2007).
