draft, under review, for publication status and final version see: http://www.hcibook.com/alan/papers/web-scale-reasoning-2009/

SPREADING ACTIVATION OVER ONTOLOGY-BASED RESOURCES FROM PERSONAL CONTEXT TO WEB SCALE REASONING

ALAN DIX
Computing Department, InfoLab21, Lancaster University, Lancaster, LA1 4WA, UK
[email protected]
http://www.hcibook.com/alan/

AKRIVI KATIFORI
Department of Informatics & Telecommunications, University of Athens, Athens, Hellas (Greece)
[email protected]

GIORGOS LEPOURAS
Dept. of Computer Science and Technology, University of Peloponnese, Tripolis, Hellas (Greece)
[email protected]

COSTAS VASSILAKIS
Dept. of Computer Science and Technology, University of Peloponnese, Tripolis, Hellas (Greece)
[email protected]

NADEEM SHABIR
Talis, Birmingham, UK
[email protected]

Received (Day Month Year)
Revised (Day Month Year)
Accepted (Day Month Year)

This paper describes methods to allow spreading activation to be used on web-scale information resources. Existing work has shown that spreading activation can be used to model context over small personal ontologies, which can be used to assist in various user activities, for example, in autocompleting web forms. This previous work is extended and methods are developed by which large external repositories, including corporate information and the web, can be linked to the user's personal ontology, thus allowing automated assistance that is able to draw on the entire web of data. The basic idea is to augment the personal ontology with cached data from external repositories, where the choice of what data to fetch or discard is related to the level of activation of entities already in the personal ontology or cached data. This relies on the assumption that the working set of highly active entities is relatively small; empirical results are presented which suggest these assumptions are likely to hold. Implications of the techniques are discussed for user interaction and for the social web. In addition, warm-world reasoning is proposed, applying rule-based reasoning over activated entities, potentially merging symbolic and sub-symbolic reasoning over web-scale knowledge bases.

Keywords: personal ontology; spreading activation; web-scale reasoning; context modelling; intelligent user interface; personal information management; warm-world assumption

1. Introduction

A typical computer user may have hundreds of address book contacts, many thousands of files, and probably tens of thousands of emails. However, having this wealth of data stored does not help users in performing their tasks with the computer unless it is available when needed. Often this involves navigating one's way round a file system, searching email folders for an elusive address or telephone number, or simply retyping information that one knows is there somewhere.

The aim of the research that gave rise to this paper is to support users so that they have the right information available at the right time. Ideally the computer should be able to perform like an efficient personal assistant, combining computer-like power and memory with a human-like understanding of the individual and the context. For example, if a user is filling out a web form, then an 'address' field would by default prefill to their own address, but if they have just had an email from a friend, then we might expect the friend's address to be suggested alongside or ahead of the user's own address.

One of the techniques we have been using to tackle this is spreading activation over personal ontologies [1]. In short, this involves taking a populated ontology of the user's personal information, including personal profile, relationships to colleagues, projects, etc., and then applying a spreading activation algorithm over the nodes. In the above example, the email from the friend would initially excite the node in the personal ontology representing the friend; this would then spread some activation to neighboring nodes so that the friend's address would also become 'hot'. Later, when the user comes to the web form, the 'hottest' address in the populated ontology would be presented first as an option for the 'address' field, which would be the friend's address as required.

Our initial work in this area has proved promising and is reported elsewhere [1], but has been restricted to information held within the user's personal ontology. However, not all of the relevant information will be stored locally. Imagine if the data in the personal ontology simply contained 'Person("Akrivi") lives_in City("Athens")'. This would be fine if the web form asked for a city, but if the required field was 'country', it would have no contextual suggestions and would simply have to default to the user's own country. A human assistant at this point would simply use their general knowledge and suggest "Greece", or, if the town or city was less familiar, maybe Google it. An automated system in principle may have the entire web available, and in particular the web of 'linked data' [2,3] (interlinked computer-readable information based on semantic-web technology), so may be able to make use of 'general knowledge' and external data in the same way a human might. This raises the question as to whether the kinds of reasoning we have applied to personal ontologies can be extended to the entire web without needing to suck the whole of the web into a single machine.

This paper presents methods to scale spreading activation to allow web-based reasoning, based on dynamically identifying a relatively small, but appropriate, 'working set' of entities and relations. We have not as yet integrated these algorithms into our inference system, but instead present data from our own system and the literature to validate the assumptions on which our scaling algorithms are based.

While other proposals for web-scale reasoning are focused on symbolic reasoning (e.g. [4,5,6]), we are adopting neural-inspired sub-symbolic processing, in the sense that in the presented algorithm ontology concepts and relationships between concepts can be considered analogous to neurons and synapses, respectively. Given the success of Google page rank, there is a prima facie case for the efficacy of these kinds of algorithms. However, with web-scale reasoning, many of the distinctions between symbolic and sub-symbolic reasoning begin to break down; once we hit web scale, even symbolic reasoning may have to become approximate and defeasible [4]. We propose ways in which our use of spreading activation can be combined with more symbolic reasoning. By only performing the symbolic reasoning over sufficiently 'activated' information, we allow bounded reasoning within unbounded data, in a manner similar to human reasoning. We call this the warm world assumption.

The next section reviews a number of key concepts with relevant literature: personal ontologies, task-based interaction, spreading activation, sources of web data and web-scale reasoning. Section 3 describes the current implementation of spreading activation over a personal ontology. Section 4 introduces methods to extend spreading activation to the web, and the following section presents empirical data supporting the key assumptions. Finally, Section 6 discusses a number of issues raised, and in particular the means by which web-scale spreading activation can be combined with symbolic rules to give warm-world assumption reasoning.

2. Background and Concepts

In this section we review relevant concepts from the literature. We begin with personal ontologies, which are the state-of-the-art tool for modelling and reasoning over personal context. Then we briefly discuss task-based interaction, which is a very active research topic exploiting personal context and has been used as a proof-of-concept application for our work, and spreading activation, which has provided the inspiration for the main algorithm presented in this paper. Finally, we overview the sources of web data which can be used to enrich personal ontologies, promoting the latter from local scale to web scale, and conclude this section with an overview of other web-scale reasoning approaches.

2.1. Personal Ontologies

According to [7], an ontology is an explicit specification of a conceptualization. The term "conceptualization" is defined as an abstract, simplified view of the world that needs to be represented for some purpose. It contains the concepts (classes) and their instantiations
(instances) that are presumed to exist in some area of interest, and their properties and relations that link them (slots).

Using an ontology to model semantics related to the user's personal domain has already been proposed for various applications such as web search [8,9]. Most of these approaches use ontologies only as concept hierarchies, like hierarchies of user interests, without particular semantic complexity, as opposed to our approach, which incorporates the full range of ontology characteristics. The value of ontologies for personal information management has also been recognized, and there is on-going research on incorporating them in PIM (Personal Information Management) systems like OntoPIM [10], GNOWSIS [11] and the semantic desktop search environment proposed in [12]. However, there are very few detailed works available on the exact personal ontology to be used for such an application.

The Personal Ontology used in our work constitutes an extended and enriched version of the user profile maintained by most applications, as it attempts to group under one structure the user's personal information, contacts, interests, important events, etc. More details on the creation of the personal ontology may be found in [13] and [14]. The ontology, along with example instances, may be found in [15]. The personal ontology attempts to encompass a wide range of user characteristics, including personal information as well as relations to other people, preferences and interests. To be as complete as possible, the ontology has drawn on existing de facto standards such as FOAF and vCard as well as proprietary profiles such as Facebook. However, we do not expect this to be final or complete, so we foresee evolution of the base ontology; more importantly, the ontology may be extended through inheritance and the addition of more classes, as well as class instantiation, according to the needs of both user stereotypes and individuals. The addition of weights on classes, instances and relations has been the final step to make the personal ontology ready for use and testing within our spreading activation framework.

2.2. Task-based Interaction

While personal ontologies can help users organize and manage their information, in everyday interaction a user is not directly concerned with information management but rather is interested in performing tasks. To this end, user interaction support should be structured around the tasks a user executes. As illustrated in [16], in order to perform a task a user carries out certain action(s) using data related to the task's context; in this work, context is about "What to Do and What to Do It to", including thus the task that the user is involved in (e.g. reading a mail, filling in a web form) and the data involved in the task (e.g. the e-mail sender, entities referenced in the mail message body, or fields and field values in the web form). Although most actions are performed on the user's own PC (albeit some being functions offered by locally installed applications and some by web-based applications), data can come from a variety of sources, including user-owned devices such as PC, PDA and mobile phone, or web-stored information.

Currently, to achieve their goals, users resort to searching for, retrieving, copying and pasting the necessary information between applications and locations. In web-based applications, browsers can perform a level of automatic form filling using a combination of URLs and named fields. Research systems, including that of the Simplicity project [17] and the W3C draft "Client Side Automated Form Entry" [18], have extended this to include mappings between specific forms' field names and user profile data. Our own work has gone beyond automatically filling in fields by name or basic types; in related work with colleagues, we have shown how rich ontological type tags such as "name_of Friend" can be automatically inferred over an unconstrained personal ontology, and furthermore how they can be linked across a single form, or multiple forms in subsequent interactions. For example, if a form contains both name and city, then after a single example this may be automatically tagged as "name_of Person p" and "location_of Institution employing Person p", connecting the two fields, so that when the name is filled the location can be auto-completed [16,19]. This is useful when there is a functional relationship between fields (e.g. the address of a person already entered), but does not help with a first empty form (whose name?) or where there are alternatives (home address or work address).

To offer the appropriate data during the user's interaction, the system has both to identify user actions as they carry out their tasks and to understand the context of the actions. The "what to do" part of the context, the fact that the user is in the middle, say, of booking a hotel room, is tackled by sequence/task inference techniques described elsewhere [16]. In this paper, we are interested in the "what to do it to" part, the initial name field or the choice between alternatives. For this we employ spreading activation (as described in section 3) as a means to predict the context of actions and to present, via a drill-down technique, the relevant data and possible actions that can be performed upon the data.

2.3. Spreading Activation

Spreading activation was first proposed as a model of cognition [20], but is not a new concept in semantic-network-related computational research, where there are a number of proposed applications of spreading activation, especially in the area of information retrieval [21]. Crestani [22] proposes the use of spreading activation on automatically constructed hypertext networks in order to support browsing within these networks; in this case, constrained spreading activation is used in order to avoid spreading through the whole network. The work in [22] presupposes that semantics and weights have been assigned to the links within the hypertext network, possibly in an automatic/semi-automatic fashion; this, however, is infeasible at web scale. Liu et al [23] use spreading activation on a semantic network of automatically extracted concepts in order to identify suitable candidates for expanding a specific domain ontology. Xue et al [24] propose a mining algorithm to improve web search performance by utilizing user click-through data. Weighted relations between user queries and selected web pages are created and
spreading activation is performed on the resulting network in order to re-rank the search results of a specific query, also allowing faster incorporation of newly generated pages into search results by building similarity sets. While this approach may improve the efficiency of searches, it offers only results at document granularity, whereas in a number of applications, including task-based interaction, entity-level granularity is far more useful. Besides, only terms appearing in user queries are considered, and these do not necessarily cover the full breadth of ontological resources, especially when the scope of the application is a single user's interaction. Hasan [25] proposes an indexing structure and navigational interface which integrates an ontology-driven knowledge base with statistically derived indexing parameters and experts' feedback into a single spreading activation framework, to harness knowledge from heterogeneous knowledge assets. While [25] mentions the existence of a local learning rule which modifies the weight on links directly or indirectly involved in a query process, no further details are provided for this; moreover, in [25] experts need to provide direct feedback for adapting the network weights, and no method for linking to external (web) sources is provided. Finally, the discussion on scalability is limited to how new documents can be incorporated into the system, implying that all information can be hosted on a single computer.

It is also worth noting that although the works [22], [24] and [25] consider spreading activation, they do not deal with the different timescales of memory. [22] refers to "some form of activation decay" that may be included in the (optional) preadjustment or postadjustment phases; [24] includes a decay factor; and [25] includes an activation retention threshold for the same purposes. However, these provisions only model how the importance of items is lost, and do not capture the notion of the "current task".

Neural networks, and in particular Hopfield networks [26], attempt to approach and simulate associative memory, again by using weighted nodes but at a different level. In this case, the individual network nodes are not separate concepts by themselves but rather, as a whole, are used to represent memory states. This approach corresponds to the neuron functions of the human brain and mainly focuses on the storage of memories, whereas ours attempts to simulate the conceptual network functions of human memory and focuses on the representation of activation of individual concepts.

Recently, spreading activation theory has been recognized as a candidate approach for supporting personal interaction with the system, in the newly emerging areas of Personal Information Management (PIM) and Task Information Management (TIM). This work has been published in [27] and [1] and is summarized in section 3.

2.4. Sources of Web Data: Linked Data and the Semantic Web

In the scenario in Section 1, the human assistant would either just 'know' that Athens is in Greece, or, if not, Google "Athens" to find out. Of course, while the web is full of human-readable information, much of this is unavailable for automated reasoning. The goal of the Semantic Web is to change this [28] and make a 'web of data'.

While some use of semantic web technology is effectively still in vendor-specific 'silos', either private or using bespoke ontologies, there is a growing body of 'Linked Data' [2,3], that is, web services that use semantic web technology (RDF, SPARQL, usually REST-ful) but also use interlinked ontologies, so that entities in one can be linked to those in another. Figure 1 shows some of these sources, for example DBpedia, which extracts the data in Wikipedia 'info boxes' and turns it into RDF data, and Geonames, which does the same for geographic information from a number of sources.

Figure 1. Linked data on the web (from [2])

In practice, this interlinking is not quite as easy as the figure suggests, as some data is partial (e.g. in the DBpedia data for 'Athens' the word 'Greece' is mentioned, but is not linked to a semantic 'Country' entity), and, while classes and relations are common through shared ontologies, the same entity (e.g. the City 'Athens') is typically represented by different URIs in different data sources, so some resolution and mapping is needed.

As well as these core Linked Data sources, there is an even greater volume of data with public APIs, including data storage sites such as Freebase [29] and Google spreadsheets [30]. In some cases (e.g. Freebase) these have representations that either use ontologies or have similar form, but do not use standard ontologies and thus are, in principle, harder to interlink with other data sources. It is likely that many of these will adopt the Linked Data philosophy over time, or that wrappers will be constructed by third parties, so these offer a larger potential source of data.

Finally, while much of the web is designed to be human readable, this does not mean it cannot be accessed automatically. It is estimated that the vast majority of web-accessible information is 'hidden' in backend databases, but only available through
bespoke web forms. This is variously called the invisible web, the hidden web, or the deep web [31,32]. Hopefully, over time more of this will also become available, either through the owners adding RDF interfaces, or through third-party wrappers, or through automatic means [33,34]. Even for 'ordinary' web pages, it is suggested that around half the material is within some form of template [35], and text-mining techniques can enable semantic information to be extracted even from plain text [36]. Even where analysis of single pages is ambiguous or unclear, analysis of large numbers of documents may yield more reliable information, as with Google Sets [37].

It is not yet clear whether the future of the web will be a pure semantic web approach of URI-linked data or one of more diverse data sources linked through wrappers and mappings. However, either way, for the purposes of this paper, we will assume that the data available acts like pure linked data. It may be that some of the entity linkages are inferred through mappings and rules, but if so we assume that this has happened prior to loading into a local graph.

2.5. Web-Scale Reasoning

There are a number of proposals for large-scale web reasoning, including work emerging from the EU Large Knowledge Collider project (LarKC) [38]. Fensel, van Harmelen and the LarKC team propose a sampling-based approach [3,39], which, like our own work, leads to defeasible reasoning using partial information. As in our approach, they assume that bounded rationality [40] is essential when reasoning over very large knowledge bases.

    do draw a sample,
    do the reasoning on the sample;
    if you have more time, and/or if you don't trust the result,
    then draw a bigger sample, repeat

Figure 2. Sampling-based reasoning (from [3])
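As an illustration, this sampling loop could be realized along the following lines (a minimal Python sketch; draw_sample, reason and confident are placeholder callables standing for a sampler over the knowledge base, a conventional reasoner and a confidence test, and are our illustration rather than part of [3]):

    def sampled_reasoning(draw_sample, reason, confident, size=1000, max_size=10**6):
        # draw a sample and reason over it; if the answer is not trusted and
        # budget remains, double the sample size and repeat (Figure 2)
        while True:
            sample = draw_sample(size)
            answer = reason(sample)
            if confident(answer, size) or size >= max_size:
                return answer
            size *= 2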

The concept of bounded rationality was originally introduced by Simon to describe the way we as humans think about the world [40], neither waiting until we have all the relevant information nor fully deducing all the logical consequences of our knowledge, but instead acting on partial information and partial reasoning. The explanation for this is not laziness or 'poor' reasoning, but necessity and efficiency: gathering information and thinking about it are both expensive, taking time and effort, and are typically not worth the additional gains. Elements of bounded rationality are found in many algorithmic approaches developed since the 1970s, including simulated annealing, neural networks and genetic algorithms, which all aim to produce 'good enough' results for reasonable costs. In contrast to these nature-inspired algorithms,
LarKC effectively mixes traditional deterministic reasoning with statistical methods. This kind of analysis has proved successful in various fields, including primality testing in cryptography [41] and model counting [42].

Anadiotis et al. [5] look at a peer-to-peer architecture, proposing that queries are broken into portions and distributed to the servers maintaining the relevant data, based on the ontologies used in the data at each server. Proposals for semantic web pipes [43] and stream reasoning [44] similarly envisage reasoning being distributed across servers. The augment API in Talis' semantic web application platform [45] operates in a similar fashion, allowing queries generated in one triple store to be augmented with knowledge from another. These approaches have some similarity to Google MapReduce [46], which has successfully applied functional programming techniques to very large-scale data processing. MapReduce assumes homogeneous and replicated information; however, other forms of web-scale reasoning, including the stream and pipe approaches, assume heterogeneous stores, where the problem is knowing which stores have the required information and potentially localising computation to the relevant stores. The issues of ontology authoritativeness and reasoning scalability are discussed in [47], where a rule-based forward-chaining reasoning scheme is adopted.

While many of the systems and proposals for web-scale reasoning are focused on distributing the reasoning or computation, some, including our own approach, assume a single reasoning engine drawing in information as required. The OBII semantic web query answering system works in this manner [6,48]; it has a repository of meta-information about data sources and ontology maps to deal with disparate ontologies, then draws in information from different sources as required for the query being processed.

Arguably the most successful form of web-scale reasoning is Google page rank [49], which is effectively using a form of sub-symbolic reasoning. In fact, the simple page rank algorithm operates in a way very similar to spreading activation. Page rank uses linear spreading of 'rank' between web pages, leading to a single stable global pattern of rank. In contrast, spreading activation attempts to create a pattern of activation dependent on the initial activation, and so uses non-linear functions to prevent 'capture' by the eigenvectors of the linear approach. However, despite these differences, the success of Google page rank certainly suggests that other forms of sub-symbolic reasoning have potential.

3. Spreading Activation over Personal Ontologies

Having at hand a personal ontology that captures the entities of interest to the particular user and the relationships among them, we can simulate the spreading activation procedure to identify the entities that may be of interest to the user in a particular context. The basic idea is as follows: when the user performs an activity, some entities in the personal ontology may be referenced in the context of this activity; e.g. when the user reads an e-mail, the sender, other recipients of the same e-mail, or a project whose name is cited in the e-mail body are such candidate entities. These entities are said to receive immediate activation; afterwards, through the relationships established in the ontology,
part of this activation can be spread to other connected entities within the ontology. When the algorithm completes, entities that have received a sufficiently high activation (above a certain threshold, or the top-k ones) can be considered as the most prominent candidates for the user to perform subsequent activities on (e.g. reply to the e-mail; open the correspondence file with another recipient; or go to the project's document repository). In this respect, spreading activation can be employed in the context of Task Information Management (TIM) to provide context inference to tools that support TIM. In the following paragraphs we will briefly discuss how spreading activation can be applied to personal ontologies.

3.1. Timescales in Human Memory and User Interaction

Although the mechanisms of human memory have not been fully decoded yet, a number of relevant theories have emerged that explain different aspects of its structure and operation. A prominent model has been proposed by Atkinson and Shiffrin [50], according to which there are two distinct memory stores: short-term memory (known also as working memory) and long-term memory. Short-term memory corresponds to the things we are currently thinking about, and expires after a brief time period (10-30 secs), while its capacity is also limited (5-9 chunks) [51]. Long-term memory, on the other hand, corresponds to things we have learnt, which remain for an indefinite amount of time (possibly for ever). Its capacity appears to be almost limitless, and items in it are organized mainly in terms of semantics, although it also accommodates procedural knowledge and images. Recent studies have proposed an additional intermediate memory store, termed long-term working memory [52] or mezzanine memory [1,53], storing information regarding the current situation.

Corresponding to these three levels of memory, we may identify three timescales in the user's interaction with a system. First, we can consider the full set of items that are of interest to the user and have been modelled in the user's personal ontology; these items roughly correspond to human long-term memory. Second, we can consider the items involved in the current activity of the user (e.g. items in the e-mail currently being read), which roughly correspond to working memory. And, finally, we can consider items involved in the recent history of the user's activities, which roughly correspond to long-term working memory; these may provide a broader context for the user's activities, e.g. if a user reads an e-mail regarding an upcoming project meeting in London and then visits an airline reservation site to book a flight to London, then both activities can be contextualized as part of a more generic activity related to the project (participating in a project meeting).

One additional thing that must be taken into account is that not all items are equally important to the user: for instance, when considering long-term memory, one's own address is more important than the address of the plumber; analogous differences in entity importance can also be observed for short-term and medium-term memory.

3.2. Accommodating Spreading Activation Information in a Personal Ontology

In order to reflect which entities are currently active at a certain memory/interaction level of the user, and the perceived importance of each such entity, we should extend the personal ontology model to include this information. To this end, each entity within the personal ontology includes the following additional properties:
• STA (Short-Term Activation), indicating that an entity is currently active
• MTA (Medium-Term Activation), indicating that an entity has been recently active (and could also still be)
• LTA (Long-Term Activation), indicating that an entity is important to the user in the long term.
All the properties above take numeric values to indicate how important the particular entity is deemed at the respective memory/interaction level of the user, while a value of zero for a specific property indicates that the item is not present at the particular memory/interaction level. We will also use an additional "trigger activation" property, IA (Immediate Activation), corresponding to the things that are in some way important directly due to the current task/interaction; for example, the ontology entities (classes and instances) that are recognized in the currently viewed e-mail or web page. This property will facilitate the operation of the spreading activation algorithm, described in the next sub-section.

In order to accommodate these properties (STA, MTA, LTA and IA, as well as the properties IN and MAXLTA, which will be discussed later) in all ontology instances, we have extended the definition of the template class (STANDARD-CLASS in Protégé [54]) to include these properties, and from there these properties are inherited by all ontology instances. All additional properties are of type float.

In order to simplify the presentation of the spreading activation algorithm, we will consider that the inverse of each relation is explicitly recorded in the ontology schema; e.g., if the ontology includes entities "John" and "Mary" and these are connected with the (directed) relationship father("John", "Mary"), then the ontology also includes the directed relationship daughter("Mary", "John"). In the implementation of the algorithms, activation is spread in both directions through a relationship even when there is no defined inverse, but for the sake of exposition we assume that both are there. We will also consider that relationships bear a weight (or strength) LTW, which is directional, allowing different weights depending on the direction in which the relation is traversed. LTW is again accommodated in all relationships within the ontology, by extending the respective template class in Protégé, namely STANDARD-SLOT.

The newly introduced properties listed above (STA, MTA, LTA, IA, IN, MAXLTA and LTW) describe the spreading activation-related aspects of the ontology elements, constituting effectively meta-information for these elements.

3.3. Spreading activation algorithm

The spreading activation algorithm operates on the personal ontology, as enhanced with the properties STA, MTA, LTA, IA and LTW listed in sub-section 3.2, and includes rules
for updating the activation levels of the entities within the ontology. Updating includes passing activation between shorter-term and longer-term memories and modelling the decay of memories in the absence of triggering activations. In the algorithm description and discussion presented below, we will use the following notations:
• IA(e), STA(e), MTA(e), LTA(e) are the immediate, short-term, medium-term and long-term activation levels of a particular entity e.
• For a particular relationship r, we will denote as LTW(r) the weight of the relationship (i.e. its perceived importance). We will also denote as LTW'(r) the value of LTW(r) divided by the number of entities to which r points (the fan-out factor of r). For example, if r is the relationship "member state" between the entity "European Union" and the entities corresponding to countries, LTW'(r) = LTW(r)/27, since the relationship connects the entity EU to 27 other entities.

The basic steps of the spreading activation algorithm (summarized in Figure 3) are as follows. First, weights are computed for relationships, determined largely by fan-in/fan-out, and those entities with initial activation (IA(e) > 0) are added to an 'Active Set'. Then a number of iterations are performed, calculating the short-term activation of each entity (STA(e)) based on spreading from the 'Active Set'; the precise formulae used for this are described in section 3.3.1. At each iteration, any entities with sufficiently high activation are added to the 'Active Set'. The termination condition for this process is discussed in section 3.3.2. Finally, if the activation of any entities is sufficiently high, the long-term and medium-term activation (MTA and LTA) are updated.

1. Initialize appropriate weights and activations
2. Create a set with the currently active entities (entities e with IA(e) > 0), the Active Set
3. Repeat
       Compute STA(e) for the entities in the Active Set as well as their related ones
       For the related entities whose STA exceeds a threshold, add them to the Active Set
   Until the termination condition is met (section 3.3.2)
4. Update MTA and LTA activation weights if appropriate

Figure 3. Basic Outline of the Spreading Activation Algorithm (from [1])
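To make the outline concrete, the following Python sketch shows one possible realization of the data model of section 3.2 and the loop of Figure 3 (the class layout, iteration count and threshold are illustrative choices for exposition, not the Protégé-based implementation of [1]; compute_sta stands for the formulae of section 3.3.1):

    from dataclasses import dataclass, field

    @dataclass(eq=False)               # eq=False keeps instances hashable for sets
    class Entity:                      # an ontology class or instance (section 3.2)
        name: str
        IA: float = 0.0                # Immediate (trigger) Activation
        STA: float = 0.0               # Short-Term Activation
        MTA: float = 0.0               # Medium-Term Activation
        LTA: float = 0.0               # Long-Term Activation
        maxLTA: float = 0.0            # highest LTA ever reached (MAXLTA)
        relations: list = field(default_factory=list)   # (LTW weight, neighbour) pairs

    def relate(e1, e2, ltw=1.0):
        # record the relation in both directions, as assumed in section 3.2
        e1.relations.append((ltw, e2))
        e2.relations.append((ltw, e1))

    def spread(entities, compute_sta, iterations=20, threshold=0.1):
        # step 2 of Figure 3: seed the Active Set with entities having IA > 0
        active = {e for e in entities if e.IA > 0}
        for _ in range(iterations):    # step 3, terminated as in option (b) of 3.3.2
            related = {t for e in active for _, t in e.relations}
            for e in active | related:
                e.STA = compute_sta(e)
            active |= {e for e in related if e.STA > threshold}
        return active                  # step 4 (updating MTA and LTA) follows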

3.3.1. Updating short-term activation

The short-term activation for a specific entity stems mainly from the following two factors. The first is the direct appearance of the entity in the current task/interaction (e.g. its presence in the e-mail just read), corresponding to IA(e). The second factor is the
entity's relationship to other entities that are currently in short-term memory; e.g. when a scientist considers a paper he has authored (and thus the entity corresponding to the paper has a high STA), the entities corresponding to the paper's co-authors or the forum in which the paper was published become active. The second factor will be termed incoming activation and will be denoted as IN(e); we compute it through the formula

IN(e) = ∑ [LTW'(r) × STA(e')], where the sum is over every entity e' connected to e via a relation r in the ontology

This effectively states that the incoming activation for an entity e is derived from the entities e' that are related to it and are currently in short-term memory. Each such entity e' contributes to IN(e) proportionally to the strength of the relationship between e and e'. Besides IA(e) and IN(e), the computation of STA(e) should take into account the importance of e in the current task context [corresponding to MTA(e)] and the overall importance of e [i.e. LTA(e)]. Combining all the above:

STA(e) = S(f(IA(e), IN(e), MTA(e), LTA(e)))

Function f must weight IA(e) strongly, since entities directly referenced in the current task are the most active ones in short-term memory. Moreover, MTA(e) and LTA(e) should be taken into account only if either IA(e) or IN(e) is non-zero. This last requirement ensures that the eventual activation is determined by the initial activation: if MTA and LTA were too strong, they could swamp the effects of the initial activation, leading to a stable but undifferentiated pattern of activation. Thus, one of the simplest plausible choices for f would be:

f(ia, in, mta, lta) = (A × ia + B × in) × (1 + C × mta + D × lta)

The result of function f is passed through a sigmoid function [55], such as the logistic function S(x) = 1 / (1 + e^(-x)).

The sigmoid serves to emphasise the difference between large and small activations and to cap the largest. The equation for STA is recursive and is applied to the set of activated entities at each step.
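Combining the pieces of this sub-section, the STA update could be sketched as follows (the constants A-D and the logistic form of the sigmoid are illustrative stand-ins for the tuned values of [1]; the per-relation fan-out is approximated here by the degree of the source entity):

    import math

    def sigmoid(x):
        # a standard logistic squashing function; [1] may use a different form
        return 1.0 / (1.0 + math.exp(-x))

    def incoming(e):
        # IN(e) = sum of LTW'(r) x STA(e') over entities e' related to e; the
        # per-relation fan-out is approximated by the degree of e'
        return sum((ltw / max(1, len(src.relations))) * src.STA
                   for ltw, src in e.relations)

    def compute_sta(e, A=1.0, B=0.5, C=0.3, D=0.2):
        # A-D are illustrative constants, not the tuned values of [1]
        ia, inc = e.IA, incoming(e)
        if ia == 0 and inc == 0:
            return 0.0                 # MTA/LTA count only when IA or IN is non-zero
        return sigmoid((A * ia + B * inc) * (1 + C * e.MTA + D * e.LTA))

This compute_sta is the function assumed by the spread sketch given after Figure 3.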

3.3.2. Terminating spreading of activation

Since spreading activation is by nature recursive, a termination condition must be established to break the recursive step. The two most prominent options are:
(a) to apply the recursive step until the ontology reaches a stable state. Note that since the ontology contains loops (recall that for any relation its inverse also exists, thus forming a loop of length two; additional loops will also exist in the ontology), we cannot expect that at some step all activation transfers will be zero. Thus, we consider a state as stable when all activation transfers in some step of the recursion fall below a certain threshold th_stable, which can be defined either as an absolute value (e.g. δ(STA(e)) < 10^-4, where δ(STA(e)) denotes the increment of STA(e) in a particular step of the recursion) or as a ratio of the computed increment divided by the current value of the receiving entity's STA (e.g. δ(STA(e))/STA(e) < 10^-3).
(b) to apply the recursive step for a specific number of iterations (e.g. 20).

In [1], constrained spreading activation (option b) has been followed, as also suggested in [56]. In section 5 we will discuss reasons to suggest that the ontology graph is a 'small world'; that is, the distance between any two entities is likely to be small, where 'distance' is measured by the number of relationships traversed to get from one to the other. If this is the case, then only a relatively small number of iterations are needed to ensure that activation can spread right across the graph. In experiments reported in section 5 on large ontologies (millions of triples), we observed informally that there was little change in activation levels after 10-20 iterations.

While the termination condition is about how long to continue with the spreading activation, later in this paper (section 4) we will discuss how thresholds can be used to limit how far the activation spreads through an ontology.

3.3.3. Updating MTA and LTA

In the algorithm presented in section 3.3, after the loop that computes and updates STA, MTA and LTA are updated. MTA is incremented if STA exceeds a certain threshold:

if (STA(e) > threshold_STA) MTA'(e) = MTA(e) + δMTA

and similarly for LTA:

if (MTA(e) > threshold_MTA) LTA'(e) = LTA(e) + δLTA

For a complete discussion of how the thresholds for STA and MTA are set, as well as how the values of δMTA and δLTA are derived, the interested reader is referred to [1].

While the provisions above cater for incrementing the values of MTA and LTA, we must also include provisions for their decay, i.e. their values should be decremented when the entities are not active for a period of time. The mechanisms for decay are considered differently for each, due to the different natures of medium-term and long-term memory (the human capacity for dealing with different subjects in a period of time is limited, as opposed to the almost unlimited capacity of long-term memory). To model the limited capacity of medium-term memory, we define a constant MaxMTATotal to represent the maximum value for the sum of all MTA weights in the ontology, and the following process is performed every T steps:

1. The total amount of MTA increase over the T steps, sMTA, is recorded
2. We set λMTA = sMTA / MaxMTATotal as the decay factor
3. For every entity e, the new MTA is computed: MTA'(e) = (1 − λMTA) × MTA(e)

Figure 4. Process to decay MTA, performed every T steps

The number of steps T after which the decay process should be performed, as well as the value of MaxMTATotal, should be set after taking into account the needs of the application at hand.

Regarding the decay of LTA, since LTA reflects the long-term importance of entities, it should be ensured that the decay does not result in important things having their LTA value gradually return to zero. This can be achieved by introducing a rule that the LTA of an entity never decays to less than a percentage (n%) of its maximum value. Thus, we denote as maxLTA(e) the maximum LTA value an entity e has ever received. Additionally, we introduce two constants: λLTA, the decay constant, which depends on the time interval between each decay, and minPerc, the minimum percentage of the entity's maxLTA value that the LTA of an entity may reach when decayed. The LTA decay is computed using the following process:

At the designated time points, for every entity e:
    if (LTA(e) > maxLTA(e)) { maxLTA(e) = LTA(e) }
    minLTA_e = minPerc × maxLTA(e)
    deltaLTA_e = λLTA × (LTA(e) − minLTA_e)
    newLTA_e = LTA(e) − deltaLTA_e
    if (newLTA_e >= minLTA_e) LTA(e) = newLTA_e
    else LTA(e) = minLTA_e

Figure 5. Process to decay LTA
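For illustration, the decay processes of Figures 4 and 5 translate into Python as follows, using the sketch data model introduced in section 3.3 (the parameter values are placeholders, to be set per application):

    def decay_mta(entities, sMTA, MaxMTATotal):
        # Figure 4: every T steps, scale all MTA values down by a decay factor
        # derived from the total MTA increase sMTA recorded over those steps
        lam = sMTA / MaxMTATotal
        for e in entities:
            e.MTA = (1 - lam) * e.MTA

    def decay_lta(entities, lam_LTA=0.1, minPerc=0.2):
        # Figure 5: LTA decays towards a floor of minPerc x maxLTA(e), so that
        # once-important entities never fade to zero; parameter values are
        # illustrative placeholders
        for e in entities:
            e.maxLTA = max(e.maxLTA, e.LTA)
            floor = minPerc * e.maxLTA
            e.LTA = max(floor, e.LTA - lam_LTA * (e.LTA - floor))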

Note that the amount by which LTA is decremented (deltaLTA_e) is proportional to the difference between the current value of LTA(e) and the minimum allowed value for LTA(e); thus the rate at which LTA drops is smaller when the current value approaches the minimum value, and higher when the value of LTA has been significantly incremented in the recent past (and has not been refreshed).

3.3.4. Dealing with Relation Weights

Relation weights are a very important issue in the spreading activation framework, since they play a dominant role in computing the entities' incoming activation IN(e). We can consider three levels of relation weights, which play a part in regulating the spreading of activation between entities:

1. The relation as a whole, which is expressed by the relation's Long-Term Weight, LTW (e.g. the "friend" relation will have a higher LTW than the "acquaintance" relation).
2. Weights on a particular instance of a relation; that is, for specific entities e1, e2 with a relation r between them, we could assign a weight dependent on:
• an a-priori choice of the user – e.g. if there is a "friend" relationship, the user could assign higher weights to "better" friends;
• whether the relation was important in spreading activation;
• whether both e1 and e2 have received high activation during some period.
3. Weights on the relation for an individual entity. We can quantify this through LTW'(r), defined in section 3.3, which arranges for "splitting" the spreading of the activation through a particular relation across all entities it connects. This is a more coarse-grained option for computing the weight of particular relation instances, as opposed to option (2), which considers individual instances separately.

Similarly to activation levels, relation weights can also be adjusted; these adjustments reflect observations of how often entities connected through the relationship become active together. An approach for updating the LTW of relations is presented in [1].

3.3.5. Effectiveness of Spreading Activation

A preliminary evaluation has been conducted on the spreading activation algorithm as described above, to verify its effectiveness. The preliminary evaluation included 37 tasks, and within each task specific entities were stimulated through immediate activation. Then, users were asked to classify the entities proposed by the spreading activation algorithm (i.e. those that received an STA value of 20 or greater) into one of the following categories: (a) relevant and useful, (b) relevant but not useful, and (c) irrelevant. Users were also asked to designate whether some ontology entities were important in the context of the current task yet were not proposed by the spreading activation algorithm. The results of this preliminary evaluation are as follows, while for more details on the experiment the interested reader is referred to [1] and [57]:
• 59% of the proposed entities were characterized as relevant and useful.
• 33.3% of the proposed entities were characterized as relevant but not useful.
• 6.1% of the proposed entities were characterized as irrelevant.
• In 14 of the subtasks, 1 entity identified by the user as important was not proposed, whereas in 4 subtasks 2 important entities were not proposed. In the remaining 19 subtasks all important entities were proposed.

Measuring the effectiveness in terms of the standard information retrieval metrics, namely precision and recall [58], recall ranges from 78% (two subtasks) to 100% (19 subtasks) with an average of 94%. The minimum precision value encountered was 68% (one subtask), while in two other tasks the obtained value was 78%; in the other subtasks precision values ranged from 82% (two subtasks) to 100% (19 subtasks), with an overall average of 92%. Finally, the f-measure (a combined metric involving both precision and recall) ranged from 75% (one subtask) to 100% (11 subtasks), with an average of 93%. Since, however, the proposed approach was evaluated in this experiment not as a generic
information retrieval support infrastructure but rather as an underpinning for assisting user tasks, we should probably calculate the precision and recall metrics by considering as "relevant documents" the results that are both relevant and useful, since these are the ones that are bound to assist the user task at hand. Under this approach, precision ranges from 40% to 78% with an average of 59%, while recall ranges from 67% to 100%, with an average of 91%. Finally, the f-value has a minimum of 53% and a maximum of 88%, with a mean value equal to 71%. Results are thus promising; however, a more thorough evaluation and more elaborate parameter tuning are underway.

Performance-wise, the STA computation step, as well as the MTA and LTA update steps, were performed in less than 3 msec on an ontology containing 75 classes and 214 instances; therefore, at this ontology size, it is feasible to perform the STA computation and the MTA and LTA update steps after almost every user activity and propose to the user prominent entities and/or activities.

4. Web-Scale Spreading Activation

Spreading activation can, in principle, involve work over the entire ontology, and certainly any part of it. For small ontologies this is not a problem, but for larger ontologies the cost is, in the worst case, proportional to N×D×R, where N is the number of entities in the ontology, D is the average degree of connectivity of an entity (the number of relation instances involving the entity) and R is the number of spreading activation iterations.

For the personal ontologies we have been considering so far, this is not a problem, as all the information has been explicitly entered by the owner of the ontology and is thus relatively small. Indeed, optimizations have not proved necessary, as simple sequential passes have been fast enough. The hand-crafted part of the ontology will grow over time, but probably slowly enough that it can always be dealt with in memory and with relatively straightforward algorithms.

However, this hand-crafted part of the personal ontology is just the core, linking to further personal resources on the user's desktop (files, emails) and in the user's web-based services (Flickr, del.icio.us), in addition to external resources for workgroup or corporate information, and ultimately to the whole web. For the former, personal sources of information, it is reasonable to assume that complete meta-information may be gathered into some repository on the user's own machine, as is done in various semantic desktop projects [10,11,59]. However, even then it is likely that the size of the ontology will be greater than can fit in main memory. More critically, as we consider shared information, both corporate and the full web, we have to assume that the majority of the information is not only external to the user's personal machine, but is so large that it could never be held there: N is effectively unbounded.

We will look first at the simpler case where the ontology is large but memory-resident, and then use this to consider the more complex case where we wish to use spreading activation for ontologies, such as the web, where the complete ontology is too large.

4.1. Limiting spread in large memory-resident ontologies

If N is very large and the ontology is cliquey, then the cost of spreading activation may not be as large as N×D×R, but instead Nr×D×R, where Nr is the number of entities reachable in r steps from the originally activated entities. Note that at each step, Nr is the maximum number of entities that can have non-zero activation, as the rest will not yet have been 'touched' by the spreading. However, most ontologies will be 'small worlds', and so Nr will be close to N for relatively small r. So, we need to artificially introduce limits.

Threshold-based limit – A threshold can be imposed on spreading steps; that is, we only spread outward if the activation at an entity exceeds a certain threshold t. This will have a significant effect, as the time becomes bounded by Nr(t)×D×R, where Nr(t) is the number of entities with activation exceeding threshold t after r iterations. With a small but non-zero threshold, it is likely that Nr(t) is significantly smaller than N and, in particular, will only scale slowly as the ontology gets larger (we will examine this assumption further in section 5). Note that this requires keeping track of all activated entities if we are to avoid linear searches of all entities. However, the linear scan may turn out to be faster until N is very large. (The linear scan would be O(N), whereas some form of activated-nodes list would be O(Nr), but the time per iteration for the latter would involve something like creating a linked list, whereas the former would simply scan for entities with high activation.)

Cap-based limit – A variant on this would be to choose a fixed n and only spread from the n most activated nodes. This has the advantage of establishing a cap on the time per iteration, but does mean keeping a list of entities part-sorted by activation, that is, at worst, an extra O(n×D + n×log(n×D)) cost per step. If n < Nr / log Nr this will still be cheaper; in any case the sorting does not have to be perfect, so the actual cost is likely to be smaller.

Note that adding a threshold or cap changes the semantics of spreading: the results will be similar, but not identical, to spreading without a threshold. We will return to this issue empirically in section 5. However, it is worth noting that many neural models include some form of threshold for signal propagation. We have not needed one in our spreading activation, as the sigmoid function basically 'squashes' low activation and makes some effectively zero. However, if anything, a threshold is more similar to the way our own brains work.

The choice between threshold and cap is likely to be pragmatic. Certainly, during our own experiments (described in section 5), we found that outputs were very stable for different threshold levels, which suggests that cap-based limits (effectively a variable threshold) are unlikely to behave differently.
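A single spreading step with these limits in place might look as follows (a sketch; the threshold and cap values are illustrative, and heapq.nlargest provides the part-sorting discussed above, an exact top-n selection without fully sorting the candidate list):

    import heapq

    def spread_step_limited(active, threshold=0.0, cap=None):
        # only spread outward from entities whose activation exceeds the
        # threshold or, if a cap n is given, from the n most activated;
        # default values are illustrative
        sources = [e for e in active if e.STA > threshold]
        if cap is not None:
            sources = heapq.nlargest(cap, sources, key=lambda e: e.STA)
        touched = set()
        for e in sources:
            touched.update(t for _, t in e.relations)  # only these can gain activation
        return touched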

4.2. Non-memory resident ontologies

As noted, even information extracted from personal resources such as email archives may become too large to fit within main memory, and clearly the web is too large! So we have to consider strategies for dealing with much larger ontologies. Measurements of the web [60] suggest that about 75% of pages are connected (linked to or from) a single strongly connected component (the SCC) comprising about a quarter of the web. For pages in or connected to this SCC, the average distance between pages, in terms of undirected links, is 7 links. While the web of data is not sufficiently developed to be able to predict the equivalent figure for it, it is reasonable to assume that it too will comprise a relatively small world. It may contain some disconnected components, but, if the linked data vision becomes reality, the majority of entities will be linked to the whole. We can therefore assume that WNr, the number of entities in the web of data at distance r or less from the initially activated entities, is likely to be very large, if not comprising the majority of the web.

For generic global calculations such as Google PageRank for web pages, it is acceptable to effectively reason over the whole web, but for more bespoke queries, and especially our own application area where we want fast per-user-interaction update of context, we need to reason over just the relevant web. By its nature, spreading activation tends to have a non-local effect, which is likely to interact poorly with non-local access, at worst touching every entity and triple in the ontology. Some means are therefore required to limit the impact and spread of activation in order to avoid this large-scale flooding of the ontology. Happily, the number of high-activation entities is substantially smaller than the total ontology, and so limiting the number of activated entities, whether using a threshold or a cap, has the potential to help significantly, as only the activated entities need to be brought into main memory. For context inference this is particularly appropriate, as the active entities are also expected to change only slowly over time. Furthermore, if we are using the STA/MTA/LTA scheme, then it is likely that MTA as well as STA can be maintained entirely within main memory, with only LTA recorded on disk. Note that LTA can be stored on local disk even if the entities it refers to are remote (see also section 4.5 for loading rules for LTA).

In fact, things are slightly more complicated, as we can only know the activation of an entity if it is in memory to participate in memory-resident spreading activation. That is, the choice of whether to bring in a new entity can only be made based on the entities and relations already in main memory. We therefore need some form of fetch rules to determine what to bring into memory, and also discard rules to decide what and when to purge data to make room for new data.
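The shape of such rules can be sketched as follows (describe, links_to_memory and persist_lta are placeholder callables for remote-store queries and local persistence; the thresholds are illustrative, and the specific fetch rules are developed in section 4.4):

    def maintain_working_set(in_memory, describe, links_to_memory, persist_lta,
                             t_fill=0.3, t_ripple=0.6, t_purge=0.05):
        # describe(e) and links_to_memory(e) are placeholders for remote-store
        # queries returning the entities brought in by the fetched triples;
        # the thresholds are illustrative values
        for e in list(in_memory):
            if e.STA > t_ripple:
                in_memory.update(describe(e))         # every triple mentioning e
            elif e.STA > t_fill:
                in_memory.update(links_to_memory(e))  # only links to in-memory entities
        # discard rule: purge cold entities, persisting their LTA locally first
        for e in list(in_memory):
            if max(e.STA, e.MTA) < t_purge:
                persist_lta(e)
                in_memory.discard(e)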

Figure 6. Proposed architecture

Figure 6 shows the main components needed for this. It shows the personal ontology and also various remote resources. In many ways the personal ontology and remote resources can be treated uniformly, but there are some differences, as discussed in section 4.8. A disk cache is also shown for remote resources, but as this is a standard feature we do not consider it further. There is also a local persistent store for LTA; note that this may include LTA for entities in remote stores as well as those in the personal ontology. In this figure and the rest of this section, relation instances are considered to be expressed as triples ⟨e, r, e'⟩, denoting that entity e is connected to entity e' via a relation r, in keeping with semantic web usage.

We have not included a symbolic reasoning engine explicitly in Figure 6, as our focus is on the contextual reasoning of the spreading activation. However, in sections 6.1 and 6.2 we discuss how symbolic reasoning might be integrated into this picture. Of course, the remote resources may themselves have some level of reasoning support; in this case we effectively treat primary and inferred data uniformly. If the remote reasoning itself involves some level of fuzziness or uncertainty and this is passed on as provenance, this could be used to modify weights within the spreading activation, but we will not consider this in detail here.

4.3. Entities 'in memory'

Actually, the idea of whether an entity is 'in memory' is itself slightly problematic: while it is entities that are activated in spreading activation, in a pure ontology-based system an entity is no more than its identity and the triples/relation-instances involving the entity. Strictly, the question is what triples are in memory, what entities are
mentioned in some triple in memory, and what proportion of the triples mentioning an entity are in memory.

If we let To be the set of all triples in the ontology (both the personal ontology and non-local resources, including the whole web) and Tm be the triples currently in memory, we have:

Tm ⊆ To ⊆ E × R × E

where E is the set of all entity labels and R the set of all relations. We can then define:

Eo = entities(To) – the entities present in the full ontology
Em = entities(Tm) – the entities mentioned in the triples in memory

where:

subjects(T) = { e | ∃ r, e' : ⟨e, r, e'⟩ ∈ T }
objects(T) = { e | ∃ r, e' : ⟨e', r, e⟩ ∈ T }
entities(T) = subjects(T) ∪ objects(T)

For an entity e mentioned in main memory, the triples in Tm referencing e may be a more or less complete subset of the triples in To referencing e. At one extreme Tm may contain only one of these triples from To, while at the other it may contain all the triples from To that include the entity. If the latter is true, we can say the entity e is complete in the ontology:

complete(e) = ∀ ⟨e1, r, e2⟩ ∈ To : e1 = e ∨ e2 = e ⇒ ⟨e1, r, e2⟩ ∈ Tm

More generally, we might be interested in a particular subset of entities E' and relations R', and whether a particular entity e that is mentioned has all triples relating it to entities in E' through relations in R':

complete_wrt(E',R')(e) = ∀ ⟨e1, r, e2⟩ ∈ To : r ∈ R' ∧ ( (e1 = e ∧ e2 ∈ E') ∨ (e1 ∈ E' ∧ e2 = e) ) ⇒ ⟨e1, r, e2⟩ ∈ Tm

As shorthand we shall use complete_wrt(E') to mean complete_wrt(E',R), where R is the set of all possible relations. One kind of 'being in memory' for an entity is to ask that it is complete in the sense above of having every related triple; this is equivalent to performing an RDF DESCRIBE query on the disk triple store [61]. Alternatively, we may simply include all the links between it and things in memory (that is, complete_wrt(Em)).
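These definitions transcribe directly into Python, representing triples as (e1, r, e2) tuples and Tm, To as sets of such tuples (a sketch of the definitions, not of an efficient implementation):

    def subjects(T):
        return {e1 for (e1, r, e2) in T}

    def objects(T):
        return {e2 for (e1, r, e2) in T}

    def entities(T):
        return subjects(T) | objects(T)

    def complete(e, Tm, To):
        # e is complete if every triple in To mentioning e is also in Tm
        return all(t in Tm for t in To if e in (t[0], t[2]))

    def complete_wrt(e, Tm, To, E1, R1):
        # completeness restricted to relations in R1 linking e to entities in E1
        return all(t in Tm for t in To
                   if t[1] in R1 and ((t[0] == e and t[2] in E1) or
                                      (t[0] in E1 and t[2] == e)))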

4.4. Fetch rules: choosing which triples to load into memory

We can follow the general principle of bringing in data related to highly activated entities, but there are a number of variations, several or all of which can be applied.

Filling-in rule – If an entity's activation exceeds a critical threshold t_fill, we retrieve all triples linking that entity to others in main memory; that is, we make the entity complete_wrt(Em).

Ripple-out rule – If an entity's activation exceeds a threshold t_ripple, we retrieve all triples that include it. If we apply both of these fetch rules then we need t_ripple > t_fill.

These rules are both focused on adding information about the entity under scrutiny. The ripple-out rule is potentially problematic if the entity is highly connected; for example, if it were a country, say 'Greece', then we would end up drawing in every person whose country of birth is Greece, as well as the city Athens as the capital of Greece, Greek as its language, etc. A more conservative rule than ripple-out would be to preferentially include triples with smaller fan-out (capital, language, rather than 'place of birth of'). This is similar to what we do for spreading activation itself, and so we can think of rules that are a form of 'look-ahead', retrieving triples that would get a certain level of activation 'if they were there'.

Look-ahead rule – For any entity e, take all relations r that may have e as subject or object (based on typing), but for which we do not yet have all instances in main memory. Assume we know the fan-out f_r;e for relation r from entity e (either using the average for the relation, or a more specific count if it is known). Use f_r;e and the current activation of e to calculate what activation a would be spread to entities connected via relation r if they were present in memory. If a exceeds some threshold t_look, we retrieve all triples or