Entanglement in multiplex networks ... - Benjamin Renoust

Entanglement in multiplex networks: understanding group cohesion in homophily networks

Benjamin Renoust1,2, Guy Melançon1 and Marie-Luce Viaud2

1 CNRS UMR 5800 LaBRI, INRIA Bordeaux Sud-Ouest, Campus Université Bordeaux I, Talence, France {Benjamin.Renoust, Guy.Melancon}@labri.fr
2 Institut National de l'Audiovisuel (INA), Paris, France [email protected]

Abstract. The analysis and exploration of a social network depends on the type of relations at play. Homophily (similarity) relationships form an important category of relations linking entities whenever they exhibit similar behaviors. Examples of homophily networks examined in this paper are: co-authorship, where homophily between two persons follows from having co-published a paper on a given topic; movie actors having played under the supervision of the same movie director; members of an entrepreneur network having exchanged ideas through discussion threads. Homophily is often embodied through a bipartite network where entities (authors, movie directors, members) connect through attributes (papers, actors, discussion threads). A common strategy is then to project this bipartite graph onto a single-type network. The resulting single-type network can then be studied using standard techniques such as community detection or by computing various centrality indices. We revisit this type of approach and introduce a homogeneity measure inspired by past work by Burt and Schøtt. Instead of considering a projection of a bipartite network, we consider a multiplex network which preserves both entities and attributes as our core object of study. The homogeneity of a subgroup depends on how intensely and how equally interactions occur between the layers of edges giving rise to the subgroup. The measure thus differentiates between subgroups of entities exhibiting similar topologies depending on the interaction patterns of the underlying layers. The method is first validated using two widely used datasets. A first example looks at authors of the IEEE InfoVis Conference (InfoVis 2004 Contest). A second example looks at homophily relations between movie actors that have played under the direction of the same director (IMDB).
A third example shows the capability of the methodology to deal with weighted homophily networks, pointing at subtleties revealed from the analysis of weights associated with interactions between attributes.

1

Introduction

The analysis and exploration of a social network depends on the type of relations at play. Borgatti [7] proposed a taxonomy organizing relations into four possible categories, among which homophily (also referred to as similarity) links actors exhibiting similar attributes such as membership in a club or interest group [28]. These types of ties do not represent actual social ties themselves, but might lead to a higher probability that a tie develops between the members sharing similar attributes. Examples are networks of co-authors, where homophily between two persons follows from co-authorship; networks of movie actors having played under the supervision of the same director; or networks of members having exchanged ideas through discussion threads. The second type of ties are social relationships (which can be affective relationships such as friendship), usually spanning over time. The third type captures joint interactions observed through discrete events such as calling each other or travelling together. The last type of ties describes flows (tangible or intangible) between entities (migrants moving between places, air traffic passengers between airports, etc.). This paper focuses on networks induced from homophily relations. Homophily is often embodied through a bipartite network where entities (authors; movie actors; members) connect through attributes (papers; directors; discussion threads). Guillaume and Latapy [17] advocate bipartite graphs as universal models for complex networks, hence offering additional motivation to use these graphs to describe homophily relations. Indeed, attributes of different natures can also be seen as another type of entity interacting across the edges of the homophily network. When dealing with bipartite graphs, a common strategy is to project them onto a single-type network with entities of the same type. Edges are sometimes weighted based on how much entities interact through attributes.
The resulting single-type network often tends to have high edge density, with a propensity to contain cliques (depending on the affiliation data used to build the bipartite graph) [17]. It may nevertheless be studied using standard techniques such as community detection using edge density, or the computation of various centrality indices. Such a study of the bipartite projection can however hide subtle characteristics of the original data, since the projection can create relationships that do not exist (Fig. 1), hence inducing many cliques that may not be relevant. Many different attributes can also generate such cliques, as illustrated in Fig. 3. One option from [29] is the computation of a one-mode projection from the most significant edges, but it still entails a loss of information. Our methodology proposes to directly study the multiplex networks – as defined in [10, 24] – and remains compatible with data modeled with a bipartite network. In this model, the different edge attributes refer to different edges across different layers of the multiplex network. Referring to the work of Manski [27], we take the notion of a group as a central paradigm guiding the analysis of homophily networks. Numerous authors have indeed confronted homophily with many social behaviors or phenomena (e.g. influence, contagion, information diffusion) [1, 3, 37], questioning Manski's group effect as the driving force explaining the observed phenomena. Taking inspiration from past work by Burt & Schøtt [9], this paper introduces a novel use of a node index along with two multiplex network measures, supporting the interactive inspection of a group in a homophily social network as a means to question the drivers of its internal cohesion. The key idea we exploit is to look at attributes and investigate how they interact. That is, although the focus of the analysis is on entities,





Fig. 1. A side effect of the bipartite projection: we start from a multiplex network (on the left) associating entities (nodes) through different attributes (edges in color), then convert it into a bipartite network (middle) with the right (round shape) entities corresponding to adjacent edges in the first network, and finally project the bipartite network onto another network (right). We can observe the appearance of new edges. Note that the right multiplex network could be considered as an entity-similarity multiplex network.

cohesion of a group is measured through interactions taking place between the attributes involved between actors in the group of entities. From now on, entities will refer to elements $a, a' \in A$ while attributes will refer to elements $b, b' \in B$. When considering a network of co-authors linked through keywords (indexing papers), entities will correspond to authors while keywords will be seen as attributes (see subsection 3.3). When considering a network where movie directors are linked through movie actors they have directed, entities will map to movie directors while movie actors will be considered as attributes (see subsection 3.1). The notion of group here is rather abstract and can be either user-defined or computed using a variety of methods, from data clustering to community detection using a modularity criterion, for instance. Although advances have been made on that front in the past decades [15, 21], no algorithm or solution imposes itself as being superior in all situations. Questioning this notion of group can help in understanding and validating the output of algorithms, which is a challenging analysis task. Our paper contributes an approach designed to help users evaluate the reliability of a proposed group structure. Because similarity between entities is most often measured based on co-occurrences of attributes, we provide a means to simultaneously work on two networks derived from the original homophily multiplex network or bipartite graph: one directly linking entities, and the other directly linking attributes. The notion of a group we consider here depends on the context: it may be a cluster computed from any algorithm, a subset of entities selected by a user, or the result of a query on a network, for instance.
This paper extends the previous ASONAM publication [34]. Our work contributes one node index and two multiplex network measures, computed on any group of entities, indicating the overall cohesion of the group measured through the intensity and homogeneity of interactions of their co-occurring attributes (that is, the entanglement of a multiplex network). We extend this approach to weighted interactions. Exploring the network, selecting a group or subset of co-occurring attributes and getting

feedback on internal entanglement, analysts can validate the model implicitly supported by the grouping procedure. Our method has been validated on three different datasets, among which the first two are widely known and used. A first example looks at authors of the IEEE InfoVis Conference (InfoVis 2004 Contest) [19]. A second example looks at homophily relations between movie actors that have played under the direction of the same director (IMDB) [40]. Our third example examines the Edgeryders community forum [39], where homophily emerges from discussion threads. This last example shows the capability of our methodology to deal with weighted homophily networks, pointing at subtleties revealed from the analysis of weights associated with interactions between attributes. Related work. Bipartite graphs form an important modeling tool in social network analysis, supporting two-mode concepts [5]. They form an important analytical artifact to study homophily relations [13], and were even claimed to be universal models for complex networks [17]. The literature covers a wide variety of approaches dealing with different properties of bipartite graphs and homophily networks. An optional but common strategy consists in projecting the graph, inducing relationships between entities of the same type (see [6, 20, 30, 33, 36, 42], for instance), with the obvious disadvantage of producing many cliques, the relevance of which can be questioned [14]. Neal [29] recently introduced an approach computing a one-mode projection from the most significant edges based on local likelihood. Latapy et al. [25] propose to study, in a bipartite network, the neighborhood overlaps of a node so that the network would stay connected even without it. Fujimoto et al. [16] studied network autocorrelations in bipartite networks as a way to measure the influence of nodes of one mode on the formation of edges in the opposite mode.
Other research also focuses on finding bicliques (such as in [32] and [5]), which can be suspected to form cohesive subgroups. Little work has yet been proposed for the study of multiplex networks; we can mention the efforts from [10, 24] to bring a mathematical formulation of multiplex networks with tensors, although this effort is not focused on direct use and applications of multiplex networks. Because of their wide applicability and because they also offer a straightforward graphical representation of the data, bipartite graphs have recently been used in the design of a website traffic analysis system [11]. Finally, Kaski et al. [22] studied homophily in gene networks (similarity in gene expressions) in bio-informatics with emphasis on the trustworthiness of similarities, which places it close in spirit to our work.

2

A new look at homophily networks: introducing the entanglement index in multiplex networks

This section takes a closer look at homophily networks and describes the general framework we use. As we shall see, cohesion of a group is easier to achieve with smaller groups. Inspecting a group in an effort to understand why and how cohesion is embodied in it requires validation against user knowledge. This only makes sense when conducted on small-scale groups, gathering hundreds of nodes at most.

Simple questions come to mind when inspecting a group, such as “How can we assess that a group really forms a cluster?”, “How can we make sure all entities of a cluster really belong to it?”, “Should we suspect the group to contain marginal (outlier) entities?”, “What are the attributes that tie the entities together?”, etc. A central ingredient we use to answer these questions is a set of metrics that capture the homogeneity and intensity of interactions between attributes associated with entities. These metrics can be viewed as an aid to assess the internal cohesion of a group.

2.1

Interaction networks

Our starting point is a set of entities $a, a', a'', \ldots$ (type $A$) with associated attributes $b, b', b'', \ldots$ (type $B$). Fig. 2 (a) provides an example where entities are authors (of papers) and attributes are keywords (indexing papers). This is a typical situation where a homophily relationship can be inferred, for example between authors (having published a paper). We may build a bipartite network where entities (authors) $a, a' \in A$ necessarily connect to attributes (keywords) $b, b' \in B$, while there are no direct links between entities nor between attributes (Fig. 2 (b)). Denote the bipartite entity-attribute network as $G = (A \cup B, E)$, with an edge $a - b$ whenever entity $a$ is associated with attribute $b$. Referring to Opsahl [31], there is often a primary node set and a secondary node set in bipartite (or two-mode) networks. For Opsahl [31], the primary node set is responsible for tie creation, while the secondary node set characterizes these ties. In a multiplex network, this secondary node set represents the different layers of interactions. Hence, two other networks are derived from this entity-attribute network, namely an entity interaction network $G_A$ and an attribute interaction network $G_B$. The entity network is usually built from the entity-attribute network by projecting paths $a - b - a'$ (linking entities $a, a' \in A$ through attribute $b \in B$) onto an edge $a - a'$ directly linking entities. We also need to store the attribute $b$ as a label for the edge $a - a'$. Edges in $G_A$ are thus labelled by subsets of attributes (all attributes $b, b', \ldots$ collected from triples $a - b - a'$, $a - b' - a'$, $\ldots$). Because we are focusing on entity group cohesion and on attribute co-occurrence, we filter out some of the edges. Loops are discarded to obtain the entity interaction network $G_A = (A, E_A)$. The resulting network is shown in Fig. 2 (c).
Note that, in the case of a multiplex network such as an author co-publication network, the entity interaction network is defined by the multiple relationships across authors. Going through the bipartite model would imply direct relationships across authors that are not expected, as detailed in Fig. 1. The construction of the entity interaction network remains the same. Links in the attribute interaction network $G_B = (B, E_B)$ are built from attributes $b$ that co-occur at least once with another attribute $b'$ (through at least two entities). That is, there must exist at least two paths $a - b - a'$ and $a - b' - a'$ to infer the edge $b - b'$ in $E_B$. Note that this network is not obtained by projecting paths $b - a - b'$ onto edges $b - b'$. For instance, $E_B$ does not contain edges connecting attributes that only concern a single entity. The resulting network is shown in Fig. 2 (d). The attribute interaction network is a central artifact in studying group cohesion.

(a) Authors $A$ writing papers with keywords $B$
(b) Bipartite graph $G = (A \cup B, E)$
(c) Author interaction network $G_A = (A, E_A)$
(d) Keyword interaction network $G_B = (B, E_B)$

Fig. 2. The initial data in this example is formed of authors with associated keywords A, B, C, . . . (a) (e.g. keywords indexing papers). This situation is modeled as a bipartite network linking authors to keywords (b) (authors having published papers with a given keyword, see section 3.3). We then consider the projected author interaction network with keywords as multiple edges (c), from which we derive a keyword interaction network (d).
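The construction of Section 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the dictionary-based input format and the toy data are hypothetical and do not come from the authors' implementation. The code projects paths $a - b - a'$ onto labelled edges of $G_A$, then links two attributes in $G_B$ only when they co-occur as labels on a same edge of $G_A$.

```python
from collections import defaultdict
from itertools import combinations


def entity_interaction_network(attributes_of):
    """Project paths a - b - a' onto labelled edges a - a' (hypothetical helper).

    attributes_of: dict mapping each entity to its set of attributes,
    i.e. the bipartite graph G. Returns a dict mapping frozenset({a, a'})
    to the set of attributes labelling that edge of G_A.
    """
    entities_of = defaultdict(set)
    for entity, attrs in attributes_of.items():
        for b in attrs:
            entities_of[b].add(entity)
    labels = defaultdict(set)
    for b, entities in entities_of.items():
        # combinations() only yields pairs of distinct entities: no loops.
        for a, a2 in combinations(sorted(entities), 2):
            labels[frozenset((a, a2))].add(b)
    return dict(labels)


def attribute_interaction_network(edge_labels):
    """Edges b - b' for attributes co-occurring on at least one edge of G_A."""
    edges = set()
    for attrs in edge_labels.values():
        edges.update(combinations(sorted(attrs), 2))
    return edges


# Toy data (not taken from the paper): attribute "z" concerns a single entity.
toy = {"a1": {"x", "y"}, "a2": {"x", "y"}, "a3": {"y", "z"}}
toy_labels = entity_interaction_network(toy)
toy_gb = attribute_interaction_network(toy_labels)
```

In this toy case `toy_gb` contains only the edge between "x" and "y": the path y - a3 - z does not produce an edge, because "z" never labels an edge of $G_A$.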

Fig. 3 underlines the “nuance” we wish to bring into the analysis of homophily networks. Consider entities (depicted here as pale blue squares) with attributes A, . . . , E; entities are linked by an edge whenever they share an attribute. Observe that in both situations the pairwise “distance” between entities is the same (any two entities share either one or two attributes), ending in identical topologies of the entity network $G_A$. As a consequence, based on pairwise distance, these two groups are somehow equivalent.

(a) “Centralized” interactions

(b) “Cyclic” interactions

Fig. 3. An example underlining the “nuance” we emphasize by looking at how attributes A, . . . , E interact. In both figures, the square node graph (left) links type A entities (e.g. authors, movie directors) whenever they are linked to the same entity of type B (e.g. keywords, movie actors). Entities of type B appear as labels on induced links. The round node graph (right) describes how type B entities interact, that is, when they co-occur as labels on an edge. The type B interaction network clearly distinguishes the two situations, whereas the projected single-type A networks show identical topologies.

Now, consider the attribute networks (with circle nodes) derived from these two situations. In the first situation (Fig. 3(a)), all entities having attribute A gives this attribute a central position – if there were a reason explaining why these people form a group, it would certainly rely on the group gathering around A, the other attributes being somehow accessory. The second situation (Fig. 3(b)) is much more balanced (although attributes do not mix as intensely as they could). This small example points at situations where the analysis may be misled when solely inspecting the single-type people network. The attribute interaction network actually is key to understanding how attributes interact within a group. As these simple examples show, the inspection of a group of entities with associated attributes raises several questions. It might be important to know whether attributes equally map to all entities in the group, for instance. Conversely, a misleading transitivity effect may be suspected to take place. Indeed, having attributes $b, b'$ co-occurring between entities $a$ and $a'$, and attributes $b', b''$ co-occurring between entities $a'$ and $a''$, may lead one to believe that $b, b', b''$ simultaneously co-occur between all three entities $a, a', a''$. Although the case can be easily spotted when only considering a few entities and attributes, the transitivity effect rapidly becomes confusing as we increase the number of entities and attributes. We address this issue by looking at how well attributes mix within a group. This is accomplished using the entanglement index introduced in the forthcoming sections.
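The transitivity effect can be made concrete with a small Python check. This is a sketch only: the function name and the edge-label dictionary are hypothetical and simply mirror the $a, a', a''$ and $b, b', b''$ situation described above.

```python
from itertools import combinations


def cooccur_on_every_pair(edge_labels, entities, attrs):
    """True if every pair of the given entities shares an edge carrying all attrs.

    edge_labels maps frozenset({a, a'}) to the set of attributes on that edge;
    a missing pair is treated as an edge with no attribute (an assumption).
    """
    wanted = set(attrs)
    return all(
        wanted <= edge_labels.get(frozenset(pair), set())
        for pair in combinations(entities, 2)
    )


# b, b' co-occur between a and a'; b', b'' co-occur between a' and a''.
edge_labels = {
    frozenset(("a", "a'")): {"b", "b'"},
    frozenset(("a'", "a''")): {"b'", "b''"},
}
group = ["a", "a'", "a''"]
seen_in_group = set().union(*edge_labels.values())
all_three_mix = cooccur_on_every_pair(edge_labels, group, seen_in_group)
```

Here `seen_in_group` lists all three attributes for the group, yet `all_three_mix` is false: listing the attributes of a group says nothing about whether they actually co-occur between all of its entities.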

This index is computed for each attribute (or layer) $b$, measuring how homogeneously and intensely an attribute co-occurs with all other attributes in a group of entities. As we shall see, global entanglement homogeneity and intensity at the group level can then be computed from the individual attribute entanglement indices. The definition of the entanglement index makes it so that optimal homogeneity is reached whenever attributes have the same entanglement index, that is when all entities have the exact same associated attributes and all attributes equally co-occur within entities; optimal intensity is reached whenever all entities share exactly all attributes.

2.2

Attribute interaction matrices and the entanglement index

Edges $b - b' \in E_B$ moreover carry weights $n_{b,b'}$ indicating how often attributes co-occur between entities in the considered group. We also define $n_{b,b}$ to count the number of edges in $E_A$ carrying the attribute $b$. The matrix $N_B$ collecting all these $n_{b,b'}$ entries gives rise to another matrix $C_B$ filled with ratios $c_{b,b'} = n_{b,b'} / n_{b',b'}$. The value $c_{b,b'}$ may be viewed as the (conditional) frequency that an edge is of type $b$ given that it is of type $b'$. We give $c_{b,b}$ another definition, namely $c_b$, the proportion of edges carrying attribute $b$ among all $N$ edges in $G_A = (A, E_A)$, that is $c_b = n_{b,b} / N$. Consider the example in Fig. 2. Starting from authors $a \in A$ having published papers with keywords $b \in B$ (attributes), we build a bipartite graph where authors $a, a'$ link through keywords $b$ whenever $a$ and $a'$ have co-authored a paper with keyword $b$ (Fig. 2 (b)). A single-type graph is obtained by inducing edges between authors labeled with keywords (Fig. 2 (c)). The resulting keyword interaction network is shown in Fig. 2 (d). The matrices $N_B$ and $C_B$ (built over keywords C, D, E and L) then read:

$$
N_B = \begin{pmatrix} 3 & 3 & 1 & 0 \\ 3 & 3 & 1 & 0 \\ 1 & 1 & 3 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}
\qquad
C_B = \begin{pmatrix} 0.75 & 1.00 & 0.33 & 0.00 \\ 1.00 & 0.75 & 0.33 & 0.00 \\ 0.33 & 0.33 & 0.75 & 1.00 \\ 0.00 & 0.00 & 0.33 & 0.25 \end{pmatrix}
$$

We now wish to compute the entanglement index for each attribute, measuring how much an attribute $b$ contributes to the overall cohesion of an entity group. This notion of cohesion is inspired from Burt & Schøtt's work on relation content in multiple networks [9]. Denote by $\lambda$ the maximum value among the entanglement indices $\lambda_b$ of attributes $b \in B$. In other words, the entanglement index of attribute $b$ is a fraction of $\lambda$, namely $\lambda_b = \gamma_b \cdot \lambda$ with $\gamma_b \in [0, 1]$. The entanglement value of an attribute $b$ is reinforced through interactions with other highly entangled attributes. Having a probabilistic interpretation of the matrix entries $c_{b,b'}$ in mind, we can thus postulate the following equation, which defines the values $\gamma_b$:

$$
\gamma_{b'} \cdot \lambda = \sum_{b \in B} c_{b,b'} \, \gamma_b \tag{1}
$$

The vector $\gamma = (\gamma_b)_{b \in B}$ collecting values for all attributes $b$ thus forms a right eigenvector of the transposed matrix $C_B'$, as Eq. (1) gives rise to the matrix equation $\gamma \cdot \lambda = C_B' \cdot \gamma$. The maximum entanglement index thus equals the maximum eigenvalue $\lambda$ of matrix $C_B'$. The actual entanglement index values $\lambda_b$ are of lesser interest; we are actually interested in the relative $\gamma_b$ values. Furthermore, we shall see how the entanglement vector $\gamma$ and eigenvalue $\lambda$ can be translated into network measures to help understand entanglement in a group of entities. Hence the entanglement indices for our example's attributes are:

$$
\gamma = (0.63, 0.63, 0.43, 0.12)
$$

Notice that two indices are equal, and correspond to keywords C and D.

2.3

Homogeneity and intensity

This section introduces entanglement intensity $I$ and entanglement homogeneity $H$ as global network measures. The topology of the attribute interaction network $G_B = (B, E_B)$ provides useful information about how attributes contribute to the overall cohesion among entities of a group. The focus here is on interactions among attributes, and aims to reveal how cohesive the group of entities is, considering this set of attributes. The archetype of an optimally cohesive entity group is when all entities have the exact same associated attributes. In that case, the graph $G_B = (B, E_B)$ corresponds to a clique. As a consequence, all matrix entries $n_{b,b'}$ coincide, so all entries in matrix $C_B$ equal 1. The maximum eigenvalue of $C_B'$ then equals $\lambda = |B|$, and all $\gamma_b$ coincide. That is, all attributes indeed contribute, and they all contribute equally to the overall entity group cohesion. The Perron-Frobenius theory of nonnegative matrices [12, Chap. 2] further shows that $\lambda = |B|$ is the maximum possible value for an eigenvalue of a nonnegative $|B| \times |B|$ matrix with entries in $[0, 1]$. The Perron-Frobenius theorem holds for irreducible matrices, that is when the graph $G_B$ is connected. Hence, the connected components in $G_B = (B, E_B)$ must be inspected independently. When the matrix $C_B$ is irreducible, the theory of non-negative matrices tells us that it has a maximal real positive eigenvalue $\lambda \in \mathbb{R}$, and that the corresponding eigenvector $\gamma$ has non-negative real entries [12, Theorem 2.6]. We hereafter assume $G_B$ is connected so that $C_B$ is irreducible. Inspired from the clique archetype of an optimally cohesive entity group, we wish to measure the entanglement at the entity group level. We already know that the eigenvalue is bounded above by $|B|$, so the ratio $I = \lambda / |B| \in [0, 1]$ measures how intensely interactions take place within the entity group. This ratio thus provides a measure for entanglement intensity $I$ among all entities with respect to attributes in $B$.
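A minimal numerical sketch of Eq. (1) and of the intensity ratio follows, assuming $C_B$ is available as a NumPy array and is irreducible. The function name is hypothetical, and `numpy.linalg.eig` is used here as a generic eigensolver rather than as the authors' method. The returned vector is scaled to unit Euclidean norm, which is a choice of this sketch; only the relative $\gamma_b$ values matter.

```python
import numpy as np


def entanglement(c_b):
    """Return (lam, gamma, intensity) for an attribute matrix C_B.

    Assumes C_B is non-negative and irreducible, so that the dominant
    eigenvalue is real and its eigenvector has entries of a single sign.
    """
    c_b = np.asarray(c_b, dtype=float)
    # Eq. (1) in matrix form: lam * gamma = C_B' gamma (C_B' is the transpose).
    eigenvalues, eigenvectors = np.linalg.eig(c_b.T)
    k = np.argmax(eigenvalues.real)
    lam = float(eigenvalues[k].real)
    # eig() returns unit-norm vectors up to sign; abs() fixes the sign.
    gamma = np.abs(eigenvectors[:, k].real)
    intensity = lam / c_b.shape[0]  # I = lam / |B|
    return lam, gamma, intensity


# Clique archetype: every entry of C_B equals 1.
lam_clique, gamma_clique, intensity_clique = entanglement(np.ones((4, 4)))
```

For the clique archetype with four attributes, the dominant eigenvalue equals $|B| = 4$, all entries of `gamma_clique` coincide, and the intensity reaches its upper bound of 1, in line with the discussion above.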
From our previous example, $I = 0.31$, denoting a low interaction across attributes. We also know that the clique situation with equal $c_{b,b'}$ matrix entries leads to an eigenvector $\gamma$ with identical entries. This eigenvector thus spans the diagonal space generated by the diagonal vector $1_B = (1, 1, \ldots, 1)$. This motivates the definition of a second measure providing information about how homogeneously entanglement distributes among attributes. We may indeed compute the cosine similarity

$$
H = \frac{\langle 1_B, \gamma \rangle}{\|1_B\| \cdot \|\gamma\|} \in [0, 1]
$$

to get an idea of how close the entity group is to being optimally cohesive. We will refer to this value as entanglement homogeneity $H$. From our previous example, $H = 0.91$, denoting a relatively homogeneous but not optimal distribution of entanglement indices. A thorough study of the entanglement indices, and of the homogeneity and intensity network indices, is out of the scope of this paper (see [35]). Other measures, including Shannon entropy [38] and Guimera's participation coefficient [18], offer interesting alternatives to cosine similarity.

2.4

Weighted interactions

In real-world networks, relationships across entities may not always be considered as equal, and we often need to utilize weights associated with edges. These weights might model the intensity of interactions between members of a group, or the intensity of a flow between two entities, for example. We now wish to consider a weighted entity interaction network $G_A = (A, E_A)$. That is, $G_A$ is equipped with edge weights $w : E_A \to \mathbb{R}^+$ (where $\mathbb{R}^+$ denotes the set of reals $r \geq 0$), and we denote the weight of an edge $e$ by $w_e$. We extend the map $w$ to sets and write $w(F) = \sum_{e \in F} w_e$ for any subset $F \subset E_A$. Let us also consider a map $\tau : E_A \to 2^B$ where $\tau(e) \subset B$ is the set of all the different attributes $b \in B$ that are associated with edge $e \in E_A$. Whenever $b \in \tau(e)$, the edge $e$ bears attribute $b$. Conversely, $\tau^{-1}(b) \subset E_A$ is the set of edges bearing attribute $b$. The quantities $n_{b,b'}$ and $c_{b,b'}$ may be generalized to a weighted entity interaction network by setting:

$$
n_{b,b} = w(\tau^{-1}(b)) = \sum_{e \in \tau^{-1}(b)} w_e \tag{2}
$$

$$
n_{b,b'} = w\big(\tau^{-1}(b) \cap \tau^{-1}(b')\big) \tag{3}
$$

$$
c_{b,b'} = \frac{n_{b',b}}{n_{b',b'}} \tag{4}
$$

That is, $n_{b,b}$ equals the sum of weights of edges $e \in E_A$ bearing attribute $b \in B$, and $n_{b,b'}$ equals the sum of weights of edges bearing both attributes $b$ and $b'$. Because we need to preserve the probabilistic interpretation of the $c_b$ and $c_{b,b'}$ values, we further set:

$$
c_b = \frac{n_{b,b}}{w(E_A)} \tag{5}
$$

As a consequence, Eq. (5) may be interpreted as the probability that an edge bears attribute $b$, and Eq. (4) may be interpreted as the conditional probability that an edge carries $b$ knowing that it already bears $b'$. Observe that considering equal weights $w_e = 1$ for all edges $e \in E_A$ coincides with the non-weighted version introduced in the previous section. Using the newly defined quantities $c_{b,b'}$, we may still define the entanglement index through the matrix equation (Eq. 1). Note that, unless we filter out edges using a threshold on weights, the shape of the attribute interaction network remains the same in both situations, weighted and non-weighted.
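The weighted quantities of Eqs. (2) to (5), together with the homogeneity measure of Section 2.3, can be sketched as follows. Function names, the `(weight, attributes)` input format and the toy edges are assumptions made for illustration only.

```python
import numpy as np


def weighted_matrices(edges, attributes):
    """Weighted N_B and C_B following Eqs. (2)-(5).

    edges: list of (w_e, tau_e) pairs, with w_e >= 0 the edge weight and
    tau_e the set of attributes borne by the edge.
    Assumes every listed attribute has a positive total weight n[b, b].
    """
    index = {b: i for i, b in enumerate(attributes)}
    n = np.zeros((len(attributes), len(attributes)))
    total_weight = 0.0  # w(E_A)
    for weight, attrs in edges:
        total_weight += weight
        for b in attrs:
            for b2 in attrs:
                n[index[b], index[b2]] += weight  # Eqs. (2) and (3)
    c = n / np.diag(n)[np.newaxis, :]             # Eq. (4), n is symmetric
    np.fill_diagonal(c, np.diag(n) / total_weight)  # Eq. (5)
    return n, c


def homogeneity(gamma):
    """Cosine similarity between gamma and the diagonal vector (1, ..., 1)."""
    gamma = np.asarray(gamma, dtype=float)
    ones = np.ones_like(gamma)
    return float(ones @ gamma / (np.linalg.norm(ones) * np.linalg.norm(gamma)))


# Toy weighted edges over two hypothetical attributes x and y.
toy_weighted = [(2.0, {"x", "y"}), (1.0, {"y"}), (1.0, {"x"})]
n_w, c_w = weighted_matrices(toy_weighted, ["x", "y"])
h_reported = homogeneity([0.63, 0.63, 0.43, 0.12])
```

Applied to the $\gamma$ vector reported in Section 2.2, `homogeneity` returns about 0.91, which agrees with the value of $H$ quoted in Section 2.3. Setting every weight to 1 reduces `weighted_matrices` to plain edge counts.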

3

Case studies

The case studies we describe in this section aim at showing how the entanglement indices, and the homogeneity and intensity indices of networks, help users explore social networks and reason about the homophily content. Navigating the network and getting feedback about these indices, users can question the structure of the space that binds entities together. The examples are designed to highlight different aspects of the exploration, each time underlining how the indices contribute to a better understanding of the group structure of the homophily network. As the examples will show, the entanglement methodology was embedded in a visual analytics environment providing sound interactions to help users flexibly select subgroups. While users get immediate visual feedback about the entanglement values at play, the environment also allows them to explore the networks and enquire about homogeneity by easily hopping between the entity and attribute networks. Roughly speaking, the knowledge users gain after applying a grouping procedure (clustering, community detection) is that “a group of entities” share “a list of attributes”. This is where the entanglement index enters the scene. What does “a list of attributes” really mean? Do all entities share all attributes? Do entities more or less split between attributes? What particular attribute(s) make(s) the split explicit? In other words, users must be able to elucidate to what extent, and possibly how/why, the group of entities forms a more or less cohesive unit. Our first use case focuses on the IMDB network [40] gathering movie directors linked through movie actors (they have directed). Our second use case focuses on an author/keyword network extracted from the InfoVis 2004 Contest [23]. Our third use case introduces a user/topic network from a study of the Edgeryders community [39]. All use cases illustrate how the entanglement index, and network homogeneity and intensity, can be used in a visual social network analytics context.

3.1

IMDB

This first use case is built from the Internet Movie DataBase, a widely used dataset [40]. Auber et al. [2] visualized a small world subset of the IMDB co-acting graph. Starting from a small set of “star” movie actors, we have extracted the corresponding movie directors to form a bipartite network where movie directors connect to movie actors they have directed. Applying our methodology we compute (i) a movie director network (entities), where two directors connect when the sets of movie actors they have directed (attributes) share at least two actors, together with (ii) the corresponding movie actor interaction network. The data may thus be used to find cohesive subgroups of movie directors, those whose artistic signature relies on similar movie casts. This first example gathers 15 actors and 16 directors (see Fig. 4). A low intensity and medium homogeneity, together with a loosely connected actor interaction network topology, suggest that actors and directors roughly split into two communities. The director network has medium homogeneity, which corresponds to a quite balanced distribution of actors among them. Homogeneity is not optimal: the directors did not individually direct each of these actors although, as a group, they did direct all of these actors. The low values of the network level measures readily indicate the need to dig further

Fig. 4. IMDB - directors appear on top; the actors interaction network is displayed at the bottom. Selecting a group of directors highlights the corresponding actors, with node size mapped to their entanglement index. This group of directors shows low homogeneity and intensity. We can clearly see that the distribution of actors is unbalanced, partly because Sharon Stone plays by far a central role in the interactions between directors – the directors all have, at some point, directed her.

into the network and try to “nuance” the cohesion of this group. Roughly speaking, low intensity follows from the fact that most directors have directed only a small number of actors relative to the whole set. As can be seen from Fig. 4 (bottom), the two communities of actors are connected through Robert Duvall, and the two communities of directors are connected through Sidney Lumet. Apart from Robert Duvall, the bottom right community of actors is formed around Marlon Brando, Al Pacino, Jeremy Irons, Jack Nicholson, etc. The top left community of actors is formed around Sharon Stone, Harvey Keitel, Samuel L. Jackson, Leonardo DiCaprio, Meryl Streep, etc. Clearly, there is a generation gap between those two communities of actors, with Robert Duvall filling the gap – just as Sidney Lumet does in the director network. The community of actors located in the top left part of the panel corresponds to a different group of directors (connecting to the previous group through Sidney Lumet). It gathers Spike Lee, Jim Jarmusch, Martin Scorsese, Woody Allen and others. This community has similar intensity but higher homogeneity when compared to the overall network. This means these actors have equal influence within this group and together better capture the artistic signature of these directors as a group. The upper left subgroup in the director network (see Fig. 5) actually divides into three overlapping cliques. Two cliques reach maximal homogeneity and intensity (the exact same actors have all played under their direction). The third clique (Bruce Beresford, Jim Jarmusch, Barry Levinson, and Sidney Lumet) – selected in the top panel of Fig. 5 – focuses on Ellen Barkin and Sharon Stone. It has lower homogeneity and intensity indices: they don’t mix that well with the other actors.
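As an illustration only, the construction of the director network described at the start of this use case (two directors connect when they share at least two actors) can be sketched in a few lines of Python. The function name `project_entities`, the `min_shared` parameter and the toy director/actor identifiers are hypothetical; they are not taken from the IMDB data or from the environment used in the study.

```python
from itertools import combinations


def project_entities(attributes_of, min_shared=2):
    """Connect two entities when they share at least `min_shared` attributes.

    `attributes_of` maps each entity (e.g. a director) to the set of its
    attributes (e.g. the actors directed). The result maps each connected
    pair of entities to the set of attributes they share, so that the
    attribute information is kept on the edge instead of being discarded.
    """
    edges = {}
    for first, second in combinations(sorted(attributes_of), 2):
        shared = attributes_of[first] & attributes_of[second]
        if len(shared) >= min_shared:
            edges[(first, second)] = shared
    return edges


# Hypothetical toy data: three directors and the actors they directed.
directed = {
    "director_1": {"actor_a", "actor_b", "actor_c"},
    "director_2": {"actor_b", "actor_c"},
    "director_3": {"actor_c", "actor_d"},
}

director_edges = project_entities(directed)
# Only director_1 and director_2 share two actors; director_3 shares a
# single actor with each of the others and therefore stays isolated.
```

Keeping the shared attributes on each edge, rather than a plain adjacency, is what later allows the layers of edges to be inspected when questioning the cohesion of a selected group.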
This use case thus underlines the fact that although a group involves a well identified and distinct set of attributes (movie actors), the cohesion of the considered group may rely only on a subset of these attributes. Additionally, group cohesion must not be assessed solely from the topology of the projected single-type network obtained from the original bipartite network.

3.2

Hopping between the entity and attribute networks

The previous example readily shows how the attributes’ entanglement indices, and the homogeneity and intensity measures, may be used to inspect homophily networks and assess cohesion in subgroups of entities. The synchronized dual view we use combines two distinct but complementary networks: the network of entities GA and the interaction network of attributes GB. Finding the correspondence from a set of entities selected from GA to attributes in B is straightforward, as it suffices to select the desired subset of entities: we then recompute a new matrix CB based on the induced subgraph of GA. Observe however that the synchronization is asymmetric. Indeed, retrieving entities of type A from a set of attributes in B is a different matter. Two distinct questions may be asked when querying a subset of attributes B′ ⊂ B:
– Which entities a ∈ A bear at least one attribute b ∈ B′?
– Which entities a ∈ A bear all attributes b ∈ B′?

Fig. 5. A group of directors (top) and the corresponding actors they co-directed (bottom, highlighted) with node size mapped to their entanglement index. This clique of 4 directors shows higher homogeneity and intensity than the selected group in Fig. 4.

Moreover, what relationships take place between the retrieved entities? Interestingly, these questions are placed in Lee et al.’s taxonomy [26] half-way between topology-based tasks on adjacency, and attribute-based tasks on links. The second question often helps to narrow down results from the first question. Given these questions it is then straightforward to propose the two corresponding boolean operators:

$\mathrm{OR} : V_B \to V_A$ with $B' \mapsto \mathrm{OR}(B') = \bigcup_{b \in B'} \tau^{-1}(b) \subset A$,

$\mathrm{AND} : V_B \to V_A$ with $B' \mapsto \mathrm{AND}(B') = \bigcap_{b \in B'} \tau^{-1}(b) \subset A$,

where $B' \subset B$ is the selected subset of attributes. Observe that the induced subgraph in GA is not necessarily connected. Typically, when using a node-link view of these networks, the selection of a set of entities should automatically trigger the selection of the relevant attributes and compute the corresponding entanglement, homogeneity and intensity values. This is illustrated in Fig. 5, where a set of movie directors has been selected (top panel). Movie actors that played under their direction, here seen as attributes of movie directors, are highlighted (bottom panel). The corresponding homogeneity and intensity, restricted to these four selected directors, are displayed as a background of the selection lasso, while the actual values are reported in a side panel. The size of a movie actor node corresponds to its entanglement index: a larger node indicates a movie actor who weighs more in bringing these movie directors together as a group. Quite naturally, results of a query in one network can be used to feed a new query. Typically, after the application of the AND operator to identify a subset of entities in A sharing all the selected attributes, the query is expanded to see what other attributes are at play. The forthcoming use cases provide examples (see Fig. 12, for instance). As a matter of fact, the proposed mode of interaction falls under the “Selection” tasks of Yi et al.’s taxonomy [41]. Incidentally, its flexibility supports Buja et al.’s “Posing queries” task [8].
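The two operators can be illustrated with a minimal Python sketch. It assumes the bipartite data is available as a mapping from each entity to its set of attributes; the helper `entities_bearing` plays the role of the inverse mapping from an attribute to the entities bearing it. All function names and the toy identifiers are hypothetical, and returning the empty set for an empty selection is a choice made here for convenience, not a convention stated in the text.

```python
def entities_bearing(attributes_of):
    """Invert an entity -> attributes mapping into attribute -> entities."""
    bearers = {}
    for entity, attributes in attributes_of.items():
        for attribute in attributes:
            bearers.setdefault(attribute, set()).add(entity)
    return bearers


def query_or(bearers, selected):
    """Entities bearing at least one of the selected attributes (union)."""
    result = set()
    for attribute in selected:
        result |= bearers.get(attribute, set())
    return result


def query_and(bearers, selected):
    """Entities bearing all of the selected attributes (intersection)."""
    candidate_sets = [bearers.get(attribute, set()) for attribute in selected]
    if not candidate_sets:
        return set()  # assumption: an empty selection retrieves nothing
    return set.intersection(*candidate_sets)


# Hypothetical toy data: three entities and their attributes.
attributes_of = {
    "e1": {"x", "y"},
    "e2": {"y", "z"},
    "e3": {"z"},
}
bearers = entities_bearing(attributes_of)

any_of_x_z = query_or(bearers, {"x", "z"})    # e1, e2 and e3
all_of_y_z = query_and(bearers, {"y", "z"})   # only e2
all_of_x_z = query_and(bearers, {"x", "z"})   # no entity bears both
```

Since the AND result is always contained in the OR result for a non-empty selection, the second query can be used to narrow down the first, as noted above.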
Obviously, the proposed environment supports “Making comparisons”, a central task in all data analysis task taxonomies.

3.3

InfoVis 2004 contest

Our second example concerns data of a different nature, where keywords (attributes) link to authors (entities), showing that the notion of entanglement can actually apply to a wide variety of application domains. We selected a subset of the InfoVis 2004 Contest dataset gathering papers published at the IEEE InfoVis symposium over the period 1994–2004 [23]. The data we consider are authors indexed by keywords gathered from papers they published. We thus compute a bipartite graph where authors link to keywords. To some extent, with respect to Borgatti’s taxonomy of relations [7], this network could be considered as an interaction network since co-authorship indeed involves direct contact with collaborators. When we consider authors and keywords, groups may form because authors are socially very close – working at the same institution or having graduated from the same university – or just formed an opportunistic association around trendy topics. That is, co-publication is after all a social activity. We took this aspect into consideration by making sure that authors were connected through a keyword only when they indeed had co-published a paper on that topic – not just because they both had published a paper on that topic.
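The co-publication constraint described above can be made concrete with a short sketch. It assumes each paper is given as a list of authors and a list of keywords, which is a hypothetical input format and not the layout of the contest files; the function name `copublication_layers` and the toy records are likewise illustrative.

```python
from itertools import combinations


def copublication_layers(papers):
    """Map each pair of co-authors to the keywords of their joint papers.

    Two authors are linked through a keyword only if at least one paper
    carrying that keyword lists both of them as authors.
    """
    layers = {}
    for authors, keywords in papers:
        for pair in combinations(sorted(set(authors)), 2):
            layers.setdefault(pair, set()).update(keywords)
    return layers


# Hypothetical toy records: (authors, keywords) for three papers.
papers = [
    (["author_1", "author_2"], ["graphs", "trees"]),
    (["author_2", "author_3"], ["trees"]),
    (["author_3"], ["graphs"]),
]

layers = copublication_layers(papers)
# author_1 and author_3 both published on "graphs", but never together,
# so no edge is created between them.
```

A plain projection of the author/keyword bipartite graph would have linked author_1 and author_3 through "graphs"; the paper-level constraint is what keeps the resulting network an interaction network.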

Fig. 6. The InfoVis 2004 Contest data gives rise to a keyword interaction network (bottom) coupled with an author social network (top). The three selected authors hold a central position in the social network (top). Their co-publications cover a wide spectrum of topics, as shown by the clique of keywords in the bottom image. Entanglement measures, although good, are however not optimal: the authors did not pairwise co-publish on all these topics. We may indeed suspect each of them to have distinct co-authors in the network.

We show how our approach helps to solve two tasks of the InfoVis 2004 Contest:
– Where does a particular author/researcher fit within the research areas?
– What, if any, are the relationships between two or more or all researchers?

The author-keyword bipartite graph gives rise to a keyword interaction network GB and an author social network GA. Note that co-authorship relationships make this network a natural multiplex network, and that authors sharing the same keywords can be disconnected. The full social network GA contains about 1000 authors and breaks into several connected components. We will focus on the component led by Woodruff, Olston and Stonebraker (see [23, leftmost part of Fig. 4]) gathering 16 authors (see Fig. 6 – top). The answer to the first question is straightforward. When a single author is selected, the associated keywords are pushed to the foreground in the keyword network, while remaining positioned in the context of neighbor topics. The social network displays the co-authors of any selected author. The whole network can be similarly inspected author by author. Although this is useful because it provides fine-grained information on the network, it is lengthy and tiresome and cannot reasonably be performed on larger networks. This brings us to the second task, which requires a more elaborate exploration strategy. In our case, we may take advantage of the apparent community structure of the social network. Conversely, we may select a subset of keywords and look at authors who have published on these topics to see how homogeneous a community they form, for instance. The topology of the author network (Fig. 6 – top) clearly shows three authors as central actors (A. Woodruff, M. Stonebraker and A. Aiken) at the intersection of two different cliques. Their associated keywords form a large clique covering a large part of the keyword network (Fig. 6 – bottom).
The entanglement indices (node sizes) vary widely among keywords, which explains why homogeneity is low and moreover suggests that each of these three authors has her/his own set of topics. Selecting the authors that are part of the top clique in the social network (Paxson, Wisnovsky, ...), except for the central actors, leaves us with a subset of authors with optimal intensity and homogeneity: they all co-published on the exact same topics. The same is true if we select the authors that are part of the bottom clique (except the central authors – Olston, Spalding, ...). We may also select two marginal authors sitting on the left side of the social network (Baldonado & Kuchinsky) and observe that they link to keywords located outside the “Woodruff clique” keyword subsets. Strikingly enough, none of these sub-communities seems to address the topics portals and data visualization located at the bottom left of the keyword network. Selecting these two keywords, we find that they solely concern Woodruff and Olston. Leapfrogging the selection to Woodruff and Olston, we then see the additional topics these two authors have in common. Observe that, logically, these topics are marginally positioned with respect to the main clique (Fig. 7 – bottom). This second use case pointed at fully cohesive subgroups where authors have co-published papers on the exact same topics. It also suggests that the analysis may be conducted either from the actor (author) network or from the attribute (keyword) network. Going back and forth between these two perspectives seems a fruitful strategy to get the most out of the entanglement index and the dual GA – GB representation.

Fig. 7. Browsing around “obvious” sub-communities of authors, the keywords portals and data visualization never pop up. Directly selecting them in the keyword network brings two co-authors up front: Woodruff and Olston (top). Selecting these authors shows their common topics of interest to be marginally positioned with respect to the main clique (bottom).

3.4

Comparative results from the InfoVis 2004 Contest

A full comparison with the results of the InfoVis 2004 Contest would require an extended study of the whole dataset. Many of the presented results emphasized trends over the 10-year period observed, which is why we only focused here on a smaller excerpt from the results of [23]. In our use case, instead of presenting quantitative results over the different authors, we have presented specificities across author relationships. We also applied the widely used Louvain clustering algorithm [4] to the excerpt, which returned three communities (see Fig. 9). The first community regroups Kuchinsky, Landay, Wang Baldonado and Woodruff, and clearly presents two disconnected components in the attribute interaction graph, suggesting two sub-communities within. The second community regroups Allen, Chen, Paxson, Su, Taylor and Wisnovsky, with I = 0.82 and H = 0.91 suggesting unbalanced collaborations as we discussed previously. The third community regroups Chu, Ercegovac, Lin, Olston, Spalding and Stonebraker, with optimal values I = 1 and H = 1, confirming the cohesion of this community. Finally, even if Louvain has returned fairly cohesive communities, the entanglement analysis suggests digging for more specific interactions, particularly in the case of disconnected components across attribute relationships.

Comparing entanglement measures with known measures can also be challenging. Since they are computed for a multiplex network, they do not really correspond to either traditional network measures or bipartite network measures. We will assume that we have two separate networks, the entity interaction network and the attribute interaction network. Hence, we can only compare entanglement intensity (I = 0.33) and homogeneity (H = 0.72) with “global” entity interaction network measures such as density (d = 0.48) and average clustering coefficient (cc = 0.91). A proper evaluation would compare those measures over a large number of different networks with varied characteristics.
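For readers wishing to reproduce the two global measures mentioned above on their own data, density and average clustering coefficient can be computed as in the following sketch. It is not the code used to obtain the reported values: it runs on a hypothetical four-node graph, assumes an undirected simple graph given as an edge list, and follows the common convention that nodes of degree below 2 contribute a local coefficient of 0.

```python
from itertools import combinations


def density(nodes, edges):
    """Fraction of possible undirected edges that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


def average_clustering(nodes, edges):
    """Mean over all nodes of the local clustering coefficient."""
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    total = 0.0
    for node in nodes:
        degree = len(neighbors[node])
        if degree < 2:
            continue  # convention: coefficient 0 for degree 0 or 1
        # Count pairs of neighbors that are themselves connected.
        closed = sum(
            1 for a, b in combinations(neighbors[node], 2) if b in neighbors[a]
        )
        total += 2 * closed / (degree * (degree - 1))
    return total / len(nodes) if nodes else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

d = density(nodes, edges)              # 4 edges out of 6 possible
cc = average_clustering(nodes, edges)  # (1 + 1 + 1/3 + 0) / 4
```

Both quantities depend only on the topology of the single-type network, which is why they cannot reflect how the attribute layers overlap on the edges.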
More interestingly, we can compare the entanglement indices with node measures on the attribute interaction network as in Fig. 8, and confirm the differences among these statistics. Although the above results do not qualify as a full scale quantitative evaluation of the results of the entanglement analysis, they illustrate how the entanglement index, homogeneity, and intensity stand out from traditional network measures.

3.5

Edgeryders

This last use case presents a situation with a relevant use of our weighted model, and also brings forward how we can take advantage of the AND and OR operators. We study here the Edgeryders community [39]. The data represents users participating in discussion threads on various topics. Each topic corresponds to a participation campaign led by the Edgeryders leaders; campaigns took place one after the other. The topic 0–Undefined has been used for preliminary or out-of-scope discussions. During each campaign (topics 1 to 9), the Edgeryders leaders designed and implemented different policies to engage users in participating in the debate. Within the network, opinion leaders accordingly promote participation in the topics. Participation in a topic is weighted for each user in terms of effort, measured as the length of the text (number of words) produced in one piece of conversation. A topic never closes, and users can participate in every topic by either starting a new thread or replying to an existing comment. The network is being used by the Edgeryders leaders to:

Fig. 8. Comparisons of the entanglement indices with traditional measures on the attribute interaction network; for a better comparison the different values have been normalized. Top left: betweenness centrality. Top right: degree. Bottom left: PageRank. Bottom right: clustering coefficient. Although no clear correlation can be observed on this excerpt, the measures clearly display many differences.

– evaluate the impact of their policy campaigns and especially see whether participation in given topics triggered interest in other topics;
– evaluate the overall participation of members in exchanging ideas over the forum.

The data, in its original form, describes a multiplex network of users, in which each edge is one piece of conversation between two users concerning one specific topic. We have adapted this network to fit our model, where users u ∈ A are entities and topics t ∈ B are attributes. The data gathers 254 users exchanging ideas around 9 topics. Now, each user u produces an effort towards a topic t (measured as the total number of words written on that topic). We may thus consider weights on edges e = {u, u′} by defining w(e) as the sum of the efforts of both users, u and u′, on all topics. This weight in a sense reflects the overall involvement of users u and u′ towards each other. Obviously this should be taken into account when analyzing this social network. For a group to be cohesive, not only should users have exchanged ideas on the same topics but they should also have put comparable efforts into participating in the debate. Note that, similarly to the InfoVis 2004 Contest example, we are looking at the homophily of an interaction network: two users are linked only if they have been discussing the same topic and have been directly conversing together (which can be traced by looking at “replies”). Starting with the user network as shown in Fig. 10, we can see that opinion leaders attract a large share of the edges (the 5 most connected nodes account for 26% of the edges, while the remaining nodes have an average degree of 3.2). Although it shows a few locally denser areas, the user network topology does not present any obvious community structure. A

Fig. 9. Top: three communities identified by the Louvain community detection algorithm. Bottom: the disconnected attribute interaction network corresponding to the community in orange (Kuchinsky, Landay, Wang Baldonado and Woodruff ), suggesting that two sub-communities correspond to this group.

Fig. 10. The user interaction network (left): node size on users is mapped to their degree; notice that a few nodes have very high degree (opinion leaders) while other nodes have very low degrees. The topic interaction network (right): the network forms a clique, meaning that all topics pairwise interact. The entanglement indices indicate however that topics 1, 2 and 4 concentrate most interactions while topics 0, 5, and 8 only marginally interact with other topics.

deeper examination shows that those denser areas are composed of nodes mostly related to one or two “leader” nodes. The topic interaction network being a clique, all topics interact together at some point, which suggests having a closer look at the entanglement values. The use of weights leads to a better interpretation of the network structure. For example, without weights we cannot distinguish the case in which two users contribute heavily to two topics from the case in which they contribute only lightly. Using weighted edges, entanglement intensity and homogeneity are respectively equal to I = 0.14 and H = 0.94. Without weights, intensity is as high as 0.40 (while homogeneity remains more or less the same), which actually ignores the heavy participation of some users in multiple subjects. Fig. 11 confirms that the entanglement indices in the weighted and non-weighted situations (and the ranking of topics according to these indices) are radically different. However, the overall distribution of indices remains close, and consequently so does the homogeneity, since it is a cosine measure. The inclusion of weights in the network leads to a more subtle interpretation of the entanglement measures as it includes the notion of how much effort has been mutually spent on different topics. Obviously, not considering weights in this network leads to an incorrect interpretation of the network activity. We can easily retrieve five leaders (the entity nodes of higher degree) by looking at the collaborations that concerned all topics (i.e. by selecting all topics with the AND operator), namely users 4, 10, 64, 468, and 857. Leapfrogging to this selection of users (see Fig. 12), we can have a deeper look at their mutual efforts. Intensity and homogeneity are very high (0.76/0.95, against 0.14/0.94 in an unweighted context), which we could expect from opinion leaders.
They have worked together homogeneously on all topics, except for topics 0 (Undefined which is marginal) and 8 (Resilient which was

Fig. 11. The two bar charts above help compare the entanglement indices from the weighted network (right) and the non-weighted network (left). The comparison emphasizes how considering or ignoring the weights can have a strong impact on reading the relative entanglement indices. As can be seen, all topics are assigned a different entanglement value (except for the topics with extremal values – topics 1 and 5). The balance between entanglement indices does not radically change, but the participation of each topic in the network’s cohesion radically differs.

a concluding debate). Notice from the topic interaction network in Fig. 12 that no interaction between these two topics emerged from leaders – most probably because those topics are indeed marginal. Using the same process, we can now answer the Edgeryders leaders’ questions. We may process one topic at a time. Selecting a topic t, we retrieve the subset of users who have participated in t. We may then identify other topics they have mutually participated in (which could be related to the corresponding policy campaign). A variety of facts can be extracted:
– topic 3 and topic 7 clearly dominate the mutual efforts of contributors;
– closer examination reveals strong ties between topics 1 and 2;
– topics 0 and 8 gather a majority of users who have also pairwise co-participated in other topics;
– users who participated in topic 5 devoted similar efforts to all other topics.

The use case we have just presented thus shows how weights can be integrated in our framework to offer a finer interpretation of cohesion and entanglement indices. It also highlights how the use of the OR and AND operators between the two networks GA and GB can help to narrow reasoning over the network when the topology is not sufficient to understand its structure.
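The edge weighting used in this use case, where w(e) is the sum of the efforts of both end users over all topics, can be sketched as follows. The input layout (a per-user dictionary of word counts per topic, and a list of conversation records) as well as the function names and toy values are assumptions made for illustration; they do not reproduce the Edgeryders data.

```python
def total_effort(effort, user):
    """Total number of words written by `user` over all topics."""
    return sum(effort.get(user, {}).values())


def edge_weights(effort, conversations):
    """Weight each pair of conversing users by their summed total efforts.

    `conversations` lists (user, other_user, topic) records, one per piece
    of conversation; a pair is weighted once, whatever the number of
    records or topics linking the two users.
    """
    weights = {}
    for user, other, _topic in conversations:
        pair = tuple(sorted((user, other)))
        weights[pair] = total_effort(effort, user) + total_effort(effort, other)
    return weights


# Hypothetical toy data: word counts per user and topic.
effort = {
    "u1": {"t1": 120, "t2": 30},
    "u2": {"t1": 50},
    "u3": {"t2": 10},
}
conversations = [
    ("u1", "u2", "t1"),
    ("u3", "u1", "t2"),
    ("u2", "u1", "t1"),  # a reply on the same topic, same pair of users
]

weights = edge_weights(effort, conversations)
```

With such weights, two users who barely wrote anything no longer count as much as two heavy contributors, which is the distinction the unweighted network cannot make.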

4

Conclusion and future work

This paper addressed the issue of assessing cohesion in groups from homophily networks mixing entities and attributes into a multiplex view of a bipartite network. Our approach considers splitting the multiplex network into two single-type networks used in conjunction when analyzing the homophily relations between entities. To answer this

Fig. 12. A first selection of all topics (left) has highlighted the five most influential users (middle). Leapfrogging to these users lets us understand how they have been mutually collaborating on the different topics (right). Note that the first selection, made using the AND operator, returns the lowest intensity and homogeneity values (0/0) since no pair of users has contributed together to all topics. This underlines the need to leapfrog the selection since we still have 5 users who have contributed to all topics. Notice that except for topics 0 and 8 they have all contributed equally. Notice also the absence of a highlighted edge between topics 0 and 8, indicating that no pair of the selected users has contributed to both of those topics together.

question, we have defined entanglement, a notion of how attributes intertwine entities’ edges. We have measured entanglement indices on attributes, together with the homogeneity and intensity indices computed on any subset of entities. These indices can be used to question the cohesion of a group of entities, where optimal cohesion requires that entities simultaneously involve the exact same attributes, and maximum intensity occurs when entities cover all available attributes. A group of lower or unbalanced entanglement indeed requires more careful analysis, and typically leads to the discovery of subgroups or regions locally showing higher entanglement. An entanglement-based search of the networks often leads to the identification of outlier entities that can then be discarded, or on the contrary brought forward to understand the network activity. A close examination of the attribute interaction network also helps the identification of core attributes from which entities form a cohesive unit. The case studies clearly show the relevance of questioning the attribute entanglement of entities to potentially confirm the community structure derived from edge density, for instance. They focused on small size examples for the sake of readability. This limitation is only apparent, as the interaction network is used after entities have been indexed and grouped. Although a query might return hundreds (or thousands) of entities, we may expect the grouping procedure to form much smaller groups before closer examination occurs. We also suspect that larger samples gather larger attribute sets, typically leading to less tangled attribute interactions and less cohesive entity groups. Our second case study suggests our approach applies to other types of networks modeled using a bipartite graph, namely interaction relations. The initial comparative results encourage us to extend our approach to the study of multivariate networks.
Indeed, the entanglement measurement actually considers a multiplex network of interacting entities A, with attributes B corresponding to families of edges. Our third use case has brought forward the important nuance introduced by taking weighted entity interactions into account. We are exploring possibilities to further extend the ways we can incorporate weights in our model, and then fully embrace the weighted multiplex model, possibly with the help of De Domenico et al.’s formulation [10]. For example, entities of type B may not be equal (some may weigh more than others), and the interaction through a same entity of type B across two different pairs of entities of type A may weigh differently. These are design choices we suspect may depend on the nature and/or on the size of the dataset and on the questions our users are seeking answers to. These structures being rather complex to manipulate, the use cases we have shown underline the increase in usability when our approach is embedded in a visual and interactive environment. The interactions we have used enable a quick back-and-forth search in the data, putting users as close as possible to their own questions on the original data. Further studies would cover optimized implementation and performance studies, with comparative results on a larger number of networks and measures. Further work also includes examining strategies to automatically identify entity and attribute subsets with optimal (or maximum) homogeneity and/or intensity, suggesting potential areas of interest in the network under study. These problems, however, will inevitably lead us to combinatorial optimization problems, and we may expect to have no choice but to rely on heuristics to avoid typical algorithmic complexity issues.

5

Acknowledgements

We would like to thank the European project FP7 FET ICT-2011.9.1 Emergence by Design (MD) Grant agreement no: 284625.

References

1. S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51):21544–21549, 2009.
2. D. Auber, Y. Chiricota, F. Jourdan, and G. Melançon. Multiscale navigation of small world networks. In IEEE Symposium on Information Visualisation, pages 75–81. IEEE Computer Science Press, 2003.
3. E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. The role of social networks in information diffusion. In 21st international conference on World Wide Web, pages 519–528. ACM, 2012.
4. V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
5. S. P. Borgatti. Two-mode concepts in social network analysis. In R. A. Meyers, editor, Computational Complexity - Theory, Techniques, and Applications, pages 2912–2924. Springer, 2012.
6. S. P. Borgatti and M. G. Everett. Network analysis of 2-mode data. Social Networks, 19(3):243–269, 1997.
7. S. P. Borgatti, A. Mehra, D. J. Brass, and G. Labianca. Network analysis in the social sciences. Science, 323(5916):892–895, 2009.
8. A. Buja, D. Cook, and D. F. Swayne. Interactive high-dimensional data visualization. Journal of Computational and Graphical Statistics, 5(1):78–99, 1996.
9. R. Burt and T. Schøtt. Relation content in multiple networks. Social Science Research, 14:287–308, 1985.
10. M. De Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, and A. Arenas. Mathematical formulation of multi-layer networks. arXiv preprint arXiv:1307.4977 physics.soc-ph, 2013.
11. W. Didimo, G. Liotta, and S. A. Romeo. A graph drawing application to web site traffic analysis. Journal of Graph Algorithms and Applications, 15(2):229–251, 2011.
12. J. Ding and A. Zhou. Nonnegative Matrices, Positive Operators and Applications. World Scientific, Singapore, 2009.
13. D. Easley and J. Kleinberg. Networks in their surrounding contexts. In Networks, Crowds, and Markets - Reasoning About a Highly Connected World, pages 77–106. Cambridge University Press, 2010.
14. M. G. Everett and S. P. Borgatti. Analyzing clique overlap. Connections, 21(1):49–61, 1998.
15. S. Fortunato. Community detection in graphs. Physics Reports, 486(3–5):75–174, 2010.
16. K. Fujimoto, C.-P. Chou, and T. W. Valente. The network autocorrelation model using two-mode data: Affiliation exposure and potential bias in the autocorrelation parameter. Social Networks, 33(3):231–243, 2011.
17. J.-L. Guillaume and M. Latapy. Bipartite Graphs as Models of Complex Networks, volume 3405 of Lecture Notes in Computer Science, pages 127–139. Springer, 2005.
18. R. Guimera, S. Mossa, A. Turtschi, and L. A. N. Amaral. The worldwide air transportation network: anomalous centrality, community structure, and cities global roles. Proceedings of the National Academy of Sciences of the United States of America, 102(22):7794–7799, 2005.
19. InfoVis 2004 Contest. http://www.cs.umd.edu/hcil/iv04contest/.
20. M. O. Jackson. Social and Economic Networks. Princeton University Press, 2010.
21. A. K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.
22. S. Kaski, J. Nikkila, M. Oja, J. Venna, P. Toronen, and E. Castren. Trustworthiness and metrics in visualizing similarity of gene expression. BMC Bioinformatics, 4(1):48, 2003.
23. W. Ke, K. Borner, and L. Viswanath. Major information visualization authors, papers and topics in the ACM library. In IEEE Symposium on Information Visualization 2004. IEEE, 2004.
24. M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter. Multilayer networks. arXiv preprint arXiv:1309.7233, 2013.
25. M. Latapy, C. Magnien, and N. D. Vecchio. Basic notions for the analysis of large two-mode networks. Social Networks, 30(1):31–48, 2008.
26. B. Lee, C. Plaisant, C. S. Parr, J.-D. Fekete, and N. Henry. Task taxonomy for graph visualization. In Proceedings of the 2006 AVI workshop on BEyond time and errors: novel evaluation methods for information visualization, pages 1–5. ACM, 2006.
27. C. F. Manski. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3):531–542, 1993.
28. M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444, 2001.
29. Z. Neal. Identifying statistically significant edges in one-mode projections. Social Network Analysis and Mining, pages 1–10, 2013.
30. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
31. T. Opsahl. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks, 35(2):159–167, 2013.
32. R. Peeters. The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics, 131(3):651–654, 2003.
33. J. M. Podolny and J. N. Baron. Resources and relationships: Social networks and mobility in the workplace. American Sociological Review, 62(5):673–693, 1997.
34. B. Renoust, G. Melançon, and M.-L. Viaud. Assessing group cohesion in homophily networks. In Advances in Social Network Analysis and Mining (ASONAM) 2013, pages 149–155. Niagara Falls, Canada, ACM/IEEE, 2013.
35. B. Renoust, G. Melançon, and M.-L. Viaud. Measuring group cohesion in document collections. In IEEE/WIC/ACM International Conference on Web Intelligence, 2013.
36. G. Robins and M. Alexander. Small worlds among interlocking directors: Network structure and distance in bipartite graphs. Computational & Mathematical Organization Theory, 10(1):69–94, 2004.
37. C. R. Shalizi and A. C. Thomas. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2):211–239, 2011.
38. C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, 1948.
39. The EdgeRyders community. http://edgeryders.eu/.
40. The Internet Movie Database (IMDB). http://www.imdb.com.
41. J. S. Yi, Y. ah Kang, J. T. Stasko, and J. A. Jacko. Toward a deeper understanding of the role of interaction in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6):1224–1231, 2007.
42. T. Zhou, J. Ren, M. Medo, and Y. Zhang. Bipartite network projection and personal recommendation. Physical Review E, 76(4):046115, 2007.