Hindawi Publishing Corporation
International Journal of Distributed Sensor Networks
Volume 2014, Article ID 529842, 14 pages
http://dx.doi.org/10.1155/2014/529842

Research Article

Uncovering Research Topics of Academic Communities of Scientific Collaboration Network

Hongqi Han, Shuo Xu, Jie Gui, Xiaodong Qiao, Lijun Zhu, and Han Zhang
Institute of Scientific and Technical Information of China, Beijing 100038, China
Correspondence should be addressed to Hongqi Han; [email protected]

Received 5 December 2013; Revised 3 April 2014; Accepted 3 April 2014; Published 27 April 2014
Academic Editor: Goreti Marreiros

Copyright © 2014 Hongqi Han et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In order to improve the quality of applications such as recommendation or retrieval in knowledge-based service systems, it is very helpful to uncover the research topics of the academic communities in a scientific collaboration network (SCN). Previous research mainly focuses on measuring network characteristics and on community evolution, but how to uncover the research topics of each community remains largely understudied. This paper proposes a nonjoint approach consisting of three simple steps: (1) detect overlapping academic communities in the SCN with the clique percolation method, (2) discover the underlying topics and the research interests of each researcher with the author-topic (AT) model, and (3) label the research topics of each community with the top N most frequent collaborative topics among the members of the community. Extensive experimental results on the NIPS (Neural Information Processing Systems) dataset show that this simple procedure is feasible and efficient.

1. Introduction

Social network (SN) analysis is regarded as a powerful tool for finding the social links and network structure of actors [1–9]. A scientific collaboration network (SCN) is a kind of complex SN of researchers in which a link between two researchers is established if they have coauthored one or more scientific papers [10, 11]; it is therefore also called a coauthorship network [10, 12]. Previous studies on SCNs [6, 9, 11–16] can be roughly separated into two stages: (1) the first stage mainly focused on how to construct the network and how to measure network characteristics with metrics such as degree distribution, clustering coefficient, and average path length [10, 14, 17]; (2) the second stage paid more attention to network structure analysis, community evolution, and so on [1, 2, 8, 13, 18, 19].

As is well known, most real-world networks contain groups in which nodes are more highly connected to each other than to the rest of the network [13]. Sets of such nodes are usually called communities, clusters, cohesive groups, or modules [13, 20]. Like other real-world networks, SNs include many communities based on common location, interests, and occupation, and SCNs, as one kind of SN, are no exception [2, 7, 13, 19, 20]. According to whether or not a node is allowed to be a member of more than one community, communities can be further divided into two types: overlapping and nonoverlapping. Most real-world networks are characterized by well-defined statistics of overlapping and nested communities [13].

Although the real motivations for a link in an SCN are still not well understood, there usually exists at least one relationship (such as co-colleagues, advisor-advisee, classmates, coproject, friends, or many others) in the real world if two researchers have coauthored papers. Moreover, in order to follow frontier research or borrow main ideas from other fields [18], an active researcher may be involved in multiple fields. Intuitively, it is unreasonable to limit a researcher to belonging to only one community. We are therefore convinced that overlapping communities also exist in the SCN.

It is increasingly important to detect communities in SNs in modern applications, ranging from bioinformatics and enterprise organization management to bibliometrics [21]. Many approaches have been proposed to detect communities in SNs [22, 23], such as traditional clustering methods like k-means, division algorithms based on hierarchical clustering, modularity-based algorithms, spectral algorithms, dynamic algorithms, statistical inference-based methods, multiresolution methods, and, lastly, methods for finding overlapping communities and other miscellaneous methods [23]. However, most existing methods for finding communities discover only separated node sets and ignore the overlapping phenomenon [13, 23].

In fact, identifying network structure is only the first step if one wants to gain valuable insight into how network function and topology affect each other. In a knowledge-based service system, users may be interested not only in the link structure of a network but also in the reason why its members form a community. However, most present methods merely focus on detecting the structures or monitoring the evolution of communities. There is little literature on uncovering the research topics of academic communities and providing specific information for searching for a group of researchers with similar interests. To the best of our knowledge, only Ichise et al. [18] have put forward a method to detect academic communities with topic identification. On closer examination, one can see that a word assignment technique was utilized for obtaining the communities; unfortunately, that method is limited by its reliance on keywords.

To overcome these problems, this paper proposes a nonjoint approach which integrates a community detection method and the author-topic (AT) model. Specifically, it consists of three simple steps: (1) detect overlapping academic communities in the SCN with the clique percolation method, (2) discover the underlying topics and the research interests of each researcher with the author-topic (AT) model, and (3) label the research topics of each community with the top N most frequent collaborative topics among the members of the community, where common topics between researchers are regarded as collaborative topics.

The remainder of the paper is organized as follows. Section 2 provides related work on community detection methods and topic models. Section 3 illustrates the analysis framework of the study and then introduces each unit of the framework. Section 4 describes and discusses the experimental results. Finally, the conclusion is drawn.

2. Related Works

2.1. Community Detection Methods. Community detection is the organization of the nodes of a network into subsets such that the nodes within a subset are more densely connected internally than with the other subsets. From a graph-theoretic perspective, given a graph G(A, E) with a set A of nodes and a set E of edges, community detection is to classify the node set A into multiple subsets C = {c_i}_{i=1}^{|C|} with c_i ⊆ A, such that the nodes belonging to a subset c_i are all closely related [24]. Here |·| denotes the number of elements in a set. Because the number of communities underlying a network is typically unknown in advance and the sizes or densities of communities are often uneven, it is not trivial to find community structure automatically. Several community detection approaches have been developed and employed with varying levels of success [25], including hierarchical clustering algorithms, the Girvan-Newman algorithm [26], modularity maximization algorithms [27], and clique-based methods [13]. It is worth noting that only the last kind can deal with the overlapping phenomenon.

Hierarchical clustering is a simple algorithm which employs some type of similarity metric between node pairs to group similar nodes into communities. The Girvan-Newman algorithm identifies edges that lie between communities and then removes them, leaving behind the communities themselves. Though Girvan-Newman is available in a number of standard software packages, its time complexity is O(|E|^2 × |A|), making it impractical for networks with more than a few thousand nodes [26]. Modularity maximization defines a benefit function to measure the quality of a particular division of a network into communities [27] and can be used for large-scale networks. However, since approximate optimization is used, modularity maximization often fails to detect clusters smaller than a scale that depends on the size of the network.

Clique-based methods build up the communities from the cliques in a network [28]. By cliques, we mean complete subgraphs of a network that are not parts of larger complete subgraphs. Specifically, the general procedure of these methods is first to find cliques, then to unite the cliques larger than a minimum number of nodes into a subgraph of the original network, and finally to use the components (disconnected parts) of the defined subgraph to define communities [22, 23]. An alternative is to use k-cliques, which are complete subgraphs with k nodes, to construct a line graph known as the clique graph [20, 28]. In fact, the clique graph is a hypergraph of the original graph, the nodes of which are k-cliques and the edges of which record the overlap of the cliques in the original graph. The difference between k-cliques and cliques is that k-cliques may be subsets of larger complete subgraphs. A typical approach based on k-cliques is the clique percolation method (CPM) [13, 29], which defines communities as percolation clusters of k-cliques. The CPM algorithm runs in time O(αn^β ln|A|), where α and β are constant values [30].

2.2. Topic Models. Topic models are a family of statistical models for discovering a mixture of "components" in a collection of documents [31]. In these models, each topic is modeled as a probability distribution over the words in the vocabulary of the corpus, and each document in the corpus is modeled as a mixture of topics given by a multinomial distribution over the topics [32]. An early topic model called probabilistic latent semantic indexing (pLSI) was proposed by Hofmann [33]. While Hofmann's work is a useful step toward probabilistic modeling of text, it is incomplete in that it provides no probabilistic model at the level of documents. In order to overcome this problem, Blei and his coworkers developed the latent Dirichlet allocation (LDA) model [34]. LDA is similar to pLSI, except that in LDA the topic distribution is assumed to have a Dirichlet prior. In practice, this assumption usually brings about more reasonable mixtures of topics in a document. Subsequent topic models, such as the author-topic (AT) model [35], the topic over time (ToT) model [36], the author-topic over time (AToT) model [31, 32, 37], and the conference-author-relation topic (CART) model [38], are generally extensions of LDA.

Figure 1: Framework of the method. The framework consists of four parts: to preprocess data (papers, words, and authors); to discover communities in the scientific collaborative network (coauthorship network, cliques, communities c1, c2, ..., c|C|); to discover collaborative topics between authors with the AT model (authors a1, a2, ..., a|A| and topics t1, t2, ..., t|T|); and to uncover topics of academic communities.

As a famous topic model, LDA [34] is a generative probabilistic model for collections of discrete data such as text corpora [39]. The LDA model is based upon the idea that the probability distribution over words in a document can be expressed as a mixture of topics; that is, each document may be viewed as a mixture of various topics. LDA can be viewed as a generative process in which a document is generated in three steps: (1) sample a mixture proportion from a Dirichlet distribution, (2) sample a topic index according to the mixture proportion for each word in the document, and (3) sample a word token from a multinomial distribution over words specific to the sampled topic.

AT is also a generative model; it extends the LDA model to include authorship information [40, 41]. The model provides a relatively simple probabilistic framework for exploring the relationships between authors, documents, topics, and words. In the model, each author is represented by a multinomial distribution over topics and each topic is represented by a multinomial distribution over words. The words in a document written by multiple authors are assumed to be generated from a mixture of the coauthors' topic mixtures. The topic-word and author-topic distributions are then learned from the text corpus. Compared with LDA, the AT model yields more salient topics and more reasonable patterns of researcher interests [40]. The AT model has proved to be an effective way to uncover the research interests of each researcher [40, 41].
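As a concrete illustration of the generative process just described, the following minimal sketch simulates the AT model forward (generation only, no inference). The corpus sizes and variable names are illustrative assumptions, not the settings of the experiments reported later; only the symmetric Dirichlet priors (0.5 and 0.1) follow Section 4.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the NIPS settings).
num_topics, vocab_size, num_authors = 5, 50, 4
alpha, beta = 0.5, 0.1  # symmetric Dirichlet priors, as in Section 4.1

# theta[a]: author a's distribution over topics; phi[t]: topic t's distribution over words.
theta = rng.dirichlet(np.full(num_topics, alpha), size=num_authors)
phi = rng.dirichlet(np.full(vocab_size, beta), size=num_topics)

def generate_paper(author_ids, length):
    """Generate one paper under the AT model: for every word token,
    pick one of the paper's authors uniformly, then a topic from that
    author's topic distribution, then a word from that topic."""
    words = []
    for _ in range(length):
        a = rng.choice(author_ids)              # responsible author of this token
        z = rng.choice(num_topics, p=theta[a])  # topic assignment
        w = rng.choice(vocab_size, p=phi[z])    # word token
        words.append(int(w))
    return words

print(generate_paper(author_ids=[0, 2], length=20))
```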

3. Method

The analysis framework of the proposed approach is illustrated in Figure 1. The framework is composed of four parts: to preprocess data, to detect communities in the scientific collaboration network, to discover collaborative topics between authors, and to uncover the topics of academic communities. We describe each part in detail in the following subsections.

3.1. To Preprocess Data. In this part, words and authors are extracted from the collected papers. First, word terms are extracted to build the vocabulary and all stop-words are eliminated. Then the word frequency and inverse document frequency of each word in the vocabulary are computed for the subsequent author-topic model. Next, author names are extracted, a name disambiguation algorithm is used to process ambiguous names (such as an author with multiple names or multiple authors with the same name), and all author names are normalized to a standard name and assigned a unique ID number.
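The preprocessing just described can be sketched roughly as follows. The tokenizer, the tiny stop-word list, the paper dictionary keys, and the helper names are illustrative assumptions rather than the exact pipeline used in the paper; name disambiguation proper is not shown.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "with"}  # tiny illustrative list

def tokenize(text):
    """Lowercase, keep alphabetic word terms, drop stop-words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_vocabulary(papers, min_count=5):
    """Corpus term frequencies; rare words are dropped
    (Section 4.1 removes words appearing fewer than five times)."""
    tf = Counter(w for paper in papers for w in tokenize(paper["text"]))
    return {w: c for w, c in tf.items() if c >= min_count}

def document_frequency(papers):
    """Number of papers containing each word, used for inverse document frequency."""
    df = Counter()
    for paper in papers:
        df.update(set(tokenize(paper["text"])))
    return df

def normalize_authors(papers):
    """Map every author name to a unique integer ID; a real pipeline would
    run name disambiguation before this normalization."""
    ids = {}
    for paper in papers:
        paper["author_ids"] = [ids.setdefault(name.strip().lower(), len(ids))
                               for name in paper["authors"]]
    return ids
```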


3.2. To Detect Communities in SCN. From a "topological" point of view, networks can be divided into four categories: undirected binary networks, directed binary networks, weighted directed networks, and weighted undirected networks [10, 14]. In this part, an SCN is first created following the principle of an undirected binary network, in which each node represents an author and each edge represents the coauthorship between the two linked authors. Specifically, if two authors have coauthored at least one paper, an edge with unit weight is created; in other words, no matter how many papers two authors have coauthored, there is only one edge between them. For example, if a1, a2, and a3 coauthored one paper and a1 and a2 coauthored another paper, three edges are created in the constructed network, that is, e12, e13, and e23.

Then, cliques are extracted from the constructed SCN and communities are detected with the k-clique-community detection algorithm [13]. The community definition in the algorithm is based on the observation that a typical node in a community is linked to many other nodes, yet not necessarily to all of them. A k-clique-community is the union of all k-cliques that can be reached from each other through a series of adjacent k-cliques, where adjacency means sharing k − 1 nodes. When k = 2, the k-clique communities are equivalent to the connected subgraphs, which are also called components in complex network analysis.

3.3. To Discover Collaborative Topics between Authors. Here, the AT model is used to uncover the research interest of each author. The graphical model representation of the AT model is shown in Figure 2. The following notations are used in this study. Let P and W be the set of papers and the set of unique words in the corpus, respectively. For each m ∈ {1, 2, ..., |P|}, W_m and a_m denote the word tokens and the set of authors of paper m, and |W_m| is the length of paper m. ϑ_a is the multinomial distribution over topics specific to author a, and φ_l is the multinomial distribution over words specific to topic l. For more elaborate and detailed descriptions of the AT model, we refer the reader to [35, 40, 41].

In this work, a collapsed Gibbs sampling algorithm, which runs over three periods (initialization, burn-in, and sampling) with L iterations in total, is used for inference on {ϑ_a}_{a=1}^{|A|} and {φ_l}_{l=1}^{|T|}, since it provides a simple method for obtaining parameter estimates under Dirichlet priors. The time complexity of the AT model is

$$O\left(L \times \sum_{m=1}^{|P|} |W_m| \times |a_m|\right). \tag{1}$$
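After Gibbs sampling, the author-topic and topic-word distributions are typically recovered from the sampled count matrices with the standard Dirichlet-smoothed estimates. The sketch below shows only this read-out step (the sampler itself is omitted); the array names are illustrative assumptions.

```python
import numpy as np

def estimate_distributions(n_at, n_tw, alpha, beta):
    """n_at[a, t]: tokens assigned to author a and topic t;
    n_tw[t, w]: tokens of word w assigned to topic t.
    Returns (theta, phi): smoothed author-topic and topic-word distributions."""
    num_topics = n_at.shape[1]
    vocab_size = n_tw.shape[1]
    theta = (n_at + alpha) / (n_at.sum(axis=1, keepdims=True) + num_topics * alpha)
    phi = (n_tw + beta) / (n_tw.sum(axis=1, keepdims=True) + vocab_size * beta)
    return theta, phi

def top_topics(theta, author_id, n=10):
    """Top-n most likely topics of one author, as (topic, probability) pairs."""
    order = np.argsort(theta[author_id])[::-1][:n]
    return [(int(t), float(theta[author_id, t])) for t in order]
```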

In our paper, each topic is represented by the 10 words most likely to be generated conditioned on that topic, and the research interest of each author is represented by his or her 10 most likely topics. From the results of the AT model, we can build the relation matrix between authors and topics, in which each element is the association probability between an author and a topic. With this matrix, it is easy to obtain the collaborative topics between any two coauthors. Formally, for each i ∈ {1, 2, ..., |A|}, let T_i = {t_i1, t_i2, ..., t_i10} be the research topics of author i.

Figure 2: The graphical model representation of the author-topic model (plate notation with hyperparameters α and β, author assignment x_{m,n}, topic assignment z_{m,n}, and word token w_{m,n} for n ∈ [1, |W_m|] and m ∈ [1, |P|]; author-topic distributions ϑ_a, a ∈ [1, |A|], and topic-word distributions φ_l).

Then the collaborative topics T_ij between authors i and j are defined as the intersection of T_i and T_j; that is, T_ij = T_i ∩ T_j. For example, if T_i = {t1, t3, t4, t5, t10, t12, t14, t15, t16, t17} and T_j = {t2, t4, t5, t6, t7, t8, t9, t11, t19, t20}, then the collaborative topics between the two authors are {t4, t5}.

3.4. To Uncover Topics of Academic Communities. In Section 3.2, we have detected communities in the SCN. In Section 3.3, we have obtained the topics of authors as well as the collaborative topics between any two coauthors in the detected communities. Here, we integrate both results to uncover the topics of academic communities by ranking topics and selecting the most frequently collaborated ones.

For an SCN G = {A, E}, we denote the set of nodes as A = {a_1, a_2, ..., a_|A|}, the set of edges as E = {e_ij | a_i ∈ A, a_j ∈ A, coauthored(a_i, a_j)}, the set of topics found as T = {t_1, t_2, ..., t_|T|}, and the set of k-clique communities detected as C = {c_1, c_2, ..., c_|C|}. For each community c_m, we can define a subgraph of G, G_m = {A_m, E_m}, where A_m = {a_i | a_i ∈ c_m} and E_m = {e_ij | e_ij ∈ E, a_i ∈ c_m, a_j ∈ c_m}. In Section 3.3, we have obtained the collaborative topics for each edge in E_m; therefore, we can compute the collaborative frequency of every topic in community c_m by counting topics over edges according to

$$f(t_l) = \sum_{e_{ij} \in E_m} \delta(t_l \in T_{ij}), \quad l = 1, 2, \ldots, |T|, \tag{2}$$

where the indicator function δ(x) = 1 if x is true and 0 otherwise.
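A direct implementation of (2) could look like the following sketch; edges are given as author-ID pairs, and author_topics maps each author to the set of his or her top-10 topic IDs (the function and argument names are illustrative).

```python
from collections import Counter

def collaborative_topics(author_topics, i, j):
    """T_ij: common topics of two coauthors (set intersection)."""
    return author_topics[i] & author_topics[j]

def community_topic_frequencies(edges, author_topics):
    """Equation (2): for every edge e_ij of the community subgraph,
    count each topic that occurs in T_ij once."""
    freq = Counter()
    for i, j in edges:
        freq.update(collaborative_topics(author_topics, i, j))
    return freq
```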

Table 1: Distribution of the number of author pairs over collaborative numbers in the NIPS dataset.

Collaborative numbers    Number of author pairs
1                        2707
2                        299
3                        93
4                        18
5                        3
6                        9
9                        1
Total                    3130

Figure 3: An example of community with four authors a1, a2, a3, and a4 and the edges a1-a2, a1-a3, a2-a3, a2-a4, and a3-a4.

Once we have the collaborative frequencies of all topics in community c_m, we rank them by sorting the frequencies in descending order and then select the top N topics as the research topics of the community. To illustrate the process, let us take the simple example in Figure 3 of a community consisting of four nodes. Given the author topics
T_1 = {t1, t2, t10, t11, t12, t13, t14, t15, t16, t17},
T_2 = {t1, t2, t3, t4, t18, t19, t20, t21, t22, t23},
T_3 = {t1, t4, t5, t24, t25, t26, t27, t28, t29, t30},
T_4 = {t3, t4, t5, t31, t32, t33, t34, t35, t36, t37},
the collaborative topics between the authors are T_12 = {t1, t2}, T_13 = {t1}, T_23 = {t1, t4}, T_24 = {t3, t4}, and T_34 = {t4, t5}. Using (2), we can easily get the frequencies of all topics:

$$f(t_1) = 3, \quad f(t_2) = 1, \quad f(t_3) = 1, \quad f(t_4) = 3, \quad f(t_5) = 1. \tag{3}$$

Finally, if we rank the frequencies and select the top 2 topics as the research topics of the community, the result is {t1, t4}.
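The worked example above can be reproduced directly in a few lines; only the five edges of Figure 3 and the four top-10 topic sets are needed, and the exact ordering among equally frequent topics in the printout may vary.

```python
from collections import Counter

author_topics = {
    1: {1, 2, 10, 11, 12, 13, 14, 15, 16, 17},
    2: {1, 2, 3, 4, 18, 19, 20, 21, 22, 23},
    3: {1, 4, 5, 24, 25, 26, 27, 28, 29, 30},
    4: {3, 4, 5, 31, 32, 33, 34, 35, 36, 37},
}
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]  # the five edges in Figure 3

# Equation (2): count every collaborative topic once per edge on which it occurs.
freq = Counter()
for i, j in edges:
    freq.update(author_topics[i] & author_topics[j])

print(freq.most_common())                   # e.g. [(1, 3), (4, 3), (2, 1), (3, 1), (5, 1)]
print([t for t, _ in freq.most_common(2)])  # [1, 4], i.e. topics t1 and t4
```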

4. Experimental Results and Discussion

4.1. Data. The NIPS proceedings dataset is used to evaluate the performance of the proposed framework. It consists of the full text of 13 years (1987–1999) of Neural Information Processing Systems (NIPS) conference proceedings (http://www.cs.toronto.edu/~roweis/data.html) and contains 1,740 research papers and 2,037 unique authors. Because all the author names have already been processed and normalized, we do not need to run a name disambiguation algorithm in the preprocessing step.

Based on coauthorship, we count the collaborative numbers between coauthored researchers. The distribution of the number of author pairs over collaborative numbers is shown in Table 1. It shows that the maximum collaborative number between authors is 9, corresponding to the author pair Smola A (ID: 1475) and Scholkopf B (ID: 1504).

In addition to downcasing and removing stop-words and numbers, we also remove words appearing fewer than five times in the corpus.

Figure 4: NIPS scientific collaboration network.

After preprocessing, the dataset contains 13,649 unique words and 2,301,375 word tokens in total. In our experiments with the AT model, the number of topics is fixed at 100, the symmetric Dirichlet priors α and β are set to 0.5 and 0.1, and Gibbs sampling is run for L = 2000 iterations.

4.2. Scientific Collaboration Network. Based on coauthorship, we construct the scientific collaboration network, which contains 1,897 nodes and 3,130 edges; that is, 140 authors did not collaborate with any other author. The constructed network graph is shown in Figure 4. The NIPS network is composed of one large subgraph (in the center of the picture) and many smaller subgraphs.

4.3. Component Analysis. Using the component analysis approach [42] on the network, we find 235 components in total; that is, the network contains 235 separate subgraphs. The numbers of author nodes in the top 10 components are 1061, 37, 27, 22, 19, 15, 11, 10, 10, and 9, respectively. We select the largest one (1061 authors) as the analysis object in the following experiments. Figure 5 shows the graph of the largest component.

4.4. Cliques. The k-clique-community detection algorithm in the NetworkX tools [42] is used to discover all cliques in the network. The sizes and numbers of cliques in the network and in the largest component are shown in Table 2.
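Since NetworkX [42] ships both maximal-clique enumeration and k-clique community detection, the construction and detection steps of Sections 3.2 and 4.2-4.5 can be sketched as follows; the toy input format and variable names are illustrative assumptions, not the exact code used for the experiments.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities

def build_scn(papers):
    """Undirected binary coauthorship network: one unit-weight edge per
    coauthoring pair, no matter how many papers the pair shares (Section 3.2)."""
    g = nx.Graph()
    for paper in papers:
        authors = paper["author_ids"]
        g.add_nodes_from(authors)
        g.add_edges_from((a, b) for i, a in enumerate(authors) for b in authors[i + 1:])
    return g

# Illustrative toy input; the real input is the 1,740 NIPS papers.
papers = [{"author_ids": [1, 2, 3]}, {"author_ids": [1, 2]}, {"author_ids": [3, 4, 5, 6]}]
scn = build_scn(papers)

# Largest connected component (Section 4.3) and its maximal cliques (Section 4.4).
largest = scn.subgraph(max(nx.connected_components(scn), key=len))
cliques = list(nx.find_cliques(largest))

# Overlapping k-clique communities (CPM, Section 4.5); k is typically between 3 and 6.
communities = [set(c) for c in k_clique_communities(largest, k=3)]
print(communities)  # two overlapping communities sharing node 3
```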

Table 2: Sizes and numbers of cliques.

Size of cliques                    2    3    4    5   6   7   8   9   10
Numbers in network                 508  348  137  42  26  6   2   3   1
Numbers in the largest component   316  244  94   32  21  4   1   2   0

Figure 5: The largest subgraph of NIPS network.

The size of the largest clique in the network is 10, which means that it contains 10 authors, while the size of the largest clique in the largest component is 9.

4.5. Community Analysis. The detected communities depend on the value of the parameter k, which refers to the size of the cliques. Typically, the value of k is between 3 and 6 [13]. Increasing k makes the communities smaller and more disintegrated but, at the same time, more cohesive [13]. Different values of k give rather different results and thus give us flexibility when providing a research community service for users in a knowledge-based system. The cliques found in Section 4.4 are used to detect communities. The communities detected under different values of k are presented as follows.

(a) k = 2. If k = 2, the detected communities are simply the largest component of the network shown in Figure 5 (note that we only use the largest component for the community analysis).

(b) k = 3. Setting k = 3, we obtain 47 communities and 22 overlapping nodes. Due to space constraints, we only list the author IDs of the overlapping nodes: 18, 42, 53, 63, 77, 117, 156, 205, 206, 370, 383, 390, 459, 578, 673, 697, 733, 811, 943, 1039, 1212, and 1276. The corresponding results for all 47 communities are available from the authors upon request.

(c) k = 4. Setting k = 4, we obtain the 18 communities shown in Table 3 and 6 overlapping nodes, 77, 156, 383, 726, 1475, and 1504, which are marked in Table 3; their names are provided in Table 6.

Compared with the overlapping nodes at k = 3, authors 77, 156, and 383 remain overlapping nodes, whereas the other 20 overlapping nodes at k = 3 are no longer overlapping at k = 4; therefore, these three authors may play a more important role in the interaction between communities, or they may have more diverse research interests.

(d) k = 5. Setting k = 5, we obtain the 6 communities shown in Table 4, in which the overlapping node is again marked. There is only one overlapping node left, a member of three communities (communities 4, 5, and 6), and its author ID is 156. In our experiment, this author (Sejnowski T) is the only node that is overlapping for k = 3, 4, and 5. He may have more diverse research interests, which allows him to play a "bridge" role in multiple communities.

The subgraphs of the 6 communities (k = 5) in the NIPS network are shown in Figure 6. In Figure 6, nodes in the same community are painted with the same color and shape, except the overlapping node, which is painted in red with a circle shape. From Figures 6(a)–6(c), one can see that communities 1 and 3 are connected but do not have overlapping nodes, community 2 is separated from the other communities, and communities 4, 5, and 6 are connected by the overlapping node 156.

4.6. Topics and Collaborative Topics between Authors. By running the AT model on the NIPS dataset, we obtain 100 topics and assign each topic an ID number ranging from 0 to 99. In Table 5, we list some typical topics and the top 10 hot words of each topic with their corresponding probabilities. Among the listed topics are some representative domains of NIPS, including support vector machine (SVM) and kernel methods, neural networks, speech recognition, image and vision, EM and mixture models, and independent component analysis (ICA).

After running the AT model on the NIPS dataset, we also obtain the research topics of each author. We select the 10 most likely topics as the research topics of each author, and each topic has a probability value indicating the possibility that the author is related to the topic. Table 6 shows the research topics of the authors corresponding to the overlapping nodes in the communities under the condition k = 4. In the column "Topic IDs and probabilities" of Table 6, the decimal value in parentheses after each topic is the probability specific to the author; for example, "19 (0.18351)" denotes that the probability of author Koch C specific to topic 19 is 0.18351.

From the results of the authors' research topics, we can easily obtain the collaborative topics between two coauthors by finding their common topics. From the results of the AT model, we have the probabilities between authors and topics. If two authors share a collaborative topic, we use the smaller of the two probabilities to express the possibility that they collaborate on the topic. For example, Vapnik V (726) and Smola A (1475) share topic 77, and the probabilities with which they are related to the topic are 0.26370 and 0.33208, respectively, so the probability that they collaborate on the topic is 0.26370. In Table 3, we have listed all authors in community 2 (k = 4).

Figure 6: Subgraphs of the discovered communities (k = 5): (a) communities 1 and 3; (b) community 2; (c) communities 4, 5, and 6.

Table 3: Detected communities (k = 4). Overlapping nodes are marked with *.

CNo   Size   Members
1     23     37, 42, 94, 95, 96, 194, 235, 236, 237, 238, 239, 240, 404, 432, 611, 726*, 728, 780, 921, 922, 1017, 1077, 1078
2     13     197, 726*, 729, 1473, 1474, 1475*, 1504*, 1697, 1826, 1827, 1828, 1830, 1952
3     8      63, 156*, 376, 430, 555, 719, 961, 1554
4     9      276, 370, 510, 577, 681, 682, 683, 912, 916
5     7      77*, 79, 1094, 1219, 1220, 1221, 1552
6     11     156*, 418, 1298, 1299, 1556, 1717, 1718, 1719, 1776, 1777, 1778
7     6      300, 880, 989, 990, 1119, 1253
8     5      383*, 748, 749, 750, 861
9     6      383*, 390, 691, 692, 913, 914
10    6      117, 630, 798, 811, 1475*, 1504*
11    8      156*, 381, 986, 1351, 1395, 1396, 1397, 2009
12    5      116, 784, 785, 786, 933
13    9      77*, 78, 313, 460, 461, 462, 536, 942, 944
14    10     40, 205, 399, 507, 508, 686, 687, 688, 689, 690
15    7      41, 44, 970, 971, 972, 973, 1048
16    7      53, 148, 149, 150, 401, 704, 1068
17    5      805, 918, 919, 920, 1239
18    7      2, 3, 179, 374, 642, 643, 945

CNo is the abbreviation of community number.

Table 4: Detected communities (k = 5). The overlapping node is marked with *.

CNo   Size   Members
1     17     37, 42, 94, 96, 194, 235, 236, 237, 238, 239, 240, 404, 432, 611, 726, 780, 1017
2     9      276, 370, 510, 577, 681, 682, 683, 912, 916
3     8      1475, 1504, 1697, 1826, 1827, 1828, 1830, 1952
4     7      156*, 376, 430, 555, 719, 961, 1554
5     8      156*, 418, 1298, 1299, 1556, 1717, 1718, 1719
6     8      156*, 381, 986, 1351, 1395, 1396, 1397, 2009

CNo is the abbreviation of community number.

In this community, there are three overlapping nodes: 726, 1475, and 1504. Table 6 lists the hot topics of these three authors. In this way, we can obtain a symmetric matrix of collaborative topics between them: in each cell except the diagonal, the collaborative topics are the intersection of the topic sets of the corresponding two authors. The collaborative topics between them are reported in Table 7.

4.7. Community Topics. In this subsection, we integrate the results of Sections 4.5 and 4.6 to uncover the research topics of the communities. Our main idea is to find the most frequent and most probable collaborative topics in each community. To be specific, for each community we count the collaborative topics over all edges of its subgraph using (2) and then rank them by sorting both the collaborative frequencies and the probabilities in descending order. We use the minimal collaboration probability as the probability of each topic in the counting process. The result for community 1 (k = 5; see Table 4) is shown in Table 8. If two topics have the same frequency, the topic with the larger probability is ranked before the other.

With the ranked collaborative topics of the communities, we can select the most outstanding topics or the top N topics as the research topics of each community. In this paper, we use the top N topics to represent the research interests of all detected communities. Here, we set N = 3 and list the topics of all communities under the condition k = 5 in Table 9, where each topic is listed with its topic ID and its collaborative frequency (the corresponding topic names are given in Table 5). According to these results, we can speculate about the research interests of each community: the main interests of community 1 seem to be related to "pattern recognition" and "outlier detection," community 2 to "speech recognition," community 3 to "SVM and kernel methods," community 4 to "EM and mixture models" and "neural networks and Boltzmann machines," community 5 to "independent component analysis" and "EEG," and community 6 to "image and vision" and "independent component analysis."
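The ranking rule just described (frequency first, then the minimal collaboration probability as tie-breaker) amounts to a two-key sort. A minimal sketch, assuming the frequencies and probabilities are already available as dictionaries keyed by topic ID:

```python
def rank_community_topics(freq, prob, n=3):
    """freq[t]: collaborative frequency of topic t in the community;
    prob[t]: minimal collaboration probability, used to break frequency ties.
    Returns the top-n topic IDs, higher frequency first, then higher probability."""
    return sorted(freq, key=lambda t: (-freq[t], -prob[t]))[:n]

# Toy check: topics 1 and 4 tie on frequency; the more probable one (4) wins.
print(rank_community_topics({1: 3, 4: 3, 2: 1}, {1: 0.2, 4: 0.3, 2: 0.1}))  # [4, 1, 2]
```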

In order to investigate the effectiveness of the proposed method, we checked all the papers coauthored by the authors of each community detected in Table 4. We found that the uncovered topics of most communities are closely related to the topics of the papers written by the community authors. For example, there are 16 papers coauthored by the authors of community 3 (k = 5), and according to the proposed method the topic of this community is "support vector machine and kernel method." The titles of these papers are listed in Appendix A: 8 of the titles contain "support vector" and 4 contain "kernel." Although the titles of the remaining papers suggest other topics as well, this assures us that the research interests of the community are mainly related to SVM and kernel methods.

Finally, we examine the function of the overlapping nodes. In our experiments with k = 5, there is one overlapping node, 156 (author name: Sejnowski T), shared by communities 4, 5, and 6. It is interesting that the uncovered topics of all three communities contain topic 24 (visual position), and the topics of both communities 5 and 6 contain topic 89 (independent component analysis). In order to find out whether the overlapping node plays a bridge role between the three communities, we checked all 43 papers of author 156; their titles are provided in Appendix B. We found that the research interests of Sejnowski T cover almost all the topics found for the three communities. From the titles of the papers, it is not difficult to see that more than 12 papers cover topic 24, about 6 papers cover topic 89, and about 3 papers cover topic 92. Thus, we have reason to believe that author 156 has multiple research interests and therefore plays a bridge role between the three communities.

5. Conclusion

In this work, a method for uncovering the research topics of communities in a scientific collaboration network is proposed. The method integrates community detection using the k-clique-community algorithm with the author-topic model: the k-clique-community algorithm detects overlapping communities in the scientific collaboration network, while the AT model discovers topics and authors' topics. We use the common topics of coauthored researchers as their collaborative topics. Finally, we count all collaborative topics and select the most frequently collaborated topics among authors as the research topics of the communities. Experimental results on the NIPS dataset show that the method is feasible and efficient.

In a knowledge-based system, the information about communities and their research topics will be useful. It will help users locate an interesting academic community quickly, and then they can be led to


Table 5: Some typical research topics in NIPS (each topic is shown with its top 10 hot words and their probabilities).

Topic 15, Image and vision: Image (0.09447), Images (0.05248), Feature (0.02905), Features (0.02246), Pixel (0.01761), System (0.01731), Figure (0.01392), Pixels (0.01356), Vision (0.01305), Scale (0.01215)
Topic 24, Visual position: Sejnowski (0.02215), Visual (0.02123), Basis (0.01916), Figure (0.01194), Position (0.01050), Information (0.01027), Local (0.00924), Representations (0.00884), Representation (0.00884), Song (0.00878)
Topic 26, Speech recognition: Speech (0.06291), Recognition (0.04250), System (0.02914), HMM (0.02458), Context (0.02144), Speaker (0.01774), Training (0.01395), Word (0.01321), Continuous (0.01158), Acoustic (0.01099)
Topic 31, Neural network and Boltzmann machine: Hinton (0.01972), Hidden (0.00874), Features (0.00797), Dayan (0.00785), Recognition (0.00781), Visible (0.00739), Weights (0.00696), Distribution (0.00692), Single (0.00685), Energy (0.00681)
Topic 33, EEG: Time (0.02030), EEG (0.02013), Sound (0.01650), Localization (0.01346), Auditory (0.01287), Brain (0.01217), Data (0.01083), Location (0.01053), Activity (0.00983), Components (0.00960)
Topic 43, Pattern recognition and distance transformation: Distance (0.02642), Tangent (0.02514), Pattern (0.01317), Patterns (0.01246), Rate (0.01012), Transformations (0.00994), Set (0.00994), Recognition (0.00981), Cun (0.00959), Vectors (0.00932)
Topic 52, Outlier and noise characteristics: Case (0.02556), Number (0.01751), Large (0.01641), Random (0.01606), Order (0.01455), Values (0.01432), Results (0.01370), Simple (0.01341), Small (0.01335), General (0.01100)
Topic 74, Neural network: Network (0.24201), Neural (0.17782), Networks (0.15251), Systems (0.01321), Paper (0.01106), Figure (0.01040), Artificial (0.00852), Work (0.00761), Shown (0.00760), Information (0.00669)
Topic 77, SVM and kernel method: Kernel (0.03324), Support (0.02276), Vector (0.02172), SVM (0.01474), Set (0.01433), Margin (0.01433), Data (0.01253), Space (0.01178), Solution (0.01048), Regression (0.01036)
Topic 80, Network input, output, and architecture: Input (0.10359), Output (0.08380), Layer (0.05942), Hidden (0.05890), Network (0.04064), Weights (0.03965), Training (0.03359), Net (0.03012), Architecture (0.02245), Inputs (0.02208)
Topic 89, Independent component analysis: Information (0.02653), Independent (0.02629), Source (0.01889), Separation (0.01830), Sources (0.01718), Matrix (0.01698), Blind (0.01519), Component (0.01384), Natural (0.01336), ICA (0.01333)
Topic 92, EM and mixture model: Data (0.05575), Probability (0.03424), Likelihood (0.02770), Mixture (0.02502), Density (0.02359), Distribution (0.02129), Log (0.02112), EM (0.02030), Parameters (0.01948), Gaussian (0.01715)
Topic 97, Performance evaluation: Training (0.06981), Set (0.06098), Data (0.05007), Performance (0.04155), Test (0.03330), Results (0.02668), Number (0.02396), Error (0.01849), Table (0.01663), Problem (0.01387)

Table 6: Author topics of overlapping nodes (k = 4).

AID    Author        Topic IDs and probabilities
77     Koch C        19 (0.18351), 53 (0.09956), 66 (0.08539), 50 (0.06150), 67 (0.06092), 84 (0.05648), 52 (0.04997), 29 (0.02542), 27 (0.02443), 13 (0.02352)
156    Sejnowski T   24 (0.15232), 27 (0.06548), 66 (0.05343), 12 (0.04476), 33 (0.04087), 89 (0.03909), 52 (0.03130), 92 (0.03060), 15 (0.02934), 67 (0.02900)
383    Kawato M      58 (0.47440), 96 (0.07180), 12 (0.05519), 74 (0.05178), 91 (0.02970), 26 (0.02424), 56 (0.02287), 80 (0.02082), 52 (0.01764), 42 (0.01240)
726    Vapnik V      77 (0.26370), 97 (0.09446), 44 (0.06308), 98 (0.05889), 43 (0.05638), 42 (0.04341), 52 (0.03902), 70 (0.03525), 21 (0.02877), 92 (0.02667)
1475   Smola A       77 (0.33208), 98 (0.10307), 44 (0.08248), 97 (0.05777), 52 (0.05617), 43 (0.05022), 59 (0.03558), 90 (0.02437), 42 (0.02414), 12 (0.01682)
1504   Scholkopf B   77 (0.45924), 98 (0.05840), 52 (0.05196), 97 (0.05168), 44 (0.04020), 43 (0.03431), 90 (0.02675), 59 (0.02423), 42 (0.01919), 12 (0.01863)

AID is the abbreviation of author ID.

Table 7: Collaborative topics between authors 726, 1475, and 1504.

Authors 1475 and 726: 77 (0.26370), 44 (0.06308), 98 (0.05889), 97 (0.05777), 43 (0.05022), 52 (0.03902), 42 (0.02414)
Authors 1504 and 726: 77 (0.26370), 98 (0.05840), 97 (0.05168), 44 (0.04020), 52 (0.03902), 43 (0.03431), 42 (0.01919)
Authors 1504 and 1475: 77 (0.33208), 98 (0.05840), 52 (0.05196), 97 (0.05168), 44 (0.04020), 43 (0.03431), 90 (0.02437), 59 (0.02423), 42 (0.01919), 12 (0.01682)

find interesting topics, researchers, and papers by using the topics and coauthorship of authors in the community. This

information will also help to improve the application effect and user experience of academic recommendation systems and to provide the researchers in a community with the information they really need. Therefore, further interesting problems related to the method are how to use it and its results in a knowledge-based system and how the parameter k of the k-clique-community algorithm affects user selection in practical applications. There are also some challenging problems for future studies. One is to develop an algorithm that obtains collaborative topics between authors directly by extending the AT model. Another is to analyze the topic evolution of communities and the functions of overlapping nodes in the evolution process.


Table 8: Collaborative topics of community 1 in Table 4.

TID   Collaborative topic name                           Freq.   Probability
43    Pattern recognition and distance transformation    76      0.08673
97    Performance evaluation                             72      0.04684
52    Outlier and noise characteristics                  68      0.05536
80    Network input, output, and architecture            65      0.02443
86    Character recognition                              57      0.04598
74    Neural network                                     47      0.02801
81    Neuronic network                                   41      0.04362
70    Supervised learning                                24      0.02891
15    Image and vision                                   18      0.02528
42    Optimization algorithms                            14      0.03183
77    SVM and kernel method                              11      0.03183
19    Analog circuit                                     9       0.03957
6     Parallel processing                                9       0.03528
12    Rule learning                                      9       0.02367
75    Network unit and connection weight                 4       0.02446
44    Linear and nonlinear programming                   3       0.02641
64    Routing                                            2       0.05357
92    EM and mixture model                               1       0.02667

TID is the abbreviation of topic ID. Freq. is the abbreviation of frequency.

Table 9: Community topics (k = 5 and N = 3).

CNo   Top 1 (TID, Freq.)   Top 2 (TID, Freq.)   Top 3 (TID, Freq.)
1     43, 76               97, 72               52, 68
2     26, 28               97, 28               80, 20
3     77, 25               97, 25               52, 25
4     92, 14               31, 12               24, 9
5     24, 25               89, 25               33, 25
6     24, 24               15, 24               89, 14

CNo is the abbreviation of community number. TID is the abbreviation of topic ID. Freq. is the abbreviation of frequency.

Appendices

A. Titles of Papers Collaborated by Authors in Community 3 Detected When k = 5

"Kernel PCA and Denoising in feature spaces"
"Regularizing AdaBoost"
"Semiparametric support vector and linear programming machines"
"The entropy regularization information criterion"
"Transductive inference for estimating values of functions"
"Invariant feature extraction and classification in kernel spaces"
"V-Arc: ensemble learning in the presence of outliers"
"Support vector method for novelty detection"
"Unmixing hyperspectral data"
"Support vector regression machines"
"Support vector method for function approximation, regression estimation, and signal processing"
"Improving the accuracy and speed of support vector machines"
"From regularization operators to support vector kernels"
"Prior knowledge in support vector kernels"
"Analysis of drifting dynamics with neural network hidden Markov models"
"Shrinking the tube: a new support vector regression algorithm"

B. Titles of the Papers Written by Author 156

"A "neural" network that learns to play backgammon"
"Storing covariance by the associative long-term potentiation and depression of synaptic strengths in the hippocampus"
"Neural network analysis of distributed representations of dynamical sensory-motor transformations in the leech"
"Combining visual and acoustic speech signals with a neural network improves intelligibility"
"SEXNET: a neural network identifies sex from human faces"
"Recurrent eye tracking network using a distributed representation of image motion"
"Hierarchical transformation of space in the visual system"
"Viewpoint invariant face recognition using independent component analysis and attractor networks"
"Neural network analysis of event-related potentials and electroencephalogram predicts vigilance"
"Edges are the "independent components" of natural scenes"
"Competitive anti-hebbian learning of invariants"
"Selective integration: a model for disparity estimation"
"Filter selection model for generating visual motion signals"
"Learning decision theoretic utilities through reinforcement learning"
"Unsupervised discrimination of clustered data via optimization of binary information gain"
"Learning nonlinear overcomplete representations for efficient coding"
"Biologically plausible local learning rules for the adaptation of the vestibulo-ocular reflex"
"Extended ICA removes artifacts from electroencephalographic recordings"
"Using aperiodic reinforcement for directed self-organization during development"
"Analyzing and visualizing single-trial event-related potentials"
"Foraging in an uncertain environment using predictive hebbian learning"
"Unsupervised classification with non-Gaussian mixture models using ICA"
"Temporal difference learning of position evaluation in the game of go"
"Coding time-varying signals using sparse, shift-invariant representations"
"A novel reinforcement model of birdsong vocalization learning"
"Predictive sequence learning in recurrent neocortical circuits"
"Reinforcement learning predicts the site of plasticity for auditory remapping in the barn owl"
"Image representations for facial expression coding"
"Spatial representations in the parietal cortex may use basis functions"
"Grouping components of three-dimensional moving objects in area MST of visual cortex"
"A nonlinear information maximisation algorithm that performs blind separation"
"Plasticity-mediated competitive learning"
"A mixture model system for medical and machine diagnosis"
"A model of spatial representations in parietal cortex explains hemineglect"
"A dynamical model of context dependencies for the vestibulo-ocular reflex"
"Independent component analysis of electroencephalographic data"
"Tempering backpropagation networks: not all weights are created equal"
"Classifying facial action"
"Empirical entropy manipulation for real-world problems"
"Using feedforward neural networks to monitor alertness from changes in EEG correlation and coherence"
"Cholinergic modulation preserves spike timing under physiologically realistic fluctuating input"
"Bayesian unsupervised learning of higher order structure"
"Dynamic features for visual speechreading: a systematic comparison"

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Key Technologies Research on Data Mining from the Multiple Electric Vehicle Information Sources project, sponsored by the Key Technologies R&D Program of the Chinese 12th Five-Year Plan (2011–2015) under Grant no. 2013BAG06B01, and by the Scientific Collaboration Network Analysis Based on Content and Linkage Data project, sponsored by the ISTIC Preresearch Foundation under Grant no. YY201221.

References

[1] Z. Zhang, Q. Li, D. Zeng et al., "User community discovery from multi-relational networks," Decision Support Systems, vol. 54, no. 2, pp. 870–879, 2013.
[2] B. He, Y. Ding, J. Tang et al., "Mining diversity subgraph in multidisciplinary scientific collaboration networks: a meso perspective," Journal of Informetrics, vol. 7, no. 1, pp. 117–128, 2013.
[3] J. C. Brunson, S. Fassino, A. McInnes et al., "Evolutionary events in a mathematical sciences research collaboration network," Scientometrics, 2013.
[4] A. Rubí-Barceló, "Core/periphery scientific collaboration networks among very similar researchers," Theory and Decision, vol. 72, no. 4, pp. 463–483, 2012.
[5] L. Kronegger, F. Mali, A. Ferligoj, and P. Doreian, "Collaboration structures in Slovenian scientific communities," Scientometrics, vol. 90, no. 2, pp. 631–647, 2012.
[6] A. Abbasi, L. Hossain, and C. Owen, "Exploring the relationship between research impact and collaborations for information science," in Proceedings of the 45th Hawaii International Conference on System Sciences (HICSS '12), pp. 774–780, Hawaii, USA, January 2012.
[7] T. S. Evans, R. Lambiotte, and P. Panzarasa, "Community structure and patterns of scientific collaboration in business and management," Scientometrics, vol. 89, no. 1, pp. 381–396, 2011.
[8] A. Abbasi, L. Hossain, S. Uddin, and K. J. R. Rasmussen, "Evolutionary dynamics of scientific collaboration networks: multi-levels and cross-time analysis," Scientometrics, vol. 89, no. 2, pp. 687–710, 2011.
[9] A. Pepe and M. A. Rodriguez, "Collaboration in sensor network research: an in-depth longitudinal analysis of assortative mixing patterns," Scientometrics, vol. 84, no. 3, pp. 687–701, 2010.
[10] M. E. J. Newman, "Scientific collaboration networks. I. Network construction and fundamental results," Physical Review E, vol. 64, no. 1, pp. 016131-1–016131-8, 2001.
[11] A. Arenas, L. Danon, A. Díaz-Guilera, P. M. Gleiser, and R. Guimerà, "Community analysis in social networks," The European Physical Journal B: Condensed Matter and Complex Systems, vol. 38, no. 2, pp. 373–380, 2004.
[12] T. Krichel and N. Bakkalbasi, "A social network analysis of research collaboration in the economics community," Journal of Information Management and Scientometrics, vol. 3, pp. 1–12, 2006.
[13] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, "Uncovering the overlapping community structure of complex networks in nature and society," Nature, vol. 435, no. 7043, pp. 814–818, 2005.
[14] X. Liu, J. Bollen, M. L. Nelson, and H. van de Sompel, "Coauthorship networks in the digital library research community," Information Processing & Management, vol. 41, no. 6, pp. 1462–1480, 2005.
[15] S. Wu, J. Wang, X. Feng et al., "Scientific collaboration networks in China's system engineering," International Journal of u- and e-Service, Science and Technology, vol. 6, no. 6, pp. 31–40, 2013.
[16] A. Abbasi and L. Hossain, "Analyzing academic communities' collaboration and performance," in Proceedings of the International Conference on Information & Knowledge Engineering, pp. 75–82, 2011.
[17] M. E. J. Newman, "Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality," Physical Review E, vol. 64, no. 1, pp. 016132-1–016132-7, 2001.
[18] R. Ichise, H. Takeda, and T. Muraki, "Research community mining with topic identification," in Proceedings of the 10th International Conference on Information Visualization, pp. 276–281, London, UK, July 2006.
[19] M. V. Nguyen, M. Kirley, and R. García-Flores, "Community evolution in a scientific collaboration network," in Proceedings of the IEEE World Congress on Computational Intelligence (WCCI '12), pp. 1–8, Brisbane, Australia, 2012.
[20] T. S. Evans, "Clique graphs and overlapping communities," Journal of Statistical Mechanics: Theory and Experiment, no. 12, p. 12037, 2010.
[21] L. García-Bañuelos, A. Portilla, A. Chávez-Aragón, O. F. Reyes-Galaviz, and H. Ayanegui-Santiago, "Finding and analyzing social collaboration networks in the Mexican computer science community," in 10th Mexican International Conference on Computer Science (ENC '09), pp. 167–175, Mexico, September 2009.
[22] M. G. Everett, "Analyzing clique overlap," Connections, vol. 21, no. 1, pp. 49–61, 1998.
[23] M. Plantié and M. Crampes, "Survey on social community detection," Social Media Retrieval, pp. 65–85, 2013.
[24] B. Hemant, Algorithms for discovering communities in complex networks [Ph.D. thesis], University of Central Florida, 2006.
[25] M. A. Porter, J.-P. Onnela, and P. J. Mucha, "Communities in networks," Notices of the American Mathematical Society, vol. 56, no. 9, pp. 1082–1097, 2009.
[26] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[27] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E, vol. 69, no. 6, p. 066133, 2004.
[28] E. Gregori, L. Lenzini, and S. Mainardi, "Parallel k-clique community detection on large-scale networks," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 8, pp. 1651–1660, 2013.
[29] B. Tóth, T. Vicsek, and G. Palla, "Overlapping modularity at the critical point of k-clique percolation," Journal of Statistical Physics, vol. 151, no. 3-4, pp. 689–706, 2013.
[30] X. Deng, B. Wang, B. Wu et al., "Modularity modeling and evaluation in community detecting of complex network based on information entropy," Journal of Computer Research and Development, vol. 49, no. 4, pp. 725–734, 2012.
[31] S. Qingwei, Q. Xiaodong, X. Shuo et al., "Author-topic evolution model and its application in analysis of research interests evolution," Journal of the China Society for Scientific and Technical Information, vol. 32, no. 9, pp. 912–919, 2013.
[32] S. Xu, Q. Shi, X. Qiao et al., "A dynamic users' interest discovery model with distributed inference algorithm," International Journal of Distributed Sensor Networks, 2014.
[33] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.
[34] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[35] M. Rosen-Zvi, T. Griffiths, M. Steyvers et al., "The author-topic model for authors and documents," in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI '04), pp. 487–494, Arlington, Va, USA, 2004.
[36] X. Wang and A. McCallum, "Topics over time: a non-Markov continuous-time model of topical trends," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 424–433, August 2006.
[37] S. Xu, Q. Shi, X. Qiao et al., "Author-Topic over Time (AToT): a dynamic users' interest model," in Mobile, Ubiquitous, and Intelligent Computing, pp. 239–245, Springer, Berlin, Germany, 2014.
[38] P. V. Nguyen, "CART: conference-author-relation topic model for relationship mining and role discovery in citation network," University of Illinois at Urbana-Champaign, 2013.
[39] D. M. Blei, "Probabilistic topic models," Communications of the ACM, vol. 55, no. 4, pp. 77–84, 2012.
[40] M. Rosen-Zvi, T. Griffiths, M. Steyvers et al., The Author-Topic Model for Authors and Documents, AUAI Press, Arlington, Va, USA, 2004.
[41] M. Steyvers, P. Smyth, M. Rosen-Zvi et al., Probabilistic Author-Topic Models for Information Discovery, ACM Press, New York, NY, USA, 2004.
[42] A. A. Hagberg, D. A. Schult, and P. J. Swart, "Exploring network structure, dynamics, and function using NetworkX," in Proceedings of the 7th Python in Science Conference (SciPy '08), pp. 11–15, Pasadena, Calif, USA.
