International Journal of Innovative Computing, Information and Control, Volume 7, Number 6, June 2011, pp. 3121–3132
© 2011 ICIC International, ISSN 1349-4198

A SWARM-BASED ROUGH SET APPROACH FOR FMRI DATA ANALYSIS

Hongbo Liu 1,2,3,4,*, Ajith Abraham 3, Weishi Zhang 1 and Seán McLoone 4

1 School of Information, Dalian Maritime University, No. 1, Linghai Road, Dalian 116026, P. R. China ([email protected])
*Corresponding author: [email protected]

2 School of Computer, Dalian University of Technology, No. 2, Linggong Road, Ganjingzi District, Dalian 116023, P. R. China ([email protected])

3 Machine Intelligence Research Labs – MIR Labs, Auburn, Washington 98071, USA ([email protected])

4 Department of Electronic Engineering, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland ([email protected])

Received April 2010; revised August 2010.

Abstract. Functional Magnetic Resonance Imaging (fMRI) is one of the most important tools for exploring the operation of the brain, as it allows the spatially localized characteristics of brain activity to be observed. However, fMRI studies generate huge volumes of data, and the signals of interest have a low signal-to-noise ratio, making their analysis a very challenging problem. There is a growing need for new methods that can efficiently and objectively extract the useful information from fMRI data and translate it into intelligible knowledge. In this paper, we introduce a swarm-based rough set approach to fMRI data analysis. Our approach exploits the power of particle swarm optimization to discover feature combinations in an efficient manner by observing the change in the positive region as the particles proceed through the search space. The approach supports multi-knowledge extraction. We evaluate the performance of the algorithm using benchmark and fMRI datasets. The results demonstrate its potential value for cognition research.

Keywords: Particle swarm, Swarm intelligence, Multi-knowledge, fMRI

1. Introduction. The field of neuroinformatics is concerned with the collection, organization and analysis of neuroscience data and the development of computational models and analytical tools for the exploration of these data. Standing at the intersection between neuroscience and information science, neuroinformatics plays a vital role in the integration and analysis of increasingly fine-grained experimental data and in improving existing theories about nervous system and brain function [1]. Functional Magnetic Resonance Imaging (fMRI) is one of the most important tools for the generation of neuroinformatics data. It provides a high resolution volumetric mapping of the haemodynamic response of the brain, which can be correlated with neural activity, thereby allowing the spatially localized characteristics of brain activity to be observed.


However, working with and interpreting fMRI data is challenging due to the sheer volume of data involved. A single scan typically consists of more than half a million voxels and, with scan rates of between 0.2 and 1 Hz commonplace, datasets can run into terabytes. Coupled with this, the low signal-to-noise ratio of the activity signatures of interest and inter-subject variability make the task of establishing the relationships between cognition states and cognition tasks/stimuli a particularly demanding neuroinformatics problem. Recent research has mainly focused on employing statistical methods [2, 3, 4, 5, 6] to identify areas of the brain that show significant activity in response to a stimulus, so called regions of interest (ROI). These ROIs are then analyzed in detail by neuroscientists and psychologists, who provide an interpretation of the observed patterns, a process which depends strongly on their accumulated experience and subjective judgment.

Rough set theory provides a novel approach to extracting a reduced set of features from fMRI data and facilitating knowledge extraction. The rough set approach was developed by Pawlak in the early 1980s as a framework for discovering relationships in imprecise data [7]. Rough sets are a mathematical concept in set theory used to represent uncertainty in data. The primary goal of the rough set approach is to derive rules from data represented in an information system. The derivation of rules serves two main purposes: firstly, the rules may be used in the classification of database objects, that is, to predict the outcomes of unseen objects; secondly, the rules may be used to develop a model of the domain under study, that is, to present knowledge in a format that can be understood by a human [8]. The rough set approach consists of several steps leading towards the final goal of generating rules from information/decision systems: (1) mapping of the information from the original database into a decision system format; (2) completion of data; (3) discretization of data; (4) computation of reducts from data; (5) derivation of rules from reducts; (6) filtering of rules.

The key problem of knowledge discovery using a rough set approach is data reduction. Usually, real world objects are represented as tuples in decision tables, and these tables store a huge quantity of data, which is hard to manage from a computational point of view. Finding reducts in a large information system is an NP-hard problem [9]. The high complexity of this problem has motivated investigators to apply various approximation techniques to find near-optimal solutions. Many approaches have been proposed for finding reducts, e.g., discernibility matrices, dynamic reducts and others [10, 11]. Heuristic algorithms are an attractive alternative. Hu et al. [12] proposed a heuristic algorithm using a discernibility matrix; the approach provided a weighting mechanism to rank attributes. Zhong and Dong [13] presented a wrapper approach using rough set theory with greedy heuristics for feature subset selection. The aim of feature subset selection is to find a minimum set of relevant attributes that describes the dataset as well as the original full set of attributes does; finding reducts is, therefore, similar to feature selection. Zhong's algorithm employed the number of consistent instances as a heuristic. Banerjee et al. [14] presented various attempts at using genetic algorithms to obtain reducts. Babaoglu et al. [15] studied the efficiency of binary particle swarm optimization and genetic algorithm techniques as feature selection models for determining the existence of coronary artery disease from exercise stress testing data. Classification results of feature selection using binary particle swarm optimization and genetic algorithms were compared with each other and with the results obtained when the full set of features was used as input to a support vector machine model. The results showed that feature selection using binary particle swarm optimization is more successful than feature selection using genetic algorithms for determining coronary artery disease.


The particle swarm algorithm is inspired by the social behavior patterns of organisms that live and interact within large groups. In particular, it incorporates swarming behaviors observed in flocks of birds, schools of fish, swarms of bees and even human social behavior, from which the swarm intelligence paradigm has emerged [16]. The swarm intelligence model helps to find optimal regions of complex search spaces through the interaction of individuals in a population of particles [17, 18]. As an algorithm, its main strength is its fast convergence, which compares favorably with many other global optimization algorithms [19, 20, 21, 22]. It has exhibited good performance across a wide range of applications [23, 24, 25]. The particle swarm algorithm is particularly attractive for feature selection, as there seems to be no heuristic that can guide the search to the optimal minimal feature subset; instead, particles can discover the best feature combinations as they proceed through the search space.

This paper introduces particle swarm optimization to the problem of rough set reduction, as a means of extracting multi-knowledge from fMRI datasets. The remainder of the paper is organized as follows. The steps involved in fMRI data pre-processing are described briefly in Section 2. The rough set reduction algorithm is introduced in Section 3. In Section 4, we illustrate some results of extracting knowledge from fMRI data using our approach. Finally, conclusions are presented in Section 5.

2. Data Pre-processing. A typical normalized fMRI image contains more than 500,000 voxels, which is far greater than the dimension of any feature vector representative of brain activity. Consequently, we transform datasets from the MNI (Montreal Neurological Institute) template to the Talairach coordinate system [26] and then exploit the region information in Talairach as features to reduce the dimensionality of the images. We used the SPM99 software package (http://www.fil.ion.ucl.ac.uk/spm/) and in-house programs for image processing, including corrections for head motion, normalization and global fMRI signal shift [27]. A simplified workflow is illustrated in Figure 1. The feature selection and extraction algorithm for fMRI data is described in Algorithm 1.

Figure 2 is a snapshot of one interface of our own software, which is used in the determination of appropriate regions of interest (ROI) for feature selection and extraction. For a given fMRI image file, in this case spmT_002.img, the interface displays the cognitive activation information within a specified spherical volume so that the ROI voxels can be highlighted. The center and radius of the sphere are specified using the interface and are in this instance (–6, 18, –6) and 5 mm, respectively. The first column of data in the display, Voxel T, indicates the degree of activation at a significance level of 0.001, while the other columns specify the voxel location within the brain.

3. Rough Set Reduction Algorithm.

3.1. Reduction criteria. The basic concepts of rough set theory and its philosophy are presented and illustrated with examples in [28, 29, 30, 31]. Here, we explain only the terminology relevant to our reduction method. In rough set theory, an information system is denoted as a 4-tuple S = (U, A, V, f), where U is the universe of discourse, a non-empty finite set of N objects {x1, x2, ..., xN}; A is a non-empty finite set of attributes such that a : U → Va for every a ∈ A (Va is the value set of the attribute a); and

    V = ∪_{a∈A} Va

3124

H. LIU, A. ABRAHAM, W. ZHANG AND S. MCLOONE

Analyze the full set of fMRI images: find the most active voxels in several regions of the brain under the t-test of the basic models in SPM99 and save their coordinates.
for each fMRI image do
    (a) Scan the image and search for the ROI voxels corresponding to the saved coordinates.
    (b) For each ROI voxel, compute the average of all voxels within its neighborhood (as defined by a sphere centered on the ROI voxel and with a radius set to a predefined constant).
    The resulting average values of the ROI voxels constitute the feature vector for the image.
end
Algorithm 1: Feature selection & extraction algorithm for fMRI data
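The following is a minimal sketch of step (b) of Algorithm 1, assuming images arrive as 3-D NumPy arrays and that ROI coordinates are voxel indices; the names (extract_features, roi_coords, radius) are illustrative, not the paper's actual implementation:

    import numpy as np

    def extract_features(image, roi_coords, radius=2):
        """One feature per ROI: the mean of all voxels inside a sphere of
        the given radius centered on the saved ROI coordinate."""
        features = []
        r = int(np.ceil(radius))
        for cx, cy, cz in roi_coords:
            values = []
            for dx in range(-r, r + 1):
                for dy in range(-r, r + 1):
                    for dz in range(-r, r + 1):
                        # keep only offsets inside the sphere
                        if dx * dx + dy * dy + dz * dz <= radius ** 2:
                            x, y, z = cx + dx, cy + dy, cz + dz
                            if (0 <= x < image.shape[0] and
                                    0 <= y < image.shape[1] and
                                    0 <= z < image.shape[2]):
                                values.append(image[x, y, z])
            features.append(float(np.mean(values)))
        return features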

Figure 1. Pre-processing workflow for fMRI data

Figure 2. Location for feature selection & extraction


f : U × A → V is the total decision function (also called the information function) such that f(x, a) ∈ Va for every a ∈ A, x ∈ U. The information system can also be defined as a decision table S = (U, C, D, V, f). For the decision table, C and D are two subsets of attributes with A = C ∪ D and C ∩ D = ∅, where C is the set of input features and D is the set of class indices; they are also called condition and decision attributes, respectively.

Indiscernibility Relation: Let a ∈ C ∪ D and P ⊆ C ∪ D. A binary relation IND(P), called an equivalence (indiscernibility) relation, is defined as follows:

    IND(P) = {(x, y) ∈ U × U | ∀a ∈ P, f(x, a) = f(y, a)}    (1)

The equivalence relation IND(P) partitions the set U into disjoint subsets. Let U/IND(P) denote the family of all equivalence classes of the relation IND(P). For simplicity of notation, U/P will be written instead of U/IND(P). Such a partition of the universe is denoted by U/P = {P1, P2, ..., Pi, ...}, where Pi is an equivalence class of P, denoted [xi]P. The equivalence classes U/C and U/D are called condition and decision classes, respectively.

Lower Approximation: Given a decision table T = (U, C, D, V, f), let R ⊆ C ∪ D, X ⊆ U and U/R = {R1, R2, ..., Ri, ...}. The R-lower approximation of X is the set of all elements of U which can be classified with certainty as elements of X, assuming knowledge R. It can be presented formally as

    APR⁻_R(X) = ∪ {Ri | Ri ∈ U/R, Ri ⊆ X}    (2)

Positive Region: Given a decision table T = (U, C, D, V, f), let B ⊆ C, U/D = {D1, D2, ..., Di, ...} and U/B = {B1, B2, ..., Bi, ...}. The B-positive region of D is the set of all objects from the universe U which can be classified with certainty to classes of U/D employing features from B, i.e.,

    POS_B(D) = ∪_{Di ∈ U/D} APR⁻_B(Di)    (3)
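To make these set operations concrete, the sketch below computes U/IND(P) and POS_B(D) for a decision table stored as a list of rows; this encoding (rows as tuples, attributes as column indices) is an assumption for illustration, not the paper's implementation:

    from collections import defaultdict

    def partition(table, attrs):
        """U/IND(P): group row indices by their values on the attributes in P."""
        blocks = defaultdict(list)
        for i, row in enumerate(table):
            blocks[tuple(row[a] for a in attrs)].append(i)
        return [set(block) for block in blocks.values()]

    def positive_region(table, cond_attrs, dec_attrs):
        """POS_B(D): union of the B-blocks wholly contained in one D-class
        (Equations (2) and (3))."""
        decision_classes = partition(table, dec_attrs)
        pos = set()
        for block in partition(table, cond_attrs):
            if any(block <= d for d in decision_classes):
                pos |= block
        return pos

    # Toy table: columns 0-2 are conditions, column 3 is the decision.
    table = [(0, 1, 1, 'yes'), (0, 1, 1, 'no'), (1, 0, 1, 'yes')]
    print(positive_region(table, [0, 1, 2], [3]))  # {2}: rows 0, 1 are indiscernible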

Reduct: Given a decision table T = (U, C, D, V, f), an attribute a ∈ B ⊆ C is D-dispensable in B if POS_B(D) = POS_{B−{a}}(D); otherwise the attribute a is D-indispensable in B. If all attributes a ∈ B are D-indispensable in B, then B is called D-independent. A subset of attributes B ⊆ C is a D-reduct of C if POS_B(D) = POS_C(D) and B is D-independent. In other words, a reduct is a minimal subset of attributes that enables the same classification of elements of the universe as the whole set of attributes; attributes that do not belong to a reduct are superfluous with regard to the classification of elements of the universe. Usually, there are many reducts in an information system. Let 2^|A| represent all possible attribute subsets {{a1}, ..., {a|A|}, {a1, a2}, ..., {a1, ..., a|A|}}, and let RED represent the set of reducts, i.e.,

    RED = {B | POS_B(D) = POS_C(D), POS_{B−{a}}(D) < POS_B(D) for every a ∈ B}    (4)
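Using the positive_region helper sketched above, the reduct conditions can be checked directly for a candidate subset B. A brute-force check such as this is only practical for small attribute sets, which is precisely why the paper resorts to heuristic search:

    def is_reduct(table, B, C, D):
        """B is a D-reduct of C: it preserves the positive region of C and
        no attribute of B is D-dispensable (Equation (4))."""
        full = positive_region(table, C, D)
        if positive_region(table, B, D) != full:
            return False
        return all(positive_region(table, [b for b in B if b != a], D) != full
                   for a in B)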

Multi-knowledge: Let RED represent the set of reducts and let φ be a mapping from the condition space to the decision space. Then, multi-knowledge can be defined as follows:

    Ψ = {φ_B | B ∈ RED}    (5)

The rough set approach provides a tool for obtaining the reducts of a decision table, and these reducts can be used to represent knowledge in real-world applications. Usually, each reduct can be applied to generate a single body of knowledge. In fact, many real-world decision tables have multiple reducts, allowing multi-knowledge to be extracted, and the extracted multi-knowledge can be expressed as rules.
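As a minimal illustration of how one reduct B yields one body of knowledge, the sketch below (reusing partition from the earlier sketch) emits one rule per B-block that lies wholly in a single decision class; the rule representation is an assumption for illustration:

    def rules_from_reduct(table, B, dec_attr):
        """One rule per consistent B-block: (attribute -> value map, decision)."""
        rules = []
        for block in partition(table, B):
            decisions = {table[i][dec_attr] for i in block}
            if len(decisions) == 1:  # block lies wholly in one decision class
                row = table[next(iter(block))]
                rules.append(({a: row[a] for a in B}, decisions.pop()))
        return rules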


Reduced Positive Universe and Reduced Positive Region: Given a decision table T = (U, C, D, V, f), let U/C = {[u′1]C, [u′2]C, ..., [u′m]C}. The reduced universe U′ can be written as

    U′ = {u′1, u′2, ..., u′m}    (6)

and

    POS_C(D) = [u′i1]C ∪ [u′i2]C ∪ ... ∪ [u′it]C,    (7)

where ∀u′is ∈ U′ and |[u′is]C/D| = 1 (s = 1, 2, ..., t). The reduced positive universe can be written as

    U′pos = {u′i1, u′i2, ..., u′it}    (8)

and, ∀B ⊆ C, the reduced positive region is

    POS′_B(D) = ∪_{X ∈ U′/B, X ⊆ U′pos, |X/D| = 1} X    (9)

where |X/D| represents the cardinality of the set X/D. For any B ⊆ C, POS_B(D) = POS_C(D) if POS′_B = U′pos [30]. This provides a more efficient method for observing the change in the positive region when searching for reducts.

3.2. Particle swarm approach for reduction. Given a decision table T = (U, C, D, V, f) whose set of condition attributes C consists of m attributes, we set up a search space of m dimensions for the reduction problem. Accordingly, each particle's position is represented as a binary bit string of length m, with each dimension mapped to one condition attribute. The domain of each dimension is limited to 0 or 1: the value '1' means the corresponding attribute is selected, while '0' means it is not. Each position can thus be "decoded" to a potential reduction solution, a subset of C. The mapping between bit positions and attributes is fixed and does not change during the iterations.

During the search procedure, the fitness of each individual is evaluated. By the definition of a rough set reduct, a reduction solution must preserve the decision ability of the original decision table while keeping the number of attributes in the feasible solution as low as possible. In our algorithm, we first evaluate whether the potential reduction solution satisfies POS′_E = U′pos (E is the subset of attributes represented by the potential reduction solution). If it is a feasible solution, we count the number of '1's in it; the solution with the lowest number of '1's is preferred. For the particle swarm, the fewer '1's in a particle string, the better the fitness of the individual. POS′_E = U′pos is used as the criterion for establishing solution validity.

The particle swarm model consists of a swarm of particles, which are initialized with a population of random candidate solutions. They move iteratively through the d-dimensional problem space searching for better solutions, where the fitness f is measured by the number of condition attributes in the potential reduction solution. Each particle has a position represented by a position-vector p⃗i (i is the index of the particle) and a velocity represented by a velocity-vector v⃗i. Each particle remembers its own best position so far in a vector p⃗#_i, whose j-th dimensional value is p#_ij. The best position-vector among the swarm so far is stored in a vector p⃗*, whose j-th dimensional value is p*_j. The particle moves in a state space restricted to zero and one on each dimension. At each time step, each particle updates its velocity and moves to a new position according to Equations (10) and (11):

    v_ij(t) = w·v_ij(t − 1) + c1·r1·(p#_ij(t − 1) − p_ij(t − 1)) + c2·r2·(p*_j(t − 1) − p_ij(t − 1))    (10)

    p_ij(t) = 1 if ρ < sig(v_ij(t)); p_ij(t) = 0 otherwise    (11)

where c1 is a positive constant, referred to as the coefficient of the self-recognition component, c2 is a positive constant, called the coefficient of the social component, and r1 and r2 are random numbers in the interval [0, 1]. The variable w is an inertia factor, whose value is typically set to decrease linearly from 1 to near 0 as the iterations proceed. ρ is a random number in the closed interval [0, 1], and sig(·) is the sigmoid limiting transformation standard in binary particle swarm optimization, sig(v) = 1/(1 + e^{−v}), which maps a velocity component to the probability of the corresponding bit being set to 1. The pseudo-code for the particle swarm search method is given in Algorithm 2.



Calculate U′ and U′pos using Equations (6) and (8);
Initialize the size of the particle swarm n, and the other parameters;
Initialize the positions and velocities of all particles randomly;
while the end criterion is not met do
    t = t + 1;
    Calculate the fitness value of each particle:
        if POS′_E ≠ U′pos then the fitness is set to the total number of condition attributes;
        else the fitness is the number of '1's in the position;
    p⃗*(t) = argmin( f(p⃗*(t − 1)), f(p⃗1(t)), f(p⃗2(t)), ..., f(p⃗i(t)), ..., f(p⃗n(t)) );
    for i = 1 to n do
        p⃗#_i(t) = argmin( f(p⃗#_i(t − 1)), f(p⃗i(t)) );
        for j = 1 to d do
            Update the j-th dimension values of p⃗i and v⃗i according to Equations (10) and (11);
        end
    end
end
Algorithm 2: A rough set reduct algorithm based on particle swarm optimisation
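Below is a condensed sketch of Algorithm 2, reusing partition and positive_region from the earlier sketches. For brevity it tests feasibility with POS_E(D) = POS_C(D) directly rather than through the reduced universe of Equations (6)-(9) (the two tests agree by [30]); the velocity clamp and default parameter values are assumptions, not the paper's settings:

    import math
    import random

    def fitness(table, bits, C, D, full_pos):
        """Infeasible subsets score |C| (per Algorithm 2); feasible ones
        score the number of selected attributes."""
        E = [a for a, b in zip(C, bits) if b]
        if positive_region(table, E, D) != full_pos:
            return len(C)
        return sum(bits)

    def pso_reduct(table, C, D, n=20, iters=200, c1=1.49, c2=1.49):
        m = len(C)
        full_pos = positive_region(table, C, D)
        score = lambda bits: fitness(table, bits, C, D, full_pos)
        pos = [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]
        vel = [[0.0] * m for _ in range(n)]
        pbest = [p[:] for p in pos]
        gbest = min(pbest, key=score)[:]
        for t in range(iters):
            w = 0.9 - 0.8 * t / iters  # inertia decreasing linearly (Subsection 4.1)
            for i in range(n):
                for j in range(m):
                    r1, r2 = random.random(), random.random()
                    vel[i][j] = (w * vel[i][j]
                                 + c1 * r1 * (pbest[i][j] - pos[i][j])
                                 + c2 * r2 * (gbest[j] - pos[i][j]))  # Equation (10)
                    vel[i][j] = max(-6.0, min(6.0, vel[i][j]))  # clamp (assumption)
                    sig = 1.0 / (1.0 + math.exp(-vel[i][j]))
                    pos[i][j] = 1 if random.random() < sig else 0  # Equation (11)
                if score(pos[i]) < score(pbest[i]):
                    pbest[i] = pos[i][:]
                    if score(pbest[i]) < score(gbest):
                        gbest = pbest[i][:]
        return [a for a, b in zip(C, gbest) if b]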

4. Experimental Results and Discussion.

4.1. Experimental setup. In our experiments, we used Genetic Algorithm (GA) optimisation to benchmark the performance of PSO. Our algorithms were implemented in C++ and their parameter settings were chosen in accordance with the recommendations of Abraham [32], Clerc [17] and Liu [20]. The computation environment was an Intel® Core™ Duo CPU T2250 @ 1.73 GHz processor with 1 GB memory. In the GA, the probability of crossover was set to 0.8 and the probability of mutation was set to 0.08. In PSO, the self-recognition coefficient c1 and the social coefficient c2 were both set to 1.49, and the inertia weight w was decreased linearly from 0.9 to 0.1. The size of the GA and PSO populations was set to (even)(10 + 2 ∗ sqrt(D)), where D is the dimension of the position, i.e., the number of condition attributes. In each trial, the maximum number of iterations was (int)(0.1 ∗ recnum + 10 ∗ (nfields − 1)), where recnum is the number of records/rows and nfields is the number of condition attributes. Each experiment (for each algorithm) was repeated 3 times with different random seeds.
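As a small worked example of these sizing rules, the snippet below transcribes them literally; the reading of "(even)" as truncating and then bumping an odd result up to the next even integer is our assumption:

    import math

    def swarm_size(n_cond_attrs):
        """(even)(10 + 2*sqrt(D)): truncate, then round odd values up to even."""
        size = int(10 + 2 * math.sqrt(n_cond_attrs))
        return size if size % 2 == 0 else size + 1

    def max_iterations(n_records, n_cond_attrs):
        """(int)(0.1*recnum + 10*(nfields - 1))."""
        return int(0.1 * n_records + 10 * (n_cond_attrs - 1))

    # lung-cancer (27 records, 56 condition attributes): 24 particles and
    # 552 iterations, consistent with the ~600-iteration horizon of Figure 3.
    print(swarm_size(56), max_iterations(27, 56))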


Table 1. Dataset information

Dataset        Samples   No. of Attributes   No. of Classes
lung-cancer         27                  57                3
lymphography       148                  19                4


Figure 3. Performance of rough set reduction for the lung-cancer dataset

4.2. Testing on benchmarks. To illustrate the effectiveness and performance of the proposed algorithm, we present the rough set reduction process and results for two benchmark problems, the well-known lung-cancer and lymphography datasets from the UC Irvine machine learning repository (http://archive.ics.uci.edu/ml/). The lung-cancer dataset [33, 34] consists of 56 measurements (integer-valued attributes) and a corresponding classification of 27 lung biopsies into one of 3 types of pathological lung cancers. The lymphography dataset [35, 36] consists of 18 categorical attributes and a corresponding diagnosis (normal find, metastases, malign lymph, fibrosis) for 148 lymph node biopsies. Summary information for the datasets is given in Table 1. The objective in each case is to find the smallest set of attributes that can be used to correctly classify the biopsies.

Figures 3 and 4 illustrate the performance of the algorithms for the lung-cancer and lymphography datasets, respectively. In the figures, "R" is the cardinality of the reduced set of attributes identified. For the lung-cancer dataset, the reducts identified over 3 repetitions of the GA and PSO algorithms are presented in Table 2; the corresponding results for the lymphography dataset are given in Table 3. PSO usually achieves better results than GA optimization, although PSO is worse over the first 100 iterations, as illustrated in Figures 3 and 4. Even in cases where GA and PSO produce the same results, PSO converges more quickly than GA. In addition, multiple PSO runs usually produce multiple candidate reducts, allowing for the possibility of multi-knowledge extraction.



Figure 4. Performance of rough set reduction for the lymphography dataset

Table 2. Reducts from the lung-cancer dataset

Algorithm    R    Reduct(s)
GA          10    1, 3, 9, 12, 33, 41, 44, 47, 54, 56
PSO          8    11, 14, 24, 30, 42, 44, 45, 50
             9    3, 8, 9, 12, 15, 35, 47, 54, 55
            10    2, 3, 12, 19, 25, 27, 30, 32, 40, 56

Table 3. Reducts from the lymphography dataset

Algorithm    R    Reduct(s)
GA           7    2, 6, 10, 13, 14, 17, 18
PSO          6    2, 13, 14, 15, 16, 18
             7    1, 2, 13, 14, 15, 17, 18
             7    2, 10, 12, 13, 14, 15, 18

4.3. Extracting knowledge from fMRI data. We analyzed fMRI data from three cognition experiments involving: (a) tongue movement; (b) word association; (c) seeing and silently reading character data. There were 15 healthy young subjects (7 men and 8 women, mean age 22 years) in the tongue movement experiment, 10 healthy young subjects (5 men and 5 women, mean age 20.9 years) in the word association experiment and 12 healthy young subjects (5 men and 7 women, mean age 24.9 years) in the seeing and silently reading character data experiment. All subjects were recruited from the population of graduates and undergraduates at Dalian Maritime University and Dalian University of Technology in Dalian, China. Informed consent was obtained before participation.

The experiments involved a set of 9 tasks, as summarized in Table 4. Subjects were scanned on a 1.5 T Siemens Magnetron Vision Scanner, employing a block design [37], while alternating between each of the tasks 1 to 8 in Table 4 and the control task. For the cognitive tasks, there are a total of 2580 fMRI images, yielding a set of 2580 fMRI records. A control experiment was also conducted in which the control task was alternated with itself to obtain a control task record. Using the approach described in Section 2, thirteen ROIs were identified for this set of tasks. For each ROI the activity level was labeled as 0, 1, 2 or 3, corresponding to no, low, medium or high activity, respectively. The dataset includes 1 record for the control task and 2580 records for the other 8 cognitive tasks. Each record has an associated task identifier (0–8) stored in a decision attribute. In other words, the information system consists of 2581 rows and 14 columns (13 ROI attributes and one decision attribute) of data. The objective is to employ our PSO based rough set approach to identify multiple reducts for this system, which can then form the basis for multi-knowledge extraction.


Table 4. The task sequence in the fMRI experiments

No.   Task
0     Control task
1     Tongue movement
2     Associating verb from single noun
3     Associating verb from single non-noun
4     Making verb before single word
5     Looking at number
6     Silent reading number
7     Looking at Chinese word
8     Silent reading Chinese word


Figure 5. Summary of the reducts obtained for the fMRI dataset (number of reducts versus length of reduct)

In our investigation the reduct estimation process was repeated 3 times with different random seeds. In each case, the PSO swarm size was set to 18, while the other parameter settings were as specified in Subsection 4.1. This provides 18 × 3 opportunities to extract reducts, although the resulting reducts are not always optimal, in the sense of having minimum length. To verify that the algorithm was producing valid reducts, following convergence the solution represented by each particle was checked to see if it satisfied the criterion POS′_E = U′pos. The results obtained are summarized in Figure 5. All 54 reducts identified by the algorithm were valid, and they varied in cardinality from two to seven.

Once reducts have been identified, they can be used to define classification rules. Typical examples are as follows:

Rule 1: if M1 = 2, SMA = 2, Broca = 2 then Task = 1;
Rule 2: if BAs{7,19,20,40,44,45} = 3, BSC = 2 then Task = 2;
Rule 3: if BAs{10,11,13,44,45} = 3, BSC = 1 then Task = 3;
Rule 4: if BAs{7,19,40} = 3, BSC = 3 then Task = 4;
Rule 5: if SMA = 2, Broca = 3 then Task = 6;
Rule 6: if SMA = 2, Broca = 2, Wernicke = 3 then Task = 8.

Here, M1, SMA, Broca, BSC and BAs {7, 10, 11, 13, 19, 20, 40, 44, 45} are the anatomical labels [38] for the parts of the brain corresponding to the ROIs identified in the fMRI pre-processing step.

5. Conclusions. In this paper, we investigated a rough set approach to extracting knowledge from fMRI data. The data preprocessing workflow and methods for the rough set approach were discussed initially.


Then, a rough set reduction approach that exploits the particle swarm optimization algorithm was presented as a means of determining the best feature combinations in an efficient manner. The approach involves observing the change in the positive region as the PSO particles proceed through the search space. The performance of PSO and GA based reduct estimation was compared on two benchmark datasets. The results indicate that PSO usually requires less time and obtains better results than the GA, although its stability needs to be improved in further research. Results have also been presented for fMRI data analysis. Although the validity of the rules derived from the reducts cannot be established without further input from neuroscientists, the simplicity of the rules obtained suggests that the proposed approach has considerable potential for solving the reduction problem and is therefore of significant benefit for cognition research. The proposed PSO based algorithm can identify multiple reducts and hence is capable of multi-knowledge extraction. However, in its current form it is not suitable for large scale datasets (greater than 3000 records), since it needs to extract the multiple reducts through a distributed multi-run search process. In order to work with much larger datasets, the search method needs to be improved significantly. Some improved PSO algorithms have been proposed in [18, 22, 39]. Exploring these alternative PSO implementations for efficient extraction of multi-knowledge from larger datasets is the subject of future research.

Acknowledgment. The first author would like to thank Prof. Xiukun Wang and Drs. Shichang Sun, Benxian Yue, Mingyan Zhao and Hong Ye for their scientific collaboration in this research work. This work is supported by the National Natural Science Foundation of China (Grant Nos. 60873054, 61073056) and the Fundamental Research Funds for the Central Universities (Grant No. 2009QN043).

REFERENCES

[1] M. A. Arbib and J. S. Grethe, Computing the Brain: A Guide to Neuroinformatics, Academic Press, Boston, 2001.
[2] A. Carpentier, K. R. Pugh, M. Westerveld, C. Studholme, O. Skrinjar, J. L. Thompson, D. D. Spencer and R. T. Constable, Functional MRI of language processing: Dependence on input modality and temporal lobe epilepsy, Epilepsia, vol.42, no.10, pp.1241-1254, 2001.
[3] A. Rodriguez-Fornells, M. Rotte, H. J. Heinze, T. Nösselt and T. F. Münte, Brain potential and functional MRI evidence for how to handle two languages with one brain, Nature, vol.415, no.6875, pp.1026-1029, 2002.
[4] E. Fedorenko and N. Kanwisher, Neuroimaging of language: Why hasn't a clearer picture emerged? Language and Linguistics Compass, vol.3, pp.839-865, 2009.
[5] E. Fedorenko, P.-J. Hsieh, A. Castanon, S. Whitfield-Gabrieli and N. Kanwisher, A new method for fMRI investigations of language: Defining ROIs functionally in individual subjects, Journal of Neurophysiology, vol.104, pp.1177-1194, 2010.
[6] J. J. Pillai, Insights into adult postlesional language cortical plasticity provided by cerebral blood oxygen level-dependent functional MR imaging, American Journal of Neuroradiology, vol.31, pp.990-996, 2010.
[7] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences, vol.11, pp.341-356, 1982.
[8] G. Røed, Knowledge Extraction from Process Data: A Rough Set Approach to Data Mining on Time Series, Master Thesis, Norwegian University of Science and Technology, 1999.
[9] M. Boussouf, Hybrid approach to feature selection, Lecture Notes in Artificial Intelligence, vol.1510, pp.231-238, 1998.
[10] A. Skowron and C. Rauszer, The discernibility matrices and functions in information systems, Handbook of Applications and Advances of the Rough Set Theory, pp.331-362, 1992.
[11] J. Zhang, J. Wang, D. Li, H. He and J. Sun, A new heuristic reduct algorithm base on rough sets theory, Lecture Notes in Artificial Intelligence, vol.2762, pp.247-253, 2003.
[12] K. Hu, L. Diao and L. Shi, A heuristic optimal reduct algorithm, Lecture Notes in Computer Science, vol.1983, pp.139-144, 2000.


[13] N. Zhong and J. Dong, Using rough sets with heuristics for feature selection, Journal of Intelligent Information Systems, vol.16, no.3, pp.199-214, 2001.
[14] M. Banerjee, S. Mitra and A. Anand, Feature selection using rough sets, Studies in Computational Intelligence, vol.16, pp.3-20, 2006.
[15] I. Babaoglu, O. Findik and Ü. Erkan, A comparison of feature selection models utilizing binary particle swarm optimization and genetic algorithm in determining coronary artery disease using support vector machine, Expert Systems with Applications, vol.37, no.4, pp.3177-3183, 2010.
[16] J. Kennedy and R. Eberhart, Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, CA, 2001.
[17] M. Clerc, Particle Swarm Optimization, ISTE Publishing Company, London, 2006.
[18] X. Cai and Z. Cui, Hungry particle swarm optimization, ICIC Express Letters, vol.4, no.3(B), pp.1071-1076, 2010.
[19] K. E. Parsopoulos and M. N. Vrahatis, Recent approaches to global optimization problems through particle swarm optimization, Natural Computing, vol.1, pp.235-306, 2002.
[20] H. Liu, A. Abraham and M. Clerc, Chaotic dynamic characteristics in swarm intelligence, Applied Soft Computing, vol.7, no.3, pp.1019-1026, 2007.
[21] Z. Cui, X. Cai and J. Zeng, Chaotic performance-dependent particle swarm optimization, International Journal of Innovative Computing, Information and Control, vol.5, no.4, pp.951-960, 2009.
[22] C. Wang, P. Sui and W. Liu, Improved particle swarm optimization algorithm based on double mutation, ICIC Express Letters, vol.3, no.4(B), pp.1417-1422, 2009.
[23] K. E. Parsopoulos and M. N. Vrahatis, On the computation of all global minimizers through particle swarm optimization, IEEE Trans. on Evolutionary Computation, vol.8, no.3, pp.211-224, 2004.
[24] J. F. Schute and A. A. Groenwold, A study of global optimization using particle swarms, Journal of Global Optimization, vol.31, pp.93-108, 2005.
[25] C. Lai, C. Wu and M. Tsai, Feature selection using particle swarm optimization with application in spam filtering, International Journal of Innovative Computing, Information and Control, vol.5, no.2, pp.423-432, 2009.
[26] C. M. Lacadie, R. K. Fulbright, N. Rajeevan, R. T. Constable and X. Papademetris, More accurate Talairach coordinates for neuroimaging using non-linear registration, Neuroimage, vol.42, no.2, pp.717-725, 2008.
[27] Y. Ji, H. Liu, X. Wang and Y. Tang, Cognitive states classification from fMRI data using support vector machines, Proc. of the 3rd International Conference on Machine Learning and Cybernetics, pp.2920-2923, 2004.
[28] Z. Pawlak, Rough sets and intelligent data analysis, Information Sciences, vol.147, pp.1-12, 2002.
[29] G. Wang, Rough reduction in algebra view and information view, International Journal of Intelligent Systems, vol.18, pp.679-688, 2003.
[30] Z. Xu, Z. Liu, B. Yang and W. Song, A quick attribute reduction algorithm with complexity of max(O(|C||U|), O(|C|^2|U/C|)), Chinese Journal of Computers, vol.29, no.3, pp.391-399, 2006.
[31] Q. X. Wu and D. A. Bell, Multi-knowledge extraction and application, Lecture Notes in Artificial Intelligence, vol.2639, pp.274-279, 2003.
[32] A. Abraham, Evolutionary computation, in Handbook for Measurement Systems Design, P. Sydenham and R. Thorn (eds.), London, John Wiley and Sons Ltd., 2005.
[33] R. Kohavi, The power of decision tables, Lecture Notes in Computer Science, vol.912, pp.174-189, 1995.
[34] L. Polat and S. Güneş, Principles component analysis, fuzzy weighting pre-processing and artificial immune recognition system based diagnostic system for diagnosis of lung cancer, Expert Systems with Applications, vol.34, no.1, pp.214-221, 2008.
[35] A. Hoang, Supervised Classifier Performance on the UCI Database, University of Adelaide, 1997.
[36] U. G. Mangai, S. Samanta, S. Das and P. R. Chowdhury, A survey of decision fusion and feature fusion strategies for pattern classification, IETE Technical Review, vol.27, no.4, pp.293-307, 2010.
[37] P. Rozencwajg and D. Corroyer, Strategy development in a block design task, Intelligence, vol.30, no.1, pp.1-25, 2002.
[38] S. C. Blank, S. K. Scott, K. Murphy, E. Warburton and R. J. S. Wise, Speech production: Wernicke, Broca and beyond, Brain, vol.125, no.8, pp.1829-1838, 2002.
[39] S. S. Fan and J. Chang, Dynamic multi-swarm particle swarm optimizer using parallel PC cluster systems for global optimization of large-scale multimodal functions, Engineering Optimization, vol.42, no.5, pp.431-451, 2010.