
IJDWM SPECIAL ISSUE: ADVANCES IN DATA MINING APPLICATIONS

Xue Li, The University of Queensland, Australia
Shichao Zhang, University of Technology, Australia
Shuliang Wang, Wuhan University, China

This special issue is a collection of selected papers published in the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA), held in Wuhan, China, in 2005. The articles focus on innovative applications of data mining approaches to problems that involve large data sets or incomplete and noisy data, or that demand optimal solutions.

After more than 15 years of continual growth in data mining technology, attention is now shifting to the applications that make data mining an integral part of the system. With the ever-growing power to generate, transmit, and collect huge amounts of data, we are facing an imminent problem of information overload. The overwhelming demand for information processing is not just about a better understanding of data but also about a better use of data in a timely fashion in order to help people make informed, sensible, and better decisions. As a result, there is an urgent need for sophisticated techniques and tools that can handle new fields of data mining (e.g., spatial data mining, biomedical data mining, and mining on high-speed and time-variant data streams).

This issue on advances in data mining applications contains selected papers expanded from those presented at the First International Conference on Advanced Data Mining and Applications (ADMA 2005), held in July 2005 in Wuhan, China. The selected articles focus on advanced data mining applications: image reconstruction using the ART (Algorithm Reconstruction Technique) approach (Zhong Qu), financial credit assessment by applying TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) (Desheng “Dash” Wu and David L. Olson), QoS in network routing problems using rough sets (Yanbing Liu et al.), partially supervised classification based on WUS-SVM (Weighted Unlabeled Samples Support Vector Machine) for remote sensing (Zhigang Liu et al.), and sensitivity analysis through the MCV (Minimum Cluster Volume) algorithm for the cluster-based input selection problem (Can Yang et al.).

A data mining application is essentially a problem involving three aspects of knowledge: data, algorithms, and the application domain. Data is the first-class citizen in data mining research. Understanding data (their structures, high dimensionality, and qualification and quantification issues) is always critical. The second important aspect of a data mining application is algorithms: their effectiveness, efficiency, scalability, and applicability. Among a variety of applicable algorithms, selecting the right one for a specific problem is always a meta-reasoning question that requires contributions from the data mining research community. The third important aspect is domain knowledge of the application. Without a good understanding of domain knowledge, a data mining process can hardly avoid the GIGO (garbage in, garbage out) effect.

In this issue, Qu presents an efficient iterative image reconstruction algorithm for industrial computed tomography with narrow fan-beam projection. Image reconstruction is one of the key technologies in industrial computed tomography. The use of an algebraic method is limited because of its low reconstruction speed. The ART (Algorithm Reconstruction Technique), a new iterative method, is therefore introduced in order to accelerate the iteration process and to speed up reconstruction.

Wu and Olson propose a TOPSIS classifier and apply it to a credit-scoring problem. Data sets often contain many potential explanatory variables; some of them are preferably minimized, while others are preferably maximized. An ordered preference is computed to indicate an implied classification. In the authors' experiments, the results compare favorably with the conventional data mining technique of decision trees. The proposed models are validated using Monte Carlo simulation.

Y. Liu et al. propose a streaming data mining approach based on rough set theory for QoS routing in computer networks. Network link information is obtained from subnetworks. Rough set theory is applied to mine the best route from enormous, irregular link data and to classify network links using the link-status data.

Z. Liu et al. propose a new classification technique, a partially supervised classification (PSC) approach, to identify a land-cover class of interest from remotely sensed images. The article discusses a novel Support Vector Machine (SVM) algorithm for PSC. Initially, the training set includes both labeled samples of the class of interest and unlabeled samples randomly selected from the remotely sensed images. Then, all unlabeled samples are assumed to be training samples of other classes, and each of them is assigned a weight factor indicating the likelihood of that assumption. The algorithm is called Weighted Unlabeled Samples SVM (WUS-SVM). The authors also conducted experiments with the algorithm.

Yang et al. propose an effective approach to input selection for nonlinear regression modeling problems. This approach has wide applications to problems such as automobile MPG (miles per gallon) prediction, the Box and Jenkins gas furnace process, and other dynamic process modeling problems. The proposed approach is a model-free method, which has the advantage that no specific model needs to be built in advance in order to check all possible input combinations. The method is based on sensitivity analysis of input data features through the MCV algorithm. The effectiveness of the proposed method is evaluated on benchmark tests.

The collection of articles in this issue shows a convergence of data mining with intelligent data processing, in which computational intelligence is a core technology. With large volumes of data on hand, data mining technology is on its way to becoming transparent and integrated with system functions that provide data-heavy and data-sensitive services.

Xue Li, Shichao Zhang, and Shuliang Wang

Xue Li is a senior lecturer in Information Technology and Electrical Engineering at the University of Queensland, Australia, and an adjunct professor in Electronic Science and Technology, China. He has a master’s in computer science from the University of Queensland and a PhD in information systems from Queensland University of Technology. His current research interests are data mining on streaming data, text mining, and intelligent information systems. He has taught databases, programming, and data mining courses at the Australian universities QUT, UNSW, and UQ for the last 10 years. He has published more than 50 research papers in journals, editorials of journals, book chapters, and international conferences. He currently holds a large Australian ARC grant on streaming data mining.

Shichao Zhang is a senior research fellow in the Faculty of Information Technology at UTS, Australia, and a chair professor of automatic control at BUAA, China. He received his PhD in computer science from Deakin University, Australia. His research interests include data analysis and smart pattern discovery. He has published about 40 international journal papers, including six in IEEE/ACM Transactions, two in Information Systems, and six in IEEE magazines, and over 40 international conference papers, including two ICML papers. He has won four China NSF/863 grants, two Australian large ARC grants, and two Australian small ARC grants. He is a senior member of the IEEE and a member of the ACM, and serves as an associate editor for Knowledge and Information Systems and The IEEE Intelligent Informatics Bulletin.

Shuliang Wang, PhD, is a professor in spatial data mining and software engineering, International School of Software, and the director of the Data Mining Laboratory, Wuhan University, China. Dr. Wang has been researching and working in the area of spatial data mining and software engineering since 1997 in Hong Kong, New Zealand, Australia, and Mainland China. He received his bachelor’s degree from Wuhan University in 1997 and his PhD from Wuhan University and the Hong Kong Polytechnic University in 2002. As a postdoctoral research fellow, he worked at the Hong Kong Polytechnic University and Tsinghua University. In 2005, his doctoral thesis was named one of the “100 best national theses in China.” Dr. Wang has published over 50 research articles and more than five monographs. Two of the monographs were published by Springer and the International Society for Optical Engineering (SPIE), respectively. Since joining Wuhan University in 2004, Dr. Wang has won over 2 million RMB for three projects as principal investigator. His current research interests include spatial data mining, software engineering, GIS, remote sensing, spatial data uncertainties, complex networks, and soft computing.