A Performance Assessment on Various Data mining Tool Using Support Vector Machine
Abstract
Data mining is essentially the discovery of valuable information and patterns in large volumes of available data. Two indispensable techniques of data mining are clustering and classification: classification uses a set of pre-classified examples to build a model that can classify the population of records at large, whereas clustering divides the data into groups of similar objects. In this paper we propose a new method for data classification that integrates these two techniques. A comparative study is then carried out between simple classification and the proposed integrated clustering-classification technique. Four popular data mining tools were used for both techniques, with six different classifiers and one clusterer applied to all data sets. Across all the tools used, the integrated clustering-classification technique performed better than simple classification, and this result was consistent for all six classifiers. For both techniques, the best classifier was found to be SVM. Of the four tools used, KNIME was found to be the best in terms of flexibility of algorithms. All comparisons were drawn from the percentage accuracy of each classifier.
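The abstract does not say how clustering and classification are combined, so the following is only a minimal sketch of one common reading: cluster the training data, append each record's cluster index as an extra attribute, and then train the classifier on the augmented records. Everything here is an assumption made for illustration: the synthetic two-class data, the plain k-means routine, and the 1-nearest-neighbour classifier, which stands in for SVM to keep the example free of external libraries. None of it is the tools or classifiers used in the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group;
        # keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def add_cluster_feature(points, centroids):
    """Append the cluster index as one extra numeric attribute (assumed integration step)."""
    return [p + (float(nearest(p, centroids)),) for p in points]

def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour prediction; a stand-in classifier, not SVM."""
    i = min(range(len(train_x)), key=lambda j: sq_dist(train_x[j], x))
    return train_y[i]

def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test records classified correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)

# Hypothetical data: two Gaussian blobs, one per class.
rng = random.Random(42)

def blob(cx, cy, n):
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]

data = [(p, 0) for p in blob(0.0, 0.0, 40)] + [(p, 1) for p in blob(3.0, 3.0, 40)]
rng.shuffle(data)
train, test = data[:60], data[60:]
train_x, train_y = [p for p, _ in train], [y for _, y in train]
test_x, test_y = [p for p, _ in test], [y for _, y in test]

# Simple classification: classifier on the raw attributes.
plain_acc = accuracy(train_x, train_y, test_x, test_y)

# Integrated variant: fit clusters on the training data only, then
# add the cluster index as an attribute for both training and test records.
centroids = kmeans(train_x, k=2)
train_x_aug = add_cluster_feature(train_x, centroids)
test_x_aug = add_cluster_feature(test_x, centroids)
integrated_acc = accuracy(train_x_aug, train_y, test_x_aug, test_y)

print(f"simple classification accuracy:     {plain_acc:.1%}")
print(f"integrated clustering + classifier: {integrated_acc:.1%}")
```

The comparison metric is percentage accuracy, as in the abstract. Whether the integrated variant actually scores higher depends on the data and the classifier, so this toy run should not be read as reproducing the paper's result. Treating the cluster index as a numeric attribute is also a simplification; with more than two clusters a one-hot encoding would be the safer choice.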
Copyright (c) 2016 Journal of Information Sciences and Computing Technologies
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.