A Performance Assessment on Various Data mining Tool Using Support Vector Machine

G. Karthikeyan; K. Saroja; S. Prasath

G. Karthikeyan, Asst. Prof. of BCA, Nandha Arts & Science College, Erode, TN, India
K. Saroja, Asst. Prof. of CS, Nandha Arts & Science College, Erode, TN, India
S. Prasath, Assistant Professor, Department of Computer Science, Erode Arts & Science College, Erode, TN, India

Keywords: SVM, WEKA, KDD, DM, KNIME, KNN.

Abstract

Data mining is the discovery of valuable information and patterns in large volumes of data. Two of its core techniques are classification and clustering: classification uses a set of pre-classified examples to build a model that can classify the wider population of records, while clustering divides the data into groups of similar objects. In this paper we propose a method for data classification that integrates these two techniques. We then compare simple classification with the proposed integrated clustering-classification technique. Both techniques were run in four popular data mining tools, using six different classifiers and one clusterer for all data sets. Across all four tools, the integrated clustering-classification technique performed better than simple classification, and this result was consistent for all six classifiers. For both techniques, the best classifier was SVM. Of the four tools, KNIME was found to be the best in terms of flexibility of algorithms. All comparisons were based on the percentage accuracy of each classifier.
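The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how clustering and classification are combined, so this example assumes one common approach: cluster the unlabeled records with k-means and append each record's cluster index as an extra attribute before classifying. A small k-nearest-neighbour classifier stands in for the six classifiers (including SVM) used in the paper, and the two-class data is synthetic. All function and variable names are hypothetical, and the printed accuracies say nothing about the paper's results.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assigned = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assigned = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                    for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assigned) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assigned

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training records."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: sq_dist(train_x[i], query))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def percent_accuracy(train_x, train_y, test_x, test_y):
    """Percentage of test records classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return 100.0 * hits / len(test_y)

# Synthetic two-class data (an assumption; not the paper's data sets).
rng = random.Random(42)
features, labels = [], []
for label, center in enumerate([(0.0, 0.0), (3.0, 3.0)]):
    for _ in range(60):
        features.append([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)])
        labels.append(label)

# Shuffle the record indices and split them 70/30 into train and test.
order = list(range(len(features)))
rng.shuffle(order)
split = int(0.7 * len(order))
train_idx, test_idx = order[:split], order[split:]

# Assumed integration step: cluster without using the class labels, then
# append each record's cluster index as an extra attribute.
cluster_ids = kmeans(features, k=2)
augmented = [f + [float(c)] for f, c in zip(features, cluster_ids)]

def evaluate(data):
    """Train on the training split of `data`, score on the test split."""
    pick = lambda idx, seq: [seq[i] for i in idx]
    return percent_accuracy(pick(train_idx, data), pick(train_idx, labels),
                            pick(test_idx, data), pick(test_idx, labels))

plain_acc = evaluate(features)
integrated_acc = evaluate(augmented)
print(f"simple classification:       {plain_acc:.1f}%")
print(f"clustering + classification: {integrated_acc:.1f}%")
```

Both pipelines share the same train/test split, so any difference in percentage accuracy comes only from the added cluster attribute. On data this simple the two figures may well be identical.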
