Kybernetika 50 no. 5, 661-678, 2014

Consensus clustering with differential evolution

Miroslav SaboDOI: 10.14736/kyb-2014-5-0661

Abstract:

Consensus clustering algorithms are used to improve properties of traditional clustering methods, especially their accuracy and robustness. In this article, we introduce our approach that is based on a refinement of the set of initial partitions and uses differential evolution algorithm in order to find the most valid solution. Properties of the algorithm are demonstrated on four benchmark datasets.

Keywords:

consensus clustering, differential evolution, ensemble, data

Classification:

62H30, 92G30

References:

  1. R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan: Automatic subspace clustering of high dimensional data for data mining applications. In: Proc. 2001 ACM SIGMOD International Conference on Management of data 27 (1998), 2, pp. 94-105.   CrossRef
  2. K. Bache and M. Lichman: UCI machine learning repository, 2013. URL \url{http://archive.ics.uci.edu/ml}.   CrossRef
  3. K. D. Bailey: Typologies and Taxonomies: An Introduction to Classification Techniques. Sage Publications Inc., Los Angeles 1994.   CrossRef
  4. J. C. Bezdek: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York 1981.   CrossRef
  5. S. Das, A. Abraham and A. Konar: Automatic clustering using an improved differential evolution algorithm. IEEE Trans. Sys. Man Cyber., Part A: Systems and Humans 38 (2008), 1, 218-237.   CrossRef
  6. A. P. Dempster, N. M. Laird and D. B. Rubin: Maximum likelihood from incomplete data via the em algorithm. J. Roy. Stat. Soc. Ser. B \textit{39} (1977), 1, 1-38.   CrossRef
  7. E. Dimitriadou: cclust: Convex Clustering Methods and Clustering Indexes, 2012. URL \url{http://CRAN.R-project.org/package=cclust}.   CrossRef
  8. S. Dudoit and J. Fridlyand: Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19 (2003), 9, 1090-2003.   CrossRef
  9. M. Ester, H. P. Kriegel, J. Sander and X. Xu: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. 2nd International Conference on Knowledge Discovery and Data Mining 1996, pp. 226-231.   CrossRef
  10. X. Fern and C. Brodley: Solving cluster ensemble problems by bipartite graph partitioning. In: Proc. 21st International Conference on Machine learning 2004, pp. 36-43.   CrossRef
  11. C. Fraley and A. E. Raftery: Model-based clustering, discriminant analysis and density estimation. J. Amer. Statist. Assoc. 97 (2002), 611-631.   CrossRef
  12. C. Fraley and A. E. Raftery: MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering. Techn. Report 504, University of Washington, Department of Statistics, 2006.   CrossRef
  13. R. Ghaemi, N. Sulaiman, H. Ibrahim and N. Mustapha: A survey: Clustering ensembles techniques. In: Proc. International Conference on Computer, Electrical, and Systems Science, and Engineering (CESSE) 38 (2009), pp. 644-653.   CrossRef
  14. J. Ghosh and A. Acharya: Cluster ensembles. Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery 1 (2011), 4, 305-315.   CrossRef
  15. S. J. Gould: Full House: The Spread of Excellence from Plato to Darwin. Harmony, New York 1996.   CrossRef
  16. M. Halkidi, Y. Batistakis and M. Vazirgiannis: Cluster validity methods: Part i. SIGMOD Record 31 (2002), 2, 40-45.   CrossRef
  17. J. Handl and J. Knowles: Multi-objective clustering and cluster validation. In: Multi-Objective Machine Learning (Studies in Computational Intelligence, Vol, 16), Springer, Berlin 2006, pp. 21-47.   CrossRef
  18. J. Handl and J. Knowles: An evolutionary approach to multiobjective clustering. IEEE Trans. Evolutionary Comput. 11 (2007), 56-76.   CrossRef
  19. J. Handl, J. Knowles and D. Kell: Computational cluster validation in post-genomic data analysis. Bioinformatics 21 (2005), 15, 3201-3212.   CrossRef
  20. J. Hartigan and M. Wong: A k-means clustering algorithm. Applied Statistics 28 (1979), 100-108.   CrossRef
  21. K. Hornik, I. Feinerer, M. Kober and C. Buchta: Spherical $k$-means clustering. J. Statist. Software 50 (2012), 10, 1-22.   CrossRef
  22. E. Hruschka, R. Campello, A. Freitas and A. de Carvalho: A survey of evolutionary algorithms for clustering. IEEE Trans. Sys. Man Cyber. Part C: Applications and Reviews 39 (2009), 2, 133-155.   CrossRef
  23. A. K. Jain: Data clustering: 50 years beyond k-means. Pattern Recognition Lett. 31 (2010), 8, 651-666.   CrossRef
  24. A. K. Jain, M. N. Murty and P. J. Flynn: Data clustering: A review. ACM Comput. Surveys 31 (1999), 3, 316-323.   CrossRef
  25. A. Karatzoglou, A. Smola, K. Hornik and A. Zeileis: kernlab - an {S4} package for kernel methods in {R}. J. Statist. Software 11 (2004), 9, 1-20.   CrossRef
  26. G. Karypis, R. Aggarwal, V. Kumar and S. Shekhar: Multilevel hypergraph partitioning: Applications in vlsi domain. In: Proc. Design and Automation Conference, 1997, pp. 526-529.   CrossRef
  27. L. Kaufman and P. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York 1990.   CrossRef
  28. K. Krishna and M. Narasimha Murty: Genetic k-means algorithm. Trans. Sys. Man Cyber. Part B 29 (1999), 3, 433-439.   CrossRef
  29. W. Kwedlo: A clustering method combining differential evolution with the k-means algorithm. Pattern Recognition Letters 32 (2011), 12, 1613-1621.   CrossRef
  30. J. MacQueen: Some methods for classification and analysis of multivariate observations. In: Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability 1 (1967), pp. 281-297.   CrossRef
  31. M. Maechler, P. Rousseeuw, A. Struyf, M. Hubert and K. Hornik: cluster: Cluster Analysis Basics and Extensions, 2013. R package version 1.14.4.   CrossRef
  32. S. Monti, P. Tamayo, J. Mesirov and T. Golub: Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52 (2003), 1-2, 91-118.   CrossRef
  33. K. Mullen, D. Ardia, D. Gil, D. Windover and J. Cline: DEoptim: An R package for global optimization by differential evolution. J. Statist. Software 40 (2011), 6, 1-26.   CrossRef
  34. C. Murthy and N. Chowdhury: In search of optimal clusters using genetic algorithms. Pattern Recognition Lett. 17 (1996), 8, 825-832.   CrossRef
  35. S. K. Pal and D. D. Majumder: Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Trans. Sys. Man Cyber. 7 (1977), 625-629.   CrossRef
  36. S. Paterlini and T. Krink: Differential evolution and particle swarm optimisation in partitional clustering. Comput. Statist. Data Anal. 50 (2006), 5, 1220-1247.   CrossRef
  37. K. V. Price, R. M. Storn and J. A. Lampinen: Differential Evolution: A Practical Approach to Global Optimization. Springer-Verlag, Berlin 2006.   CrossRef
  38. V. Raghavan and K. Birchand: A clustering strategy based on a formalism of the reproductive process in a natural system. In: Proc. Second International Conference on Information Storage and Retrieval, 1979, pp. 10-22.   CrossRef
  39. R Core Team. \newblock{R:     CrossRef
  40. J. Shi and J. Malik: Normalized cuts and image segmentation. In: IEEE Trans. Pattern Analysis and Machine Intelligence 22 (2000), 8, 888-905.   CrossRef
  41. D. A. Simovici and Ch. Djeraba: Mathematical Tools for Data Mining: Set Theory, Partial Orders, Combinatorics. Advanced information and knowledge processing. Springer, London 2008.   CrossRef
  42. T. I. Simpson, J. D. Armstrong and A. P. Jarman: Merged consensus clustering to assess and improve class discovery with microarray data. BMC Bioinform. 11 (2010), 11-590.   CrossRef
  43. P. H. Sneath: The application of computers to taxonomy. Journal of general microbiology \textit{17} (1957), 1, 201-226.   CrossRef
  44. R. Storn and K. Price: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11 (1997), 4, 341-359.   CrossRef
  45. A. Strehl and J. Ghosh: Cluster ensembles - a knowledge reuse framework for combining partitionings. In: Proc. 11th National Conference On Artificial Intelligence, NCAI, Edmonton, Alberta 2002, pp. 93-98.   CrossRef
  46. A. Topchy, A. Jain and W. Punch: A mixture model of clustering ensembles. In: Proc. SIAM International Conference on Data Mining 2004, pp. 22-24.   CrossRef
  47. W. M. Trotter: Combinatorics and Partially Ordered Sets. The Johns Hopkins University Press, Baltimore 1992.   CrossRef
  48. J. Tvrdík and I. Křivý: Differential evolution with competing strategies applied to partitional clustering. Lecture Notes Comput. Sci. 7269 (2012), 136-144.   CrossRef
  49. P. Wang, C. Domeniconi and K. Laskey: Nonparametric bayesian clustering ensembles. Lecture Notes Comput. Sci. 6323 (2010), 3, 435-450.   CrossRef
  50. H. Wang, H. Shan and A. Banerjee: Bayesian cluster ensembles. Stat. Anal. Data Min. 4 (2011), 1, 54-70.   CrossRef
  51. Wikipedia: Partition of a set. \url{http://en.wikipedia.org/wiki/Partition_of_a_set}.   CrossRef
  52. R. Xu and D. Wunsch: Survey of clustering algorithms. IEEE Trans. Neural Networks 16 (2005), 3, 645-678.   CrossRef
  53. Ch. T. Zahn: Graph-theoretic methods for detecting and describing gestalt clusters. IEEE Trans. Comput. 20 (1971), 31, 68-86.   CrossRef