Information technologies such as social media, mobile computing, and the industrial Internet of Things (IoT) produce huge amounts of data every day. Powerful knowledge discovery tools are needed to deal with such volumes of data, and clustering methods are among the most important of these techniques. Growth in computational power and algorithmic advances allow clustering problems in large datasets to be solved efficiently and accurately. However, these advances are insufficient for big datasets, which cannot be processed as a whole due to hardware and computational restrictions. In this paper, an iterative batch $k$-means ($ibk$-means) algorithm is proposed that yields good clustering results at low computational cost on big datasets. It is designed to cluster datasets using batches of data. The efficiency and accuracy of the proposed algorithm are investigated as functions of the batch size, the number of attributes, and the number of clusters. The algorithm is compared with the classic $k$-means and mini batch $k$-means algorithms using computational results on several real-world datasets, all of which are available from the UCI Machine Learning Repository. The smallest dataset has 500,000 data points and 2 attributes, and the largest contains 43,930,257 data points and 16 attributes. The results demonstrate that the $ibk$-means algorithm outperforms both the $k$-means and mini batch $k$-means algorithms in efficiency and accuracy and that it is applicable to the clustering of big datasets. The proposed algorithm provides real-time clustering and may have direct applications in expert and intelligent systems. Furthermore, the results can inform the design of more accurate and efficient clustering algorithms for big datasets that take available computing resources into account.
big data, $k$-means algorithm, batch clustering, mini batch $k$-means, cluster analysis
68T09, 90B99