Parallel k-Modes Algorithm for Spark Framework

KIPS Transactions on Software and Data Engineering, Vol. 6, No.10, pp.487-492, October 2017
DOI: 10.3745/KTSDE.2017.6.10.487

Abstract

Clustering is a technique that groups data by measuring the similarities between data items, and it is used in big data analysis and data mining. Among the various clustering methods, the k-Modes algorithm is the representative choice for categorical data. Spark, a distributed and concurrent framework, has recently received great attention for iteration-centric tasks such as k-Modes because it overcomes the limitations of Hadoop. Spark provides an environment that can process large amounts of data in main memory using an abstraction called the RDD (Resilient Distributed Dataset). Spark also provides MLlib, a dedicated library for machine learning, but MLlib includes only k-means, which can process only continuous data, so categorical data cannot be clustered with it. In this paper, we design RDDs for the k-Modes algorithm to cluster categorical data in the Spark environment and implement an algorithm that operates effectively. Experiments show that the performance of the proposed algorithm scales linearly in the Spark environment.
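
As a rough illustration of the k-Modes iteration the abstract refers to, the following is a minimal single-machine sketch in plain Python. It is not the paper's RDD design or implementation. The function names (hamming, assign, update_modes, k_modes), the sample records, and the choice of initial modes are all assumptions made for this example. The assignment step is written as a per-record function and the mode update as a per-cluster aggregation, which is the general shape one would expect to distribute as map and reduce-style operations.

```python
from collections import Counter, defaultdict

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(record, modes):
    """Map-like step: index of the nearest mode (ties go to the lowest index)."""
    return min(range(len(modes)), key=lambda i: hamming(record, modes[i]))

def update_modes(records, labels, modes):
    """Reduce-like step: per cluster, take the most frequent value of each attribute."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    new_modes = []
    for i, old in enumerate(modes):
        members = groups.get(i)
        if not members:
            new_modes.append(old)  # assumption: keep the old mode for an empty cluster
            continue
        # zip(*members) yields one column of values per attribute
        new_modes.append(
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        )
    return new_modes

def k_modes(records, initial_modes, max_iter=20):
    """Alternate assignment and mode update until the modes stop changing."""
    modes = list(initial_modes)
    for _ in range(max_iter):
        labels = [assign(r, modes) for r in records]
        new_modes = update_modes(records, labels, modes)
        if new_modes == modes:
            break
        modes = new_modes
    # Final assignment against the modes that are returned
    labels = [assign(r, modes) for r in records]
    return modes, labels

# Hypothetical categorical records: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "flat"),
    ("blue", "small", "square"),
]
modes, labels = k_modes(records, initial_modes=[records[0], records[3]])
print(modes)
print(labels)
```

Because the dissimilarity only counts mismatched attributes and the cluster representative is a per-attribute mode rather than a mean, the procedure applies to categorical data where k-means does not.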



Cite this paper

[KIPS Transactions Style]
J. Chung, "Parallel k-Modes Algorithm for Spark Framework," KIPS Transactions on Software and Data Engineering, Vol.6, No.10, pp.487-492, 2017, DOI: 10.3745/KTSDE.2017.6.10.487.

[IEEE Style]
Jaehwa Chung, "Parallel k-Modes Algorithm for Spark Framework," KIPS Transactions on Software and Data Engineering, vol. 6, no. 10, pp. 487-492, 2017. DOI: 10.3745/KTSDE.2017.6.10.487.

[ACM Style]
Chung, J. 2017. Parallel k-Modes Algorithm for Spark Framework. KIPS Transactions on Software and Data Engineering, 6, 10, (2017), 487-492. DOI: 10.3745/KTSDE.2017.6.10.487.