Keywords
|
Ensemble clustering, AHC, Semi-supervised clustering, Distance metric, Information constraints
|
Abstract
|
Agglomerative Hierarchical Clustering (AHC) is a bottom-up clustering strategy in which each object starts as its own cluster and pairs of clusters are merged successively as the hierarchy is traversed upward. It has been proven that no single AHC algorithm can be efficient in all situations. To address this problem, ensemble clustering techniques have been introduced. These techniques combine several output partitions into a consensus partition with higher accuracy than that of an individual clustering algorithm. This paper proposes an AHC-based ensemble semi-supervised clustering algorithm to improve clustering performance. In semi-supervised clustering, class membership information is available for some of the objects. Here, we introduce the Semi-Supervised Ensemble Hierarchical Clustering based on Constraints Information (SSEHCCI) algorithm. SSEHCCI is built from several individual AHC-based clustering algorithms. It includes a flexible weighting policy to generate base partitions and uses the constraints information to configure the semi-supervised clustering. In addition, SSEHCCI uses an innovative distance measure to calculate the distance between each pair of objects. Experimental results show that SSEHCCI performs better than existing semi-supervised algorithms on some University of California Irvine (UCI) datasets. Specifically, we observed an average accuracy improvement of SSEHCCI over SSDC and RSSC of 2.6% and 1.8%, respectively.
|
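To make the idea of combining AHC base partitions into a consensus concrete, the following is a minimal sketch of a generic co-association ensemble. It is not the SSEHCCI algorithm: it is unsupervised, uses no constraints information, applies equal weights to all base partitions, and relies on plain Euclidean distance. The function names (`base_partitions`, `co_association`, `consensus_labels`), the choice of linkage methods, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def base_partitions(X, n_clusters, methods=("single", "complete", "average", "ward")):
    """Run AHC once per linkage method and cut each tree into n_clusters flat clusters."""
    return [
        fcluster(linkage(X, method=m), t=n_clusters, criterion="maxclust")
        for m in methods
    ]


def co_association(partitions):
    """Fraction of base partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    co = np.zeros((n, n))
    for labels in partitions:
        co += labels[:, None] == labels[None, :]
    return co / len(partitions)


def consensus_labels(X, n_clusters):
    """Consensus partition: average-linkage AHC on the distance 1 - co-association."""
    parts = base_partitions(X, n_clusters)
    dist = 1.0 - co_association(parts)
    np.fill_diagonal(dist, 0.0)  # guard against floating-point residue on the diagonal
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: two well-separated Gaussian blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(5.0, 0.3, size=(20, 2)),
])
labels = consensus_labels(X, n_clusters=2)
```

In this sketch every base partition contributes equally to the co-association matrix. A weighting policy or pairwise constraints, as described in the abstract, would modify how `co_association` accumulates evidence, but the form that takes in SSEHCCI is not specified here.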