In:
ACM Transactions on Knowledge Discovery from Data, Association for Computing Machinery (ACM), Vol. 6, No. 2 (2012-07), pp. 1-28
Abstract:
Lack of supervision in clustering algorithms often leads to clusters that are not useful or interesting to human reviewers. We investigate whether supervision can be automatically transferred for clustering a target task by providing a relevant supervised partitioning of a dataset from a different source task. The target clustering is made more meaningful for the human user by trading off intrinsic clustering goodness on the target task for alignment with relevant supervised partitions in the source task, wherever possible. We propose a cross-guided clustering algorithm that builds on traditional k-means by aligning the target clusters with source partitions. The alignment process makes use of a cross-task similarity measure that discovers hidden relationships across tasks. When the source and target tasks correspond to different domains with potentially different vocabularies, we propose a projection approach using pivot vocabularies for the cross-domain similarity measure. Using multiple real-world and synthetic datasets, we show that our approach improves clustering accuracy significantly over traditional k-means and state-of-the-art semi-supervised clustering baselines, over a wide range of data characteristics and parameter settings.
Material type:
Online resource
ISSN:
1556-4681, 1556-472X
DOI:
10.1145/2297456.2297461
Language:
English
Publisher:
Association for Computing Machinery (ACM)
Publication date:
2012
ZDB ID:
2257358-6