KOBV Portal

Fine-Tuning an Algorithm for Semantic Document Clustering Using a Similarity Graph

Stanchev, Lubomir

World Scientific Pub Co Pte Ltd; 2016

In: International Journal of Semantic Computing, Vol. 10, No. 04 (2016-12), p. 527-555

Abstract: In this article, we examine an algorithm for document clustering using a similarity graph. The graph stores words and common phrases from the English language as nodes, and it can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Since our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning these values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains [Formula: see text] newswire stories that are grouped in 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using two similarity metrics: one based on keyword matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, F-score, entropy, and purity.
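
Two of the evaluation metrics named in the abstract, purity and entropy, can be illustrated with a short sketch. This record does not give the article's exact definitions, so the code below assumes the common textbook forms: purity is the fraction of documents that belong to the majority reference category of their cluster, and entropy is the size-weighted average of the per-cluster Shannon entropy (base 2) over reference categories. The function name `purity_and_entropy` and the toy labels are hypothetical and are not taken from the article.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, weighted entropy) of a clustering against reference labels.

    Assumed textbook definitions, not necessarily the ones used in the article.
    """
    if len(cluster_ids) != len(true_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group the reference labels by the cluster each document was assigned to.
    clusters = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        clusters[cid].append(label)

    n = len(cluster_ids)
    purity = 0.0
    entropy = 0.0
    for labels in clusters.values():
        counts = Counter(labels)
        size = len(labels)
        # Purity: documents in the majority category, as a share of all documents.
        purity += max(counts.values()) / n
        # Entropy of the category distribution inside this cluster.
        cluster_entropy = -sum(
            (c / size) * math.log2(c / size) for c in counts.values()
        )
        # Weight each cluster by its share of the documents.
        entropy += (size / n) * cluster_entropy
    return purity, entropy


# Toy example: six documents, two clusters, two reference categories.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "b"]
toy_purity, toy_entropy = purity_and_entropy(toy_clusters, toy_labels)
```

Higher purity and lower entropy both indicate clusters that agree more closely with the human-assigned categories; a clustering that matches the reference categories exactly has purity 1 and entropy 0.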

Type of Medium: Online Resource

ISSN: 1793-351X, 1793-7108

DOI: 10.1142/S1793351X16400195

Language: English

Publisher: World Scientific Pub Co Pte Ltd

Publication Date: 2016