In:
International Journal of Semantic Computing, World Scientific Pub Co Pte Ltd, Vol. 17, No. 02 (2023-06), p. 199-221
Abstract:
In recent years, exciting sources of data have been modeled as knowledge graphs (KGs). This modeling represents both structural relationships and entity-specific multi-modal data. Semantic similarity estimation plays a significant role in various data analytics and machine learning (ML) pipelines: assigning similarity values to entity pairs is needed in recommendation systems, clustering, classification, entity matching/disambiguation and many other tasks. Efficient and scalable frameworks are needed to handle the quadratic complexity of all-pair semantic similarity on Big Data KGs. Moreover, heterogeneous KGs demand multi-modal semantic similarity estimation to cover versatile contents such as categorical relations between classes or attribute literals such as strings, timestamps or numeric data. In this paper, we propose the SimE4KG framework as a resource providing generic open-source modules that perform semantic similarity estimation in multi-modal KGs. To justify the computational costs of similarity estimation, SimE4KG generates reproducible, reusable and explainable results. The pipeline output is a native semantic RDF KG that includes the experiment results, the hyper-parameter setup and an explanation of the results, such as the most influential features. For fast and scalable in-memory execution, we implemented the distributed approach using Apache Spark. The entire development of this framework is integrated into the holistic distributed Semantic ANalytics StAck (SANSA).
Type of Medium:
Online Resource
ISSN:
1793-351X, 1793-7108
DOI:
10.1142/S1793351X23600012
Language:
English
Publisher:
World Scientific Pub Co Pte Ltd
Publication Date:
2023