UID: edochu_18452_23851
Content: The set similarity join (SSJ) is an important operation in data science; for example, it relates data from different sources or detects plagiarism. Common SSJ approaches are based on the filter-and-verification framework. Existing approaches are sequential (single-core), multi-threaded, or use MapReduce-based distributed parallelization. The amount of data to be processed today is large and keeps growing, while the SSJ is a compute-intensive operation, and none of the existing SSJ methods scales to large datasets: single- and multi-core methods are limited by the available hardware, and MapReduce-based methods do not scale due to excessive and/or skewed data replication. We propose a novel, highly scalable distributed SSJ approach that overcomes the limits and bottlenecks of existing parallel SSJ approaches. With a cost-based heuristic and a data-independent scaling mechanism, we avoid intra-node data replication and recomputation; the heuristic assigns similar shares of the compute cost to each node, and a RAM usage estimate prevents swapping, which is critical for runtime. Our approach significantly scales up SSJ execution and processes much larger datasets than all previously proposed parallel approaches.
Note: This is an extended version of our paper accepted for SISAP 2021. It additionally includes descriptions of experimental datasets and experimental results.
Language: English
URL: Full text (freely available)
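
The abstract above refers to the filter-and-verification framework for the set similarity join. The following is a minimal single-core sketch of that general framework (prefix filtering with a Jaccard threshold), given only for illustration; it is not the distributed approach described in this record, and the names (ssj_prefix_filter, jaccard), the threshold value, and the toy data are assumptions made here, not taken from the paper.

# Minimal single-core sketch of a filter-and-verification set similarity join
# (prefix filtering with a Jaccard threshold). Illustrative only; this is NOT
# the distributed method described in the record above.
import math
from collections import defaultdict

def jaccard(r, s):
    """Exact Jaccard similarity of two sets (the verification step)."""
    inter = len(r & s)
    return inter / (len(r) + len(s) - inter)

def ssj_prefix_filter(sets, threshold):
    """Return all pairs (i, j), i < j, with Jaccard(sets[i], sets[j]) >= threshold.

    Filter step: if two sets reach the threshold, their prefixes of length
    |r| - ceil(threshold * |r|) + 1 (w.r.t. one global token order) must share
    at least one token, so an inverted index on prefix tokens yields candidates.
    """
    # A canonical global token order (here: lexicographic) makes prefixes comparable.
    ordered = [sorted(s) for s in sets]
    index = defaultdict(list)              # token -> ids of earlier sets with it in their prefix
    results = []
    for i, tokens in enumerate(ordered):
        r = set(tokens)
        prefix_len = len(tokens) - math.ceil(threshold * len(tokens)) + 1
        candidates = set()
        for tok in tokens[:prefix_len]:
            candidates.update(index[tok])  # filter: candidates share a prefix token
            index[tok].append(i)           # index this set for later probes
        for j in candidates:
            if jaccard(r, set(ordered[j])) >= threshold:   # verification
                results.append((j, i))
        # sets sharing no prefix token with r are pruned without verification
    return results

if __name__ == "__main__":
    data = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"x", "y", "z"}]
    print(ssj_prefix_filter(data, 0.6))    # -> [(0, 1)]

The sketch shows why the framework is attractive for parallelization: candidate generation is driven by an inverted index that can be partitioned, which is where the replication and skew issues discussed in the abstract arise.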