Format:
1 online resource (ii, 97 pages, 6627 KB), illustrations, diagrams
Content:
Successfully completing any data science project demands careful consideration across its whole process. Although the focus is often put on the later phases of the process, in practice experts spend more time in the earlier phases, preparing data to make them consistent with the systems' requirements or to improve their models' accuracy. Duplicate detection is typically applied during the data cleaning phase, which is dedicated to removing data inconsistencies and improving the overall quality and usability of the data. While data cleaning involves a plethora of approaches for specific operations, such as schema alignment and data normalization, the task of detecting and removing duplicate records is particularly challenging. Duplicates arise when multiple records representing the same entity exist in a database, for numerous reasons ranging from simple typographical errors to the differing schemas and formats of integrated databases. Keeping a database free of duplicates is crucial for most use cases, as their existence causes ...
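As a concrete illustration of the duplicate problem described in the abstract, the following is a minimal sketch of pairwise, token-based duplicate detection using Jaccard similarity. It is a hypothetical example for intuition only, not the method developed in the dissertation; the records, the threshold of 0.5, and all function names are illustrative assumptions.

```python
from itertools import combinations

def normalize(record):
    """Illustrative normalization: lowercase, drop commas, split into tokens."""
    return set(record.lower().replace(",", " ").split())

def jaccard(a, b):
    """Jaccard similarity of two token sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def find_duplicates(records, threshold=0.5):
    """Return index pairs whose token overlap reaches the threshold.

    This naive all-pairs comparison is quadratic; real duplicate
    detection systems add blocking or indexing to scale.
    """
    tokens = [normalize(r) for r in records]
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(tokens[i], tokens[j]) >= threshold
    ]

records = [
    "John Smith, 12 Main St, Potsdam",
    "Jon Smith, 12 Main Street, Potsdam",  # typographical variant of record 0
    "Jane Doe, 5 Oak Ave, Berlin",
]
print(find_duplicates(records))  # → [(0, 1)]
```

The typographical variants ("John"/"Jon", "St"/"Street") keep the first two records from matching exactly, which is precisely why similarity-based comparison, rather than exact equality, is needed for this task.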
Note:
Dissertation, Universität Potsdam, 2020
Language:
English
Keywords:
Hochschulschrift
DOI:
10.25932/publishup-48913
URN:
urn:nbn:de:kobv:517-opus4-489131
URL:
https://d-nb.info/1225792576/34
Author information:
Naumann, Felix 1971-
Author information:
Ritter, Norbert