Format:
1 online resource (x, ii, 117 pages), illustrations, diagrams
Content:
It is estimated that data scientists spend up to 80% of their time exploring, cleaning, and transforming their data. A major reason for this expenditure is a lack of knowledge about the data being used, which often come from different sources and have heterogeneous structures. As a means of describing various properties of data, metadata can help data scientists understand and prepare their data, saving time for innovative and valuable data analytics. However, metadata do not always exist: some data file formats are not capable of storing them; metadata may have been deleted for privacy reasons; and legacy data may have been produced by systems that were not designed to store and handle metadata. As data are being produced at an unprecedentedly fast pace and stored in diverse formats, manually creating metadata is not only impractical but also error-prone, demanding automatic approaches for metadata detection. In this thesis, we focus on detecting metadata in CSV files – a type of plain-text file that, similar to spreadsheets, may contain ...
Note:
Dissertation, Universität Potsdam, 2022
Language:
English
Keywords:
Academic thesis
DOI:
10.25932/publishup-56620
URN:
urn:nbn:de:kobv:517-opus4-566204
Author information:
Naumann, Felix 1971-
Author information:
Mitschang, Bernhard