Format:
1 online resource (viii, ii, 141 pages, 5608 KB), diagrams
Content:
Data profiling is the computer science discipline of analyzing a given dataset for its metadata. The types of metadata range from basic statistics, such as tuple counts, column aggregations, and value distributions, to much more complex structures, in particular inclusion dependencies (INDs), unique column combinations (UCCs), and functional dependencies (FDs). If present, these statistics and structures serve to efficiently store, query, change, and understand the data. Most datasets, however, do not provide their metadata explicitly, so data scientists need to profile them. While basic statistics are relatively easy to calculate, more complex structures present difficult, mostly NP-complete discovery tasks; even with good domain knowledge, it is hardly possible to detect them manually. Therefore, various profiling algorithms have been developed to automate the discovery. None of them, however, can process datasets of typical real-world size, because their resource consumption and/or execution times exceed effective limits.…
Note:
Dissertation, Universität Potsdam, 2017
Additional Edition:
Also published as print edition: Papenbrock, Thorsten. Data profiling. Potsdam, 2017
Language:
English
Keywords:
Information system; Data profiling; Data analysis; Thesis
URN:
urn:nbn:de:kobv:517-opus4-406705
URL:
https://nbn-resolving.org/urn:nbn:de:kobv:517-opus4-406705
URL:
https://d-nb.info/1217717765/34
Author information:
Naumann, Felix 1971-