In:
Genome Biology, Springer Science and Business Media LLC, Vol. 22, No. 1 (2021-12)
Abstract:
Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public data sets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public data sets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of data sets using ontologies for metadata. We propose an adaption of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
Type of Medium:
Online Resource
ISSN:
1474-760X
DOI:
10.1186/s13059-021-02452-6
Language:
English
Publisher:
Springer Science and Business Media LLC
Publication Date:
2021
ZDB-ID:
2040529-7