Sfaira accelerates data and model reuse in single cell genomics

single-cell
rna-seq
database
website
Authors

David S Fischer

Leander Dony

Martin König

Abdul Moeed

Luke Zappia

Lukas Heumos

Sophie Tritschler

Olle Holmberg

Hananeh Aliee

Fabian J Theis

Date

August 25, 2021

Abstract

Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public data sets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public data sets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of data sets using ontologies for metadata. We propose an adaption of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.

Citation

BibTeX citation:
@article{fischer2021,
  author = {Fischer, David S and Dony, Leander and König, Martin and
    Moeed, Abdul and Zappia, Luke and Heumos, Lukas and Tritschler,
    Sophie and Holmberg, Olle and Aliee, Hananeh and Theis, Fabian J},
  title = {Sfaira Accelerates Data and Model Reuse in Single Cell
    Genomics},
  journal = {Genome Biology},
  volume = {22},
  number = {1},
  pages = {248},
  date = {2021-08-25},
  url = {https://lazappi.id.au/publications/2021-fischer-sfaira/},
  doi = {10.1186/s13059-021-02452-6},
  issn = {1465-6906},
  langid = {en},
  abstract = {Single-cell RNA-seq datasets are often first analyzed
    independently without harnessing model fits from previous studies,
    and are then contextualized with public data sets, requiring
    time-consuming data wrangling. We address these issues with sfaira,
    a single-cell data zoo for public data sets paired with a model zoo
    for executable pre-trained models. The data zoo is designed to
    facilitate contribution of data sets using ontologies for metadata.
    We propose an adaption of cross-entropy loss for cell type
    classification tailored to datasets annotated at different levels of
    coarseness. We demonstrate the utility of sfaira by training models
    across anatomic data partitions on 8 million cells.}
}
For attribution, please cite this work as:
Fischer, David S, Leander Dony, Martin König, Abdul Moeed, Luke Zappia, Lukas Heumos, Sophie Tritschler, Olle Holmberg, Hananeh Aliee, and Fabian J Theis. 2021. “Sfaira Accelerates Data and Model Reuse in Single Cell Genomics.” Genome Biology 22 (1): 248. https://doi.org/10.1186/s13059-021-02452-6.