Population-level integration of single-cell datasets enables multi-scale analysis across samples

single-cell
rna-seq
methods
integration
software
Authors

Carlo De Donno

Soroor Hediyeh-Zadeh

Amir Ali Moinfar

Marco Wagenstetter

Luke Zappia

Mohammad Lotfollahi

Fabian J. Theis

Date

October 9, 2023

Abstract

The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli on population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.

Citation

BibTeX citation:
@article{de_donno2023,
  author = {De Donno, Carlo and Hediyeh-Zadeh, Soroor and Moinfar,
    Amir Ali and Wagenstetter, Marco and Zappia, Luke and Lotfollahi,
    Mohammad and Theis, Fabian J.},
  title = {Population-Level Integration of Single-Cell Datasets Enables
    Multi-Scale Analysis Across Samples},
  journal = {Nature Methods},
  volume = {20},
  number = {11},
  pages = {1683--1692},
  date = {2023-10-09},
  url = {https://lazappi.id.au/publications/2023-deDonno-scPoli/},
  doi = {10.1038/s41592-023-02035-2},
  issn = {1548-7091},
  langid = {en},
  abstract = {The increasing generation of population-level single-cell
    atlases has the potential to link sample metadata with cellular
    data. Constructing such references requires integration of
    heterogeneous cohorts with varying metadata. Here we present
    single-cell population level integration (scPoli), an open-world
    learner that incorporates generative models to learn sample and cell
    representations for data integration, label transfer and reference
    mapping. We applied scPoli on population-level atlases of lung and
    peripheral blood mononuclear cells, the latter consisting of 7.8
    million cells across 2,375 samples. We demonstrate that scPoli can
    explain sample-level biological and technical variations using
    sample embeddings revealing genes associated with batch effects and
    biological effects. scPoli is further applicable to single-cell
    sequencing assay for transposase-accessible chromatin and
    cross-species datasets, offering insights into chromatin
    accessibility and comparative genomics. We envision scPoli becoming
    an important tool for population-level single-cell data integration
    facilitating atlas use but also interpretation by means of
    multi-scale analyses.}
}
For attribution, please cite this work as:
De Donno, Carlo, Soroor Hediyeh-Zadeh, Amir Ali Moinfar, Marco Wagenstetter, Luke Zappia, Mohammad Lotfollahi, and Fabian J. Theis. 2023. “Population-Level Integration of Single-Cell Datasets Enables Multi-Scale Analysis Across Samples.” Nature Methods 20 (11): 1683–92. https://doi.org/10.1038/s41592-023-02035-2.