Population-level integration of single-cell datasets enables multi-scale analysis across samples
The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli to population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings, revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration, facilitating not only atlas use but also interpretation by means of multi-scale analyses.
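The abstract mentions learning sample representations alongside cell representations and using the shared latent space for label transfer. The toy sketch below illustrates those two ideas in plain NumPy under stated assumptions: a per-sample embedding table whose rows are concatenated to each cell's input (in a trained model these vectors would be learned; here they are random placeholders), and a nearest-prototype classifier that labels query cells by the closest cell-type centroid in latent space. The helper names (`condition_input`, `compute_prototypes`, `transfer_labels`) and all numbers are hypothetical; this is not scPoli's implementation or the API of any released package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample-embedding table: one vector per sample.
# Random placeholders here; a real model would learn these during training.
n_samples, emb_dim = 3, 2
sample_embedding = rng.normal(size=(n_samples, emb_dim))


def condition_input(x, sample_idx):
    """Append each cell's sample embedding to its feature vector."""
    return np.concatenate([x, sample_embedding[sample_idx]], axis=1)


def compute_prototypes(latent, labels):
    """Mean latent vector per cell-type label (one prototype per class)."""
    classes = np.unique(labels)
    protos = np.stack([latent[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def transfer_labels(query_latent, classes, protos):
    """Label each query cell by its nearest prototype (Euclidean distance)."""
    # Pairwise squared distances via broadcasting: shape (n_query, n_classes).
    d2 = ((query_latent[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(query_latent)), nearest])
    return classes[nearest], dist


# Toy inputs: 2 cells x 4 genes, belonging to samples 0 and 2.
x = np.ones((2, 4))
conditioned = condition_input(x, np.array([0, 2]))

# Toy reference latent space with two well-separated cell types.
ref_latent = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
ref_labels = np.array(["B", "B", "T", "T"])
classes, protos = compute_prototypes(ref_latent, ref_labels)

query_latent = np.array([[0.0, 0.1], [4.9, 5.1]])
pred, dist = transfer_labels(query_latent, classes, protos)
print(conditioned.shape, pred, dist)
```

Returning the distance to the nearest prototype is one simple heuristic for per-cell uncertainty: query cells that lie far from every reference prototype are candidates for cell types absent from the reference.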
Citation
@article{de_donno2023,
author = {De Donno, Carlo and Hediyeh-Zadeh, Soroor and Moinfar, Amir Ali
and Wagenstetter, Marco and Zappia, Luke and Lotfollahi, Mohammad
and Theis, Fabian J.},
title = {Population-Level Integration of Single-Cell Datasets Enables
Multi-Scale Analysis Across Samples},
journal = {Nature Methods},
volume = {20},
number = {11},
pages = {1683--1692},
date = {2023-10-09},
url = {https://lazappi.id.au/publications/2023-deDonno-scPoli/},
doi = {10.1038/s41592-023-02035-2},
issn = {1548-7091},
langid = {en},
abstract = {The increasing generation of population-level single-cell
atlases has the potential to link sample metadata with cellular
data. Constructing such references requires integration of
heterogeneous cohorts with varying metadata. Here we present
single-cell population level integration (scPoli), an open-world
learner that incorporates generative models to learn sample and cell
representations for data integration, label transfer and reference
mapping. We applied scPoli on population-level atlases of lung and
peripheral blood mononuclear cells, the latter consisting of 7.8
million cells across 2,375 samples. We demonstrate that scPoli can
explain sample-level biological and technical variations using
sample embeddings revealing genes associated with batch effects and
biological effects. scPoli is further applicable to single-cell
sequencing assay for transposase-accessible chromatin and
cross-species datasets, offering insights into chromatin
accessibility and comparative genomics. We envision scPoli becoming
an important tool for population-level single-cell data integration
facilitating atlas use but also interpretation by means of
multi-scale analyses.}
}