Population-level integration of single-cell datasets enables multi-scale analysis across samples
The increasing generation of population-level single-cell atlases with hundreds or thousands of samples has the potential to link demographic and technical metadata with high-resolution cellular and tissue data in homeostasis and disease. Constructing such comprehensive references requires large-scale integration of heterogeneous cohorts with varying metadata capturing demographic and technical information. Here, we present single-cell population level integration (scPoli), a semi-supervised conditional deep generative model for data integration, label transfer and query-to-reference mapping. Unlike other models, scPoli learns both sample and cell representations, is aware of cell-type annotations and can integrate and annotate newly generated query datasets while providing an uncertainty mechanism to identify unknown populations. We extensively evaluated the method and showed its advantages over existing approaches. We applied scPoli to two population-level atlases of lung and peripheral blood mononuclear cells (PBMCs), the latter consisting of roughly 8 million cells across 2,375 samples. We demonstrate that scPoli allows atlas-level integration and automatic reference mapping with label transfer. It can explain sample-level biological and technical variations such as disease, anatomical location and assay by means of its novel sample embeddings. We use these embeddings to explore sample-level metadata, enable automatic sample classification and guide a data integration workflow. scPoli also enables simultaneous sample-level and cell-level analysis of gene expression patterns, revealing genes associated with batch effects and the main axes of between-sample variation. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.
Citation
@misc{dedonno2022,
  author = {Carlo De Donno and Soroor Hediyeh-Zadeh and Marco
    Wagenstetter and Amir Ali Moinfar and Luke Zappia and Mohammad
    Lotfollahi and Fabian J Theis},
  title = {Population-Level Integration of Single-Cell Datasets Enables
    Multi-Scale Analysis Across Samples},
  date = {2022-11-29},
  url = {https://lazappi.id.au/publications/2022-deDonno-scPoli},
  doi = {10.1101/2022.11.28.517803},
  langid = {en},
  abstract = {The increasing generation of population-level single-cell
    atlases with hundreds or thousands of samples has the potential to
    link demographic and technical metadata with high-resolution
    cellular and tissue data in homeostasis and disease. Constructing
    such comprehensive references requires large-scale integration of
    heterogeneous cohorts with varying metadata capturing demographic
    and technical information. Here, we present single-cell population
    level integration (scPoli), a semi-supervised conditional deep
    generative model for data integration, label transfer and
    query-to-reference mapping. Unlike other models, scPoli learns both
    sample and cell representations, is aware of cell-type annotations
    and can integrate and annotate newly generated query datasets while
    providing an uncertainty mechanism to identify unknown populations.
    We extensively evaluated the method and showed its advantages over
    existing approaches. We applied scPoli to two population-level
    atlases of lung and peripheral blood mononuclear cells (PBMCs), the
    latter consisting of roughly 8 million cells across 2,375 samples.
    We demonstrate that scPoli allows atlas-level integration and
    automatic reference mapping with label transfer. It can explain
    sample-level biological and technical variations such as disease,
    anatomical location and assay by means of its novel sample
    embeddings. We use these embeddings to explore sample-level
    metadata, enable automatic sample classification and guide a data
    integration workflow. scPoli also enables simultaneous sample-level
    and cell-level analysis of gene expression patterns, revealing genes
    associated with batch effects and the main axes of between-sample
    variation. We envision scPoli becoming an important tool for
    population-level single-cell data integration facilitating atlas use
    but also interpretation by means of multi-scale analyses.}
}