Integrating single-cell RNA-seq datasets with substantial batch effects
Integration of single-cell RNA-sequencing (scRNA-seq) datasets is standard in scRNA-seq analysis. Nevertheless, current computational methods struggle to harmonize datasets across systems such as species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-nuclei. Conditional variational autoencoders (cVAE) are a popular integration method; however, existing strategies for stronger batch correction have limitations. Increasing the Kullback–Leibler divergence regularization does not improve integration, and adversarial learning removes biological signals. Here, we propose sysVI, a cVAE-based method employing VampPrior and cycle-consistency constraints. We show that sysVI integrates across systems and improves biological signals for downstream interpretation of cell states and conditions.
Citation
@article{hrovatin2025,
author = {Hrovatin, Karin and Ali Moinfar, Amir and Zappia, Luke and
Parikh, Shrey and Tejada Lapuerta, Alejandro and Lengerich, Benjamin
and Kellis, Manolis and Theis, Fabian J.},
title = {Integrating Single-Cell {RNA-seq} Datasets with Substantial
Batch Effects},
journal = {BMC Genomics},
volume = {26},
number = {1},
pages = {974},
date = {2025-10-30},
url = {https://doi.org/10.1101/2023.11.03.565463},
doi = {10.1186/s12864-025-12126-3},
issn = {1471-2164},
langid = {en},
abstract = {Integration of single-cell RNA-sequencing (scRNA-seq)
datasets is standard in scRNA-seq analysis. Nevertheless, current
computational methods struggle to harmonize datasets across systems
such as species, organoids and primary tissue, or different
scRNA-seq protocols, including single-cell and single-nuclei.
Conditional variational autoencoders (cVAE) are a popular
integration method; however, existing strategies for stronger batch
correction have limitations. Increasing the Kullback–Leibler
divergence regularization does not improve integration, and
adversarial learning removes biological signals. Here, we propose
sysVI, a cVAE-based method employing VampPrior and cycle-consistency
constraints. We show that sysVI integrates across systems and
improves biological signals for downstream interpretation of cell
states and conditions.}
}