Integrating single-cell RNA-seq datasets with substantial batch effects

single-cell
rna-seq
integration
batch effects
methods
Authors

Karin Hrovatin

Amir Ali Moinfar

Luke Zappia

Shrey Parikh

Alejandro Tejada Lapuerta

Benjamin Lengerich

Manolis Kellis

Fabian J. Theis

Date

October 30, 2025

Abstract

Integration of single-cell RNA-sequencing (scRNA-seq) datasets is standard in scRNA-seq analysis. Nevertheless, current computational methods struggle to harmonize datasets across systems such as species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-nuclei. Conditional variational autoencoders (cVAE) are a popular integration method; however, existing strategies for stronger batch correction have limitations. Increasing the Kullback–Leibler divergence regularization does not improve integration, and adversarial learning removes biological signals. Here, we propose sysVI, a cVAE-based method employing VampPrior and cycle-consistency constraints. We show that sysVI integrates across systems and improves biological signals for downstream interpretation of cell states and conditions.
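The abstract names sysVI's two ingredients without spelling them out. The sketch below is a minimal NumPy illustration of the general techniques under stated assumptions, not the sysVI implementation. A VampPrior replaces the standard normal prior of a cVAE with an equally weighted mixture of the encoder's posteriors q(z | u_k) at K learned pseudo-inputs u_k. A latent cycle-consistency term encodes a cell, decodes it as if it came from another system, re-encodes the result and penalizes the distance between the two latent embeddings. The linear encoder/decoder, the helper names, the squared-distance penalty, the fixed unit variances and the choice to encode pseudo-inputs with system 0 are all hypothetical; sysVI's actual architecture, distance and weighting may differ.

```python
import math

import numpy as np


def log_normal_diag(z, mean, log_var):
    """Log-density of a diagonal Gaussian N(mean, exp(log_var)), summed over the last axis."""
    return -0.5 * np.sum(
        math.log(2 * math.pi) + log_var + (z - mean) ** 2 / np.exp(log_var), axis=-1
    )


def vamp_prior_log_prob(z, pseudo_means, pseudo_log_vars):
    """log p(z) for an equally weighted mixture of K Gaussians q(z | u_k).

    pseudo_means / pseudo_log_vars (shape (K, d)) stand in for the encoder's outputs
    at K pseudo-inputs u_k; z has shape (n, d). Returns shape (n,).
    """
    # (n, K): log-density of every cell under every mixture component
    comp = log_normal_diag(z[:, None, :], pseudo_means[None], pseudo_log_vars[None])
    # Numerically stable log-mean-exp over the K components
    m = comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(comp - m).sum(axis=1)) - math.log(comp.shape[1])


def kl_to_vamp_prior_mc(z_sample, post_mean, post_log_var, pseudo_means, pseudo_log_vars):
    """Single-sample Monte Carlo estimate of KL(q(z|x) || p(z)) per cell.

    The KL to a Gaussian mixture has no closed form, so it is estimated as
    log q(z|x) - log p(z) at a sampled z.
    """
    return log_normal_diag(z_sample, post_mean, post_log_var) - vamp_prior_log_prob(
        z_sample, pseudo_means, pseudo_log_vars
    )


def cycle_consistency_loss(z, z_cycle):
    """Mean squared Euclidean distance between original and re-encoded latents."""
    return float(np.mean(np.sum((z - z_cycle) ** 2, axis=1)))


# Toy usage with a hypothetical linear encoder/decoder; the system label
# (0 or 1) is appended to the input as the conditional covariate.
rng = np.random.default_rng(0)
n_cells, n_genes, n_latent = 4, 6, 2
W_enc = rng.normal(size=(n_genes + 1, n_latent))
W_dec = rng.normal(size=(n_latent + 1, n_genes))


def encode(x, system):
    return np.hstack([x, system[:, None]]) @ W_enc


def decode(z, system):
    return np.hstack([z, system[:, None]]) @ W_dec


x = rng.normal(size=(n_cells, n_genes))
system = np.array([0.0, 0.0, 1.0, 1.0])
z = encode(x, system)
other = 1.0 - system  # translate each cell to the other system
z_cycle = encode(decode(z, other), other)
cycle = cycle_consistency_loss(z, z_cycle)

# Hypothetical pseudo-inputs: 3 random "cells" encoded with system 0
pseudo_means = encode(rng.normal(size=(3, n_genes)), np.zeros(3))
log_var = np.zeros((n_cells, n_latent))  # unit posterior variance, for brevity
z_sample = z + np.exp(0.5 * log_var) * rng.normal(size=z.shape)
kl = kl_to_vamp_prior_mc(z_sample, z, log_var, pseudo_means, np.zeros_like(pseudo_means))
print(cycle, kl.shape)
```

In training, terms like these would typically be weighted and added to the reconstruction loss and minimized jointly. The mixture prior lets the latent space keep several modes (for example, distinct cell states) instead of pulling every posterior toward a single standard normal, which is one intuition for why it can behave differently from simply raising the KL weight.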

Citation

BibTeX citation:
@article{hrovatin2025,
  author = {Hrovatin, Karin and Moinfar, Amir Ali and Zappia, Luke and
    Parikh, Shrey and Tejada Lapuerta, Alejandro and Lengerich, Benjamin
    and Kellis, Manolis and Theis, Fabian J.},
  title = {Integrating Single-Cell {RNA-seq} Datasets with Substantial
    Batch Effects},
  journal = {BMC Genomics},
  volume = {26},
  number = {1},
  pages = {974},
  date = {2025-10-30},
  url = {https://doi.org/10.1186/s12864-025-12126-3},
  doi = {10.1186/s12864-025-12126-3},
  issn = {1471-2164},
  langid = {en},
  abstract = {Integration of single-cell RNA-sequencing (scRNA-seq)
    datasets is standard in scRNA-seq analysis. Nevertheless, current
    computational methods struggle to harmonize datasets across systems
    such as species, organoids and primary tissue, or different
    scRNA-seq protocols, including single-cell and single-nuclei.
    Conditional variational autoencoders (cVAE) are a popular
    integration method; however, existing strategies for stronger batch
    correction have limitations. Increasing the Kullback–Leibler
    divergence regularization does not improve integration, and
    adversarial learning removes biological signals. Here, we propose
    sysVI, a cVAE-based method employing VampPrior and cycle-consistency
    constraints. We show that sysVI integrates across systems and
    improves biological signals for downstream interpretation of cell
    states and conditions.}
}
For attribution, please cite this work as:
Hrovatin, K., Moinfar, A. A., Zappia, L., Parikh, S., Tejada Lapuerta, A., Lengerich, B., Kellis, M. & Theis, F. J. Integrating single-cell RNA-seq datasets with substantial batch effects. BMC Genomics 26, 974 (2025).