Exploratory electronic health record analysis with ehrapy

medical records
software
Authors

Lukas Heumos

Philipp Ehmele

Tim Treis

Julius Upmeier zu Belzen

Altana Namsaraeva

Nastassya Horlava

Vladimir A. Shitov

Xinyue Zhang

Luke Zappia

Rainer Knoll

Niklas J. Lang

Leon Hetzel

Isaac Virshup

Lisa Sikkema

Eljas Roellin

Fabiola Curion

Roland Eils

Herbert B. Schiller

Anne Hilgendorff

Fabian J. Theis

Date

December 11, 2023

Links
Citation stats
Abstract

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here, we introduce ehrapy, a modular open-source Python framework designed for exploratory end-to-end analysis of heterogeneous epidemiology and electronic health record data. Ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference, and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models paving the way for foundational models in biomedical research. We demonstrated ehrapy’s features in five distinct examples: We first applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we revealed biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. Finally, we reconstructed disease state trajectories in SARS-CoV-2 patients based on imaging data. Ehrapy thus provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.

Citation

BibTeX citation:
@misc{heumos2023,
  author = {Heumos, Lukas and Ehmele, Philipp and Treis, Tim and Upmeier
    zu Belzen, Julius and Namsaraeva, Altana and Horlava, Nastassya and
    A. Shitov, Vladimir and Zhang, Xinyue and Zappia, Luke and Knoll,
    Rainer and J. Lang, Niklas and Hetzel, Leon and Virshup, Isaac and
    Sikkema, Lisa and Roellin, Eljas and Curion, Fabiola and Eils,
    Roland and B. Schiller, Herbert and Hilgendorff, Anne and J. Theis,
    Fabian},
  title = {Exploratory Electronic Health Record Analysis with Ehrapy},
  date = {2023-12-11},
  url = {https://lazappi.id.au/publications/2023-huemos-ehrapy},
  doi = {10.1101/2023.12.11.23299816},
  langid = {en},
  abstract = {With progressive digitalization of healthcare systems
    worldwide, large-scale collection of electronic health records
    (EHRs) has become commonplace. However, an extensible framework for
    comprehensive exploratory analysis that accounts for data
    heterogeneity is missing. Here, we introduce ehrapy, a modular
    open-source Python framework designed for exploratory end-to-end
    analysis of heterogeneous epidemiology and electronic health record
    data. Ehrapy incorporates a series of analytical steps, from data
    extraction and quality control to the generation of low-dimensional
    representations. Complemented by rich statistical modules, ehrapy
    facilitates associating patients with disease states, differential
    comparison between patient clusters, survival analysis, trajectory
    inference, causal inference, and more. Leveraging ontologies, ehrapy
    further enables data sharing and training EHR deep learning models
    paving the way for foundational models in biomedical research. We
    demonstrated ehrapy’s features in five distinct examples: We first
    applied ehrapy to stratify patients affected by unspecified
    pneumonia into finer-grained phenotypes. Furthermore, we revealed
    biomarkers for significant differences in survival among these
    groups. Additionally, we quantify medication-class effects of
    pneumonia medications on length of stay. We further leveraged ehrapy
    to analyze cardiovascular risks across different data modalities.
    Finally, we reconstructed disease state trajectories in SARS-CoV-2
    patients based on imaging data. Ehrapy thus provides a framework
    that we envision will standardize analysis pipelines on EHR data and
    serve as a cornerstone for the community.}
}
For attribution, please cite this work as:
Heumos, Lukas, Philipp Ehmele, Tim Treis, Julius Upmeier zu Belzen, Altana Namsaraeva, Nastassya Horlava, Vladimir A. Shitov, et al. 2023. “Exploratory Electronic Health Record Analysis with Ehrapy.” medRxiv. https://doi.org/10.1101/2023.12.11.23299816.