An open-source framework for end-to-end analysis of electronic health record data

medical records
software
Authors

Lukas Heumos

Philipp Ehmele

Tim Treis

Julius Upmeier zu Belzen

Eljas Roellin

Lilly May

Altana Namsaraeva

Nastassya Horlava

Vladimir A. Shitov

Xinyue Zhang

Luke Zappia

Rainer Knoll

Niklas J. Lang

Leon Hetzel

Isaac Virshup

Lisa Sikkema

Fabiola Curion

Roland Eils

Herbert B. Schiller

Anne Hilgendorff

Fabian J. Theis

Date

September 12, 2024

Abstract

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.

Citation

BibTeX citation:
@article{heumos2024,
  author = {Heumos, Lukas and Ehmele, Philipp and Treis, Tim and Upmeier
    zu Belzen, Julius and Roellin, Eljas and May, Lilly and Namsaraeva,
    Altana and Horlava, Nastassya and Shitov, Vladimir A. and Zhang,
    Xinyue and Zappia, Luke and Knoll, Rainer and Lang, Niklas J. and
    Hetzel, Leon and Virshup, Isaac and Sikkema, Lisa and Curion,
    Fabiola and Eils, Roland and Schiller, Herbert B. and Hilgendorff,
    Anne and Theis, Fabian J.},
  title = {An Open-Source Framework for End-to-End Analysis of
    Electronic Health Record Data},
  journal = {Nature Medicine},
  pages = {1-12},
  date = {2024-09-12},
  url = {https://lazappi.id.au/publications/2024-huemos-ehrapy/},
  doi = {10.1038/s41591-024-03214-0},
  issn = {1078-8956},
  langid = {en},
  abstract = {With progressive digitalization of healthcare systems
    worldwide, large-scale collection of electronic health records
    (EHRs) has become commonplace. However, an extensible framework for
    comprehensive exploratory analysis that accounts for data
    heterogeneity is missing. Here we introduce ehrapy, a modular
    open-source Python framework designed for exploratory analysis of
    heterogeneous epidemiology and EHR data. ehrapy incorporates a
    series of analytical steps, from data extraction and quality control
    to the generation of low-dimensional representations. Complemented
    by rich statistical modules, ehrapy facilitates associating patients
    with disease states, differential comparison between patient
    clusters, survival analysis, trajectory inference, causal inference
    and more. Leveraging ontologies, ehrapy further enables data sharing
    and training EHR deep learning models, paving the way for
    foundational models in biomedical research. We demonstrate ehrapy’s
    features in six distinct examples. We applied ehrapy to stratify
    patients affected by unspecified pneumonia into finer-grained
    phenotypes. Furthermore, we reveal biomarkers for significant
    differences in survival among these groups. Additionally, we
    quantify medication-class effects of pneumonia medications on length
    of stay. We further leveraged ehrapy to analyze cardiovascular risks
    across different data modalities. We reconstructed disease state
    trajectories in patients with severe acute respiratory syndrome
    coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we
    conducted a case study to demonstrate how ehrapy can detect and
    mitigate biases in EHR data. ehrapy, thus, provides a framework that
    we envision will standardize analysis pipelines on EHR data and
    serve as a cornerstone for the community.}
}
For attribution, please cite this work as:
Heumos, Lukas, Philipp Ehmele, Tim Treis, Julius Upmeier zu Belzen, Eljas Roellin, Lilly May, Altana Namsaraeva, et al. 2024. “An Open-Source Framework for End-to-End Analysis of Electronic Health Record Data.” Nature Medicine, September, 1–12. https://doi.org/10.1038/s41591-024-03214-0.