An open-source framework for end-to-end analysis of electronic health record data

Authors

Lukas Heumos

Philipp Ehmele

Tim Treis

Julius Upmeier zu Belzen

Eljas Roellin

Lilly May

Altana Namsaraeva

Nastassya Horlava

Vladimir A. Shitov

Xinyue Zhang

Luke Zappia

Rainer Knoll

Niklas J. Lang

Leon Hetzel

Isaac Virshup

Lisa Sikkema

Fabiola Curion

Roland Eils

Herbert B. Schiller

Anne Hilgendorff

Fabian J. Theis

Date

September 12, 2024

Abstract

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.

Citation

BibTeX citation:
@article{heumos2024,
  author = {Heumos, Lukas and Ehmele, Philipp and Treis, Tim and Upmeier
    zu Belzen, Julius and Roellin, Eljas and May, Lilly and Namsaraeva,
    Altana and Horlava, Nastassya and Shitov, Vladimir A. and Zhang,
    Xinyue and Zappia, Luke and Knoll, Rainer and Lang, Niklas J. and
    Hetzel, Leon and Virshup, Isaac and Sikkema, Lisa and Curion,
    Fabiola and Eils, Roland and Schiller, Herbert B. and Hilgendorff,
    Anne and Theis, Fabian J.},
  title = {An Open-Source Framework for End-to-End Analysis of
    Electronic Health Record Data},
  journal = {Nature Medicine},
  pages = {1--12},
  date = {2024-09-12},
  url = {https://doi.org/10.1038/s41591-024-03214-0},
  doi = {10.1038/s41591-024-03214-0},
  issn = {1078-8956},
  langid = {en},
  abstract = {With progressive digitalization of healthcare systems
    worldwide, large-scale collection of electronic health records
    (EHRs) has become commonplace. However, an extensible framework for
    comprehensive exploratory analysis that accounts for data
    heterogeneity is missing. Here we introduce ehrapy, a modular
    open-source Python framework designed for exploratory analysis of
    heterogeneous epidemiology and EHR data. ehrapy incorporates a
    series of analytical steps, from data extraction and quality control
    to the generation of low-dimensional representations. Complemented
    by rich statistical modules, ehrapy facilitates associating patients
    with disease states, differential comparison between patient
    clusters, survival analysis, trajectory inference, causal inference
    and more. Leveraging ontologies, ehrapy further enables data sharing
    and training EHR deep learning models, paving the way for
    foundational models in biomedical research. We demonstrate ehrapy’s
    features in six distinct examples. We applied ehrapy to stratify
    patients affected by unspecified pneumonia into finer-grained
    phenotypes. Furthermore, we reveal biomarkers for significant
    differences in survival among these groups. Additionally, we
    quantify medication-class effects of pneumonia medications on length
    of stay. We further leveraged ehrapy to analyze cardiovascular risks
    across different data modalities. We reconstructed disease state
    trajectories in patients with severe acute respiratory syndrome
    coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we
    conducted a case study to demonstrate how ehrapy can detect and
    mitigate biases in EHR data. ehrapy, thus, provides a framework that
    we envision will standardize analysis pipelines on EHR data and
    serve as a cornerstone for the community.}
}
For attribution, please cite this work as:
Heumos, L., Ehmele, P., Treis, T., Upmeier zu Belzen, J., Roellin, E., May, L., Namsaraeva, A., Horlava, N., Shitov, V. A., Zhang, X., Zappia, L., Knoll, R., Lang, N. J., Hetzel, L., Virshup, I., Sikkema, L., Curion, F., Eils, R., Schiller, H. B., Hilgendorff, A. & Theis, F. J. An open-source framework for end-to-end analysis of electronic health record data. Nature Medicine 1–12 (2024). doi:10.1038/s41591-024-03214-0