An open-source framework for end-to-end analysis of electronic health record data
With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.
Citation
@article{heumos2024,
author = {Heumos, Lukas and Ehmele, Philipp and Treis, Tim and Upmeier
zu Belzen, Julius and Roellin, Eljas and May, Lilly and Namsaraeva,
Altana and Horlava, Nastassya and Shitov, Vladimir A. and Zhang,
Xinyue and Zappia, Luke and Knoll, Rainer and Lang, Niklas J. and
Hetzel, Leon and Virshup, Isaac and Sikkema, Lisa and Curion,
Fabiola and Eils, Roland and Schiller, Herbert B. and Hilgendorff,
Anne and Theis, Fabian J.},
title = {An Open-Source Framework for End-to-End Analysis of
Electronic Health Record Data},
journal = {Nature Medicine},
pages = {1--12},
date = {2024-09-12},
url = {https://lazappi.id.au/publications/2024-huemos-ehrapy/},
doi = {10.1038/s41591-024-03214-0},
issn = {1078-8956},
langid = {en},
abstract = {With progressive digitalization of healthcare systems
worldwide, large-scale collection of electronic health records
(EHRs) has become commonplace. However, an extensible framework for
comprehensive exploratory analysis that accounts for data
heterogeneity is missing. Here we introduce ehrapy, a modular
open-source Python framework designed for exploratory analysis of
heterogeneous epidemiology and EHR data. ehrapy incorporates a
series of analytical steps, from data extraction and quality control
to the generation of low-dimensional representations. Complemented
by rich statistical modules, ehrapy facilitates associating patients
with disease states, differential comparison between patient
clusters, survival analysis, trajectory inference, causal inference
and more. Leveraging ontologies, ehrapy further enables data sharing
and training EHR deep learning models, paving the way for
foundational models in biomedical research. We demonstrate ehrapy’s
features in six distinct examples. We applied ehrapy to stratify
patients affected by unspecified pneumonia into finer-grained
phenotypes. Furthermore, we reveal biomarkers for significant
differences in survival among these groups. Additionally, we
quantify medication-class effects of pneumonia medications on length
of stay. We further leveraged ehrapy to analyze cardiovascular risks
across different data modalities. We reconstructed disease state
trajectories in patients with severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we
conducted a case study to demonstrate how ehrapy can detect and
mitigate biases in EHR data. ehrapy, thus, provides a framework that
we envision will standardize analysis pipelines on EHR data and
serve as a cornerstone for the community.}
}