Exploratory electronic health record analysis with ehrapy
With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here, we introduce ehrapy, a modular open-source Python framework designed for exploratory end-to-end analysis of heterogeneous epidemiology and electronic health record data. Ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference, and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrated ehrapy’s features in five distinct examples: We first applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we revealed biomarkers for significant differences in survival among these groups. Additionally, we quantified medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. Finally, we reconstructed disease state trajectories in SARS-CoV-2 patients based on imaging data. Ehrapy thus provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.
Citation
@misc{heumos2023,
author = {Heumos, Lukas and Ehmele, Philipp and Treis, Tim and Upmeier
zu Belzen, Julius and Namsaraeva, Altana and Horlava, Nastassya and
Shitov, Vladimir A. and Zhang, Xinyue and Zappia, Luke and Knoll,
Rainer and Lang, Niklas J. and Hetzel, Leon and Virshup, Isaac and
Sikkema, Lisa and Roellin, Eljas and Curion, Fabiola and Eils,
Roland and Schiller, Herbert B. and Hilgendorff, Anne and Theis,
Fabian J.},
title = {Exploratory Electronic Health Record Analysis with {ehrapy}},
date = {2023-12-11},
url = {https://lazappi.id.au/publications/2023-huemos-ehrapy},
doi = {10.1101/2023.12.11.23299816},
langid = {en},
abstract = {With progressive digitalization of healthcare systems
worldwide, large-scale collection of electronic health records
(EHRs) has become commonplace. However, an extensible framework for
comprehensive exploratory analysis that accounts for data
heterogeneity is missing. Here, we introduce ehrapy, a modular
open-source Python framework designed for exploratory end-to-end
analysis of heterogeneous epidemiology and electronic health record
data. Ehrapy incorporates a series of analytical steps, from data
extraction and quality control to the generation of low-dimensional
representations. Complemented by rich statistical modules, ehrapy
facilitates associating patients with disease states, differential
comparison between patient clusters, survival analysis, trajectory
inference, causal inference, and more. Leveraging ontologies, ehrapy
further enables data sharing and training EHR deep learning models,
paving the way for foundational models in biomedical research. We
demonstrated ehrapy’s features in five distinct examples: We first
applied ehrapy to stratify patients affected by unspecified
pneumonia into finer-grained phenotypes. Furthermore, we revealed
biomarkers for significant differences in survival among these
groups. Additionally, we quantified medication-class effects of
pneumonia medications on length of stay. We further leveraged ehrapy
to analyze cardiovascular risks across different data modalities.
Finally, we reconstructed disease state trajectories in SARS-CoV-2
patients based on imaging data. Ehrapy thus provides a framework
that we envision will standardize analysis pipelines on EHR data and
serve as a cornerstone for the community.}
}