I am a bioinformatics postdoctoral researcher in the Theis Lab at the Helmholtz Zentrum München Institute of Computational Biology and the Technische Universität München. My research focuses on the analysis of single-cell RNA sequencing data, including the development and benchmarking of computational methods. I am also interested in how best to visualise data more generally.
I completed my PhD in the Oshlack Lab.
Doctor of Philosophy (Bioinformatics), 2019
The University of Melbourne/Murdoch Children's Research Institute
Master of Science (Bioinformatics), 2015
The University of Melbourne
Bachelor of Science (Chemistry), 2011
The University of Melbourne
Diploma in Informatics, 2011
The University of Melbourne
Invited keynote at the European Bioconductor meeting 2020
Cell atlases often include samples that span locations, labs, and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. Choosing a data integration method is a challenge due to the difficulty of defining integration success. Here, we benchmark 38 method and preprocessing combinations on 77 batches of gene expression, chromatin accessibility, and simulation data from 23 publications, altogether representing >1.2 million cells distributed in nine atlas-level integration tasks. Our integration tasks span several common sources of variation such as individuals, species, and experimental labs. We evaluate methods according to scalability, usability, and their ability to remove batch effects while retaining biological variation. Using 14 evaluation metrics, we find that highly variable gene selection improves the performance of data integration methods, whereas scaling pushes methods to prioritize batch removal over conservation of biological variation. Overall, BBKNN, Scanorama, and scVI perform well, particularly on complex integration tasks; Seurat v3 performs well on simpler tasks with distinct biological signals; and methods that prioritize batch removal perform best for ATAC-seq data integration. Our freely available reproducible Python module can be used to identify optimal data integration methods for new data, benchmark new methods, and improve method development.
Bioconductor R package converting between scRNA-seq objects.
Dataset of NBA positions designed to replace the iris dataset.
Tutorial describing how to interact with the Scanpy Python package from R.
Functions for scraping git commits from repositories associated with a PhD (or anything else) and plotting them.
Materials for the COMBINE Australia R package development workshop.
An R package for setting up a website to display analysis of Twitter hashtags.
Analysis of Twitter activity for hashtags from various events, usually academic conferences.
CRAN R package for creating clustering trees, a visualisation for looking at clustering across resolutions.
Database and website cataloguing software tools for analysing single-cell RNA sequencing data.
Bioconductor R package for simulating scRNA-seq data.
A Python script for pretty printing of TeXcount output.