Clustering trees for visualising scRNA-seq data

visualisation
clustering
software
clustree
R
CRAN
Selected talk at the Oz Single Cells 2018 conference where I presented clustering trees and the clustree package
Authors

Luke Zappia

Alicia Oshlack

Date

July 16, 2018

Event

Oz Single Cells 2018 (Garvan Institute of Medical Research, Sydney, Australia)

Links
Abstract

Single-cell RNA-sequencing is commonly used to interrogate complex tissues in order to identify and compare the cell types present. This type of experiment is particularly prevalent in the developmental setting. A key step in this approach is assigning cells to different clusters that are assumed to be distinct cell types. Although this can be done by comparison with reference datasets, cells are more routinely grouped using unsupervised clustering and we have catalogued more than 60 scRNA-seq clustering methods. Most clustering methods have parameters which affect the number of clusters produced, either through specifying an exact number, a parameter which controls the clustering resolution or indirectly through other parameters. The resolution that is chosen can have a profound effect on further analysis but it is unclear how to make this choice. Existing clustering metrics often score only single clusters or resolutions, or require datasets to be perturbed and clustered multiple times which can be infeasible for large datasets. Here we present clustering trees as an alternative visualisation that shows the relationship between clusters as the clustering resolution increases. These trees can highlight instability that may indicate overclustering and help choose which resolution to use, particularly when combined with existing domain knowledge such as the expression of marker genes. More generally, clustering trees are a compact, information dense visualisation that can serve as an alternative to plotting cells in reduced dimensions such as t-SNE. Here we explain how clustering trees are produced using the clustree R package (http://cran.r-project.org/package=clustree) and illustrate how they can be used with a examples of scRNA-seq data from kidney organoids.