clustree: producing clustering trees using ggraph

visualisation

software

Presentation at the useR! 2018 conference introducing the clustree package and demonstrating how it makes use of the ggraph package.

Authors

Luke Zappia

Date

July 21, 2018

Event

useR! 2018 (Brisbane Convention and Exhibition Centre, Brisbane, Australia)

Links

Slides Code Video

Abstract

Clustering analysis is commonly used in many fields to group together similar samples. Many clustering algorithms exist, but all of them require some sort of user input to set parameters that affect the number of clusters produced. Deciding on the correct number of clusters for a given dataset is a difficult problem that can be tackled by looking at the relationships between samples at different resolutions. Here I will present clustree, an R package for producing clustering tree visualisations. These visualisations combine information from multiple clusterings with different resolutions, showing where new clusters come from and how samples change clusters as the number of clusters increases. Summarised information describing the samples in each cluster can be overlaid on the tree to give additional insight. I will also describe my experience developing clustree, particularly how I have made use of the ggraph package. The clustree package is available at https://github.com/lazappi/clustree and a preprint describing clustering trees can be read at https://www.biorxiv.org/content/early/2018/03/02/274035.
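To make the idea of tracking samples across resolutions concrete, the following is a minimal sketch of the bookkeeping a clustering tree relies on. It is written in Python for illustration only and is not clustree's implementation (clustree is an R package). The function name `clustering_tree_edges`, the toy labels, and the choice of edge weight (the share of a higher-resolution cluster that came from a given lower-resolution cluster) are all assumptions made for this example.

```python
from collections import Counter


def clustering_tree_edges(clusterings):
    """Build tree edges between clusterings at adjacent resolutions.

    `clusterings` is a list of label lists, ordered from the lowest to the
    highest resolution. Every list labels the same samples in the same order.
    (Hypothetical helper for illustration, not part of clustree.)
    """
    edges = []
    for level in range(len(clusterings) - 1):
        lower, higher = clusterings[level], clusterings[level + 1]
        if len(lower) != len(higher):
            raise ValueError("every clustering must label the same samples")

        # Count the samples shared by each (lower cluster, higher cluster) pair.
        overlap = Counter(zip(lower, higher))
        # Size of each cluster at the higher resolution.
        higher_sizes = Counter(higher)

        for (src, dst), count in sorted(overlap.items()):
            edges.append({
                "from": (level, src),
                "to": (level + 1, dst),
                "count": count,
                # Share of the destination cluster contributed by this source.
                "in_prop": count / higher_sizes[dst],
            })
    return edges


# Toy example: six samples clustered at three resolutions (made-up labels).
toy = [
    [0, 0, 0, 0, 0, 0],  # one cluster
    [0, 0, 0, 1, 1, 1],  # two clusters
    [0, 0, 2, 1, 1, 2],  # three clusters
]
edges = clustering_tree_edges(toy)
for edge in edges:
    print(edge)
```

In this toy data the new cluster 2 at the highest resolution draws one sample from each of the two previous clusters, so it receives two incoming edges with a proportion of 0.5 each. A new cluster with several incoming edges of this kind is the sort of pattern a clustering tree is meant to make visible.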