<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[lazappi]]></title><description><![CDATA[Thoughts, stories and ideas.]]></description><link>http://lazappi.id.au/</link><generator>Ghost 0.9</generator><lastBuildDate>Wed, 21 Feb 2018 23:00:36 GMT</lastBuildDate><atom:link href="http://lazappi.id.au/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Joining the Dots Twitter analysis]]></title><description><![CDATA[Analysis of tweets at the Joining the Dots visualisation symposium.]]></description><link>http://lazappi.id.au/joining-the-dots-twitter-analysis/</link><guid isPermaLink="false">9cd0884f-1f32-44b4-999c-6a94d1f34959</guid><category><![CDATA[conference]]></category><category><![CDATA[visualisation]]></category><category><![CDATA[twitter]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Fri, 18 Aug 2017 08:47:15 GMT</pubDate><content:encoded><![CDATA[<p>Today I attended the <a href="https://joiningthedots.github.io/">Joining the Dots</a> visualisation symposium. You can see the slides for my talk about clustering trees <a href="https://speakerdeck.com/lazappi/building-a-clustering-tree">here</a>. It was a great event and I hope we see more meetings like this in the future. Here is an analysis of the Twitter activity on the <a href="https://twitter.com/search?q=%23jtdwehi&amp;src=typd">#jtdwehi</a> hashtag, thanks to code from <a href="https://nsaunders.wordpress.com">Neil Saunders</a>. You can see it on <a href="https://github.com/lazappi/jtdwehi-twitter">Github</a>.</p>

<h1 id="introduction">Introduction</h1>

<p>An analysis of tweets from the Joining the Dots symposium. 1237 tweets were collected using the <code>rtweet</code> R package:</p>

<pre><code class="language-r">jtdwehi &lt;- search_tweets("#jtdwehi", 10000)  
saveRDS(jtdwehi, "data/jtdwehi.Rds")  
</code></pre>

<h2 id="searchallthehashtags">Search all the hashtags!</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/hashtags-1.png" alt=""></p>
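<p>The counts behind a plot like this can be computed from the tweets directly. Here is a minimal sketch (not part of the original analysis), assuming the <code>jtdwehi</code> data frame returned by <code>rtweet</code> has a <code>hashtags</code> list-column:</p>

<pre><code class="language-r"># Flatten the list of hashtags in each tweet and count them
hashtags &lt;- unlist(jtdwehi$hashtags)
hashtag.counts &lt;- sort(table(tolower(hashtags)), decreasing = TRUE)
head(hashtag.counts)
</code></pre>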

<h1 id="timeline">Timeline</h1>

<h2 id="tweetsbyday">Tweets by day</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/tweets-by-day-1.png" alt=""></p>

<h2 id="tweetsbydayandtime">Tweets by day and time</h2>

<p>Filtered to the dates around the symposium, Melbourne time. <br>
<img src="http://lazappi.id.au/content/images/2017/08/tweets-by-day-hour-1.png" alt=""></p>

<h1 id="users">Users</h1>

<h2 id="toptweeters">Top tweeters</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/tweets-top-users-1.png" alt=""></p>

<h2 id="sources">Sources</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/tweets-top-sources-1.png" alt=""></p>

<h1 id="networks">Networks</h1>

<h2 id="replies">Replies</h2>

<p>The "replies network", composed of users who reply directly to one another, coloured by PageRank.</p>

<p>It is better to view the original PNG file in the <code>data</code> directory.</p>

<p><img src="http://lazappi.id.au/content/images/2017/08/jtdwehi_replies.png" alt=""></p>
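<p>As a rough sketch of how such a network can be constructed with <code>igraph</code> (assuming a hypothetical data frame of replier/replied-to screen name pairs extracted from the tweets):</p>

<pre><code class="language-r">library(igraph)

# Hypothetical edge list: one row per reply
replies &lt;- data.frame(from = c("alice", "bob", "carol"),
                      to   = c("bob", "alice", "alice"))

g &lt;- graph_from_data_frame(replies, directed = TRUE)

# Colour nodes by PageRank, binned into a viridis palette
pr &lt;- page_rank(g)$vector
V(g)$color &lt;- viridis::viridis(5)[cut(pr, 5, labels = FALSE)]
</code></pre>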

<h2 id="mentions">Mentions</h2>

<p>The "mentions network", where users mention other users in their tweets, filtered for k-core >= 4 and coloured by modularity class.</p>

<p>It is better to view the original PNG file in the <code>data</code> directory.</p>

<p><img src="http://lazappi.id.au/content/images/2017/08/jtdwehi_mentions-1.png" alt=""></p>

<h1 id="retweets">Retweets</h1>

<h2 id="retweetproportion">Retweet proportion</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/is-retweet-1.png" alt=""></p>

<h2 id="retweetcount">Retweet count</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/retweet-count-1.png" alt=""></p>

<h2 id="topretweets">Top retweets</h2>

<table>  
 <thead>
  <tr>
   <th style="text-align:left;"> screen_name </th>
   <th style="text-align:left;"> text </th>
   <th style="text-align:right;"> retweet_count </th>
  </tr>
 </thead>
<tbody>  
  <tr>
   <td style="text-align:left;"> _lazappi_ </td>
   <td style="text-align:left;"> Slides from my #jtdwehi talk today about building a clustering tree <a href="https://t.co/lwTztVstOC">https://t.co/lwTztVstOC</a> </td>
   <td style="text-align:right;"> 12 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> _lazappi_ </td>
   <td style="text-align:left;"> .@bestqualitycrab Visualising creative research (more creatively) #jtdwehi #sketchnotes <a href="https://t.co/DXhk1u22nf">https://t.co/DXhk1u22nf</a> </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> .@claresloggett's tips on where to start with data viz in Python #jtdwehi <a href="https://t.co/jN626uOAqd">https://t.co/jN626uOAqd</a> </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> Visualising grant recipients: Davids most funded but Richards get more money #jtdwehi <a href="https://t.co/iPImbK4paf">https://t.co/iPImbK4paf</a> </td>
   <td style="text-align:right;"> 9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mikejonesmelb </td>
   <td style="text-align:left;"> Really valuable point from @KathyReid: sometimes #dataviz decisions affected by need to consider political priorities and buy-in #jtdwehi </td>
   <td style="text-align:right;"> 9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gravitron </td>
   <td style="text-align:left;"> @bestqualitycrab demoing dataviz: ask the tricky Q's not the obvious. Consider the felt not just the instrumental.… <a href="https://t.co/ca1zCn4oSO">https://t.co/ca1zCn4oSO</a> </td>
   <td style="text-align:right;"> 8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mikejonesmelb </td>
   <td style="text-align:left;"> More on the Transport Network Strategic Investment Tool (TraNSIT) here <a href="https://t.co/z5v827bfjd">https://t.co/z5v827bfjd</a> @Xavier_Ho #jtdwehi </td>
   <td style="text-align:right;"> 8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mikejonesmelb </td>
   <td style="text-align:left;"> To visualise data is to encode it; how can we decode it? So Isabelle created Tracey McTraceface <a href="https://t.co/4YoxS4T6OS">https://t.co/4YoxS4T6OS</a> #jtdwehi </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> oldmateo </td>
   <td style="text-align:left;"> :: &quot;Research publishing methods stuck in the Stone Age&quot; ::  Brendan Ansell on balancing completeness and salience i… <a href="https://t.co/7WVV2Ni31U">https://t.co/7WVV2Ni31U</a> </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gravitron </td>
   <td style="text-align:left;"> @bestqualitycrab leading a chorus of Slipping Away. Just your run of the mill dataviz conference. #JoiningTheDots… <a href="https://t.co/6oxUMXZfpm">https://t.co/6oxUMXZfpm</a> </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
</tbody>  
</table>

<h1 id="favourites">Favourites</h1>

<h2 id="favouriteproportion">Favourite proportion</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/has-favorite-1.png" alt=""></p>

<h2 id="favouritecount">Favourite count</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/favorite-count-1.png" alt=""></p>

<h2 id="topfavourites">Top favourites</h2>

<table>  
 <thead>
  <tr>
   <th style="text-align:left;"> screen_name </th>
   <th style="text-align:left;"> text </th>
   <th style="text-align:right;"> favorite_count </th>
  </tr>
 </thead>
<tbody>  
  <tr>
   <td style="text-align:left;"> _lazappi_ </td>
   <td style="text-align:left;"> Slides from my #jtdwehi talk today about building a clustering tree <a href="https://t.co/lwTztVstOC">https://t.co/lwTztVstOC</a> </td>
   <td style="text-align:right;"> 19 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Xavier_Ho </td>
   <td style="text-align:left;"> People are flowing back in #jtdwehi <a href="https://t.co/t4aU8WXoX9">https://t.co/t4aU8WXoX9</a> </td>
   <td style="text-align:right;"> 16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> WEHI_research </td>
   <td style="text-align:left;"> Welcome to delegates attending today's symposium Joining the Dots: The Art and Science of Data Visualisation! #jtdwehi #dataviz </td>
   <td style="text-align:right;"> 16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> Visualising grant recipients: Davids most funded but Richards get more money #jtdwehi <a href="https://t.co/iPImbK4paf">https://t.co/iPImbK4paf</a> </td>
   <td style="text-align:right;"> 12 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> _lazappi_ </td>
   <td style="text-align:left;"> .@bestqualitycrab Visualising creative research (more creatively) #jtdwehi #sketchnotes <a href="https://t.co/DXhk1u22nf">https://t.co/DXhk1u22nf</a> </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> robbie_bonelli </td>
   <td style="text-align:left;"> So inspired by the talk given by @bestqualitycrab on the problem of #genderequality and how #dataviz can help us! Thanks Deb! #jtdwehi </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> KathyReid </td>
   <td style="text-align:left;"> The incredible @bestqualitycrab keynoting #jtdwehi <a href="https://t.co/mLgKdVt4IX">https://t.co/mLgKdVt4IX</a> </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> .@claresloggett's tips on where to start with data viz in Python #jtdwehi <a href="https://t.co/jN626uOAqd">https://t.co/jN626uOAqd</a> </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> peterneish </td>
   <td style="text-align:left;"> Building a clustering tree <a href="https://t.co/KDgdRfBejZ">https://t.co/KDgdRfBejZ</a> #jtdwehi </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> Representing Greek films via olive trees (which are are actually Markov chains) #jtdwehi <a href="https://t.co/SB2CG4oH8D">https://t.co/SB2CG4oH8D</a> </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
</tbody>  
</table>

<h1 id="quotes">Quotes</h1>

<h2 id="quoteproportion">Quote proportion</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/is-quote-1.png" alt=""></p>

<h2 id="quotecount">Quote count</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/quotes-count-1.png" alt=""></p>

<h2 id="topquotes">Top quotes</h2>

<table>  
 <thead>
  <tr>
   <th style="text-align:left;"> screen_name </th>
   <th style="text-align:left;"> text </th>
   <th style="text-align:right;"> quote_count </th>
  </tr>
 </thead>
<tbody>  
  <tr>
   <td style="text-align:left;"> peterneish </td>
   <td style="text-align:left;"> Would love to see some taxonomic data plotted like this. #jtdwehi <a href="https://t.co/EbBL872fum">https://t.co/EbBL872fum</a> </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Xavier_Ho </td>
   <td style="text-align:left;"> overlaying clusters: the datavis movie #jtdwehi <a href="https://t.co/KA5ovvvW6r">https://t.co/KA5ovvvW6r</a> </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> frostickle </td>
   <td style="text-align:left;"> Where can people go from here, to take advantage of things they've learnt at #jtdwehi? @ResPlat? @OKFNau?

#dataviz <a href="https://t.co/TM6ngns9RS">https://t.co/TM6ngns9RS</a> </td>
   <td style="text-align:right;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> rowlandm </td>
   <td style="text-align:left;"> The money shot from @_lazappi_ ! #jtdwehi <a href="https://t.co/nqynLrC7Vg">https://t.co/nqynLrC7Vg</a> </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Xavier_Ho </td>
   <td style="text-align:left;"> Slide here: <a href="https://t.co/o2E59HHoZE">https://t.co/o2E59HHoZE</a> #jtdwehi <a href="https://t.co/L98WV1tXgu">https://t.co/L98WV1tXgu</a> </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> karinv </td>
   <td style="text-align:left;"> Thanks to @FCTweedie and @rubin_af for a great day of #dataviz! #jtdwehi <a href="https://t.co/Hti5FQtMGz">https://t.co/Hti5FQtMGz</a> </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> rowlandm </td>
   <td style="text-align:left;"> LImited funding ... sounds like research! #jtdwehi <a href="https://t.co/gZwllFhtRe">https://t.co/gZwllFhtRe</a> </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> peterneish </td>
   <td style="text-align:left;"> Fascinating insights into the life sciences #jtdwehi <a href="https://t.co/LpRwfP00ns">https://t.co/LpRwfP00ns</a> </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> karinv </td>
   <td style="text-align:left;"> Adding the correct hashtag! (sorry folks) #jtdwehi <a href="https://t.co/PoGZe8k1k8">https://t.co/PoGZe8k1k8</a> </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> robbie_bonelli </td>
   <td style="text-align:left;"> Depressing and motivating! #jtdwehi <a href="https://t.co/YCGB1ibYkw">https://t.co/YCGB1ibYkw</a> </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
</tbody>  
</table>

<h1 id="media">Media</h1>

<h2 id="mediacount">Media count</h2>

<p><img src="http://lazappi.id.au/content/images/2017/08/has-media-1.png" alt=""></p>

<h2 id="topmedia">Top media</h2>

<table>  
 <thead>
  <tr>
   <th style="text-align:left;"> screen_name </th>
   <th style="text-align:left;"> text </th>
   <th style="text-align:right;"> favorite_count </th>
  </tr>
 </thead>
<tbody>  
  <tr>
   <td style="text-align:left;"> Xavier_Ho </td>
   <td style="text-align:left;"> People are flowing back in #jtdwehi <a href="https://t.co/t4aU8WXoX9">https://t.co/t4aU8WXoX9</a> </td>
   <td style="text-align:right;"> 16 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> Visualising grant recipients: Davids most funded but Richards get more money #jtdwehi <a href="https://t.co/iPImbK4paf">https://t.co/iPImbK4paf</a> </td>
   <td style="text-align:right;"> 12 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> _lazappi_ </td>
   <td style="text-align:left;"> .@bestqualitycrab Visualising creative research (more creatively) #jtdwehi #sketchnotes <a href="https://t.co/DXhk1u22nf">https://t.co/DXhk1u22nf</a> </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> KathyReid </td>
   <td style="text-align:left;"> The incredible @bestqualitycrab keynoting #jtdwehi <a href="https://t.co/mLgKdVt4IX">https://t.co/mLgKdVt4IX</a> </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> .@claresloggett's tips on where to start with data viz in Python #jtdwehi <a href="https://t.co/jN626uOAqd">https://t.co/jN626uOAqd</a> </td>
   <td style="text-align:right;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> Representing Greek films via olive trees (which are are actually Markov chains) #jtdwehi <a href="https://t.co/SB2CG4oH8D">https://t.co/SB2CG4oH8D</a> </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> frostickle </td>
   <td style="text-align:left;"> Now @Xavier_Ho from the @CSIROnews is talking about Visualising the Australian Transport Network

#jtdwehi #dataviz <a href="https://t.co/DcvXYmD45F">https://t.co/DcvXYmD45F</a> </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> Getting underway for #jtdwehi with acknowledgement of country from @WEHI_research's director <a href="https://t.co/oNcnu5wtd9">https://t.co/oNcnu5wtd9</a> </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> FCTweedie </td>
   <td style="text-align:left;"> Patriarchy looks like this! What happens when we can describe the shape of injustice #jtdwehi <a href="https://t.co/8A7EhnFmt5">https://t.co/8A7EhnFmt5</a> </td>
   <td style="text-align:right;"> 9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gravitron </td>
   <td style="text-align:left;"> Best URL of the day goes to @Isa_Kiko's <a href="https://t.co/kapY0Aeacy">https://t.co/kapY0Aeacy</a>  A great looking tool! #JoiningTheDots #jtdwehi <a href="https://t.co/gal2v1PUJY">https://t.co/gal2v1PUJY</a> </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
</tbody>  
</table>

<h3 id="mostlikedmediaimage">Most liked media image</h3>

<p><img src="http://lazappi.id.au/content/images/2017/08/most_liked_media.jpg" alt=""></p>

<h1 id="tweettext">Tweet text</h1>

<p>The top 100 words used three or more times.</p>
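<p>A minimal sketch of how word frequencies like these could be computed (hypothetical, assuming the <code>text</code> column of <code>jtdwehi</code> holds the tweet text):</p>

<pre><code class="language-r">words &lt;- tolower(unlist(strsplit(jtdwehi$text, "\\s+")))
# Drop hashtags, mentions and links before counting
words &lt;- words[!grepl("^#|^@|^http", words)]
word.counts &lt;- sort(table(words), decreasing = TRUE)
word.counts[word.counts &gt;= 3]
</code></pre>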

<p><img src="http://lazappi.id.au/content/images/2017/08/count-words-1.png" alt=""></p>]]></content:encoded></item><item><title><![CDATA[Building a clustering tree]]></title><description><![CDATA[A "clustering tree" is a visualisation for showing relationships between clusterings. Here is an example of how to make one for scRNA-seq data.]]></description><link>http://lazappi.id.au/building-a-clustering-tree/</link><guid isPermaLink="false">c7f2d82f-d815-40ca-86f2-6a347f742ac3</guid><category><![CDATA[R]]></category><category><![CDATA[scRNA-seq]]></category><category><![CDATA[clustering]]></category><category><![CDATA[tree]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Wed, 19 Jul 2017 05:15:29 GMT</pubDate><media:content url="http://lazappi.id.au/content/images/2017/07/plot-tree-1-1.png" medium="image"/><content:encoded><![CDATA[<img src="http://lazappi.id.au/content/images/2017/07/plot-tree-1-1.png" alt="Building a clustering tree"><p>For my PhD I am working on methods for analysing single-cell RNA-sequencing (scRNA-seq) data which measure the expression of genes in individual cells. One of the most common analyses done on this type of data is to cluster the cells, often in an attempt to find out what cell types are present in a sample.</p>

<p>In a recent seminar I showed some images of what I am calling a "clustering tree" (you can see the slides <a href="https://speakerdeck.com/lazappi/wehi-bioinformatics-seminar">here</a> if you are interested). This is a visualisation I came up with to show the relationship between clusterings as the number of clusters is increased. A few people asked how I had made it so here is a short example.</p>

<h2 id="setup">Setup  </h2>

<p>First we need to load the libraries we are going to use:</p>

<pre><code># Simulation
library(splatter)

# Clustering
library(Seurat) # Installed from https://github.com/satijalab/seurat

# Graphs
library(igraph)

# Plotting
library(ggraph)
library(viridis)

# Data manipulation
library(tidyverse)
</code></pre>

<p>For this example I am going to simulate some scRNA-seq data with eight different groups with different numbers of cells using <a href="https://bioconductor.org/packages/splatter"><code>Splatter</code></a>.</p>

<pre><code>sim &lt;- splatSimulateGroups(groupCells = c(100, 80, 60, 50, 30, 20, 20, 15),
                           seed = 10, verbose = FALSE)
</code></pre>

<p>Let's take a quick look at this to see if it is anything like we would expect:</p>

<pre><code>plotTSNE(sim, colour_by = "Group")
</code></pre>

<p><img src="http://lazappi.id.au/content/images/2017/07/sim-tSNE-1.png" alt="Building a clustering tree"></p>

<p>Here we can see the different groups, so there should be something for our clustering analysis to find.</p>

<p>The clustering package we are going to use is <a href="http://satijalab.org/seurat/"><code>Seurat</code></a>, which uses its own object to store the data. Here is a small function I have written to convert from the <code>SCESet</code> object produced by <code>Splatter</code> to the <code>seurat</code> object required by <code>Seurat</code>.</p>

<pre><code>SCESetToSeurat &lt;- function(sce) {
    if (!is(sce,'SCESet')) {
        stop("sce must be an SCESet object")
    }

    counts &lt;- scater::counts(sce)

    if (is.null(counts)) {
        stop("sce must contain counts to convert to Seurat")
    }

    seurat &lt;- new("seurat",
                  raw.data = counts,
                  is.expr = sce@lowerDetectionLimit,
                  data.info = Biobase::pData(sce),
                  cell.names = Biobase::sampleNames(sce))

    return(seurat)
}
</code></pre>

<p>Now to convert the dataset.</p>

<pre><code>seurat &lt;- SCESetToSeurat(sim)
</code></pre>

<h2 id="clustering">Clustering  </h2>

<p>We now have a dataset in the format required by <code>Seurat</code>. Before we do any clustering we need to run through some setup steps. I'm not going to explain what they are doing here; if you want to know the details, refer to the <code>Seurat</code> <a href="http://satijalab.org/seurat/pbmc-tutorial.html">tutorials</a>.</p>

<pre><code>seurat &lt;- Setup(seurat, project = "Example", meta.data = seurat@data.info)
seurat &lt;- MeanVarPlot(seurat, fxn.x = expMean, fxn.y = logVarDivMean,
                      x.low.cutoff = 0.1, x.high.cutoff = 3,
                      y.cutoff = 0.5, do.contour = FALSE)

seurat &lt;- PCA(seurat, pc.genes = seurat@var.genes, do.print = FALSE)
</code></pre>

<p>Now we can do the clustering. The parameter we are interested in is the <code>resolution</code> parameter which controls how many clusters <code>Seurat</code> returns. I start by setting <code>resolution = 0</code>. This will create a cluster containing all cells that will serve as the root of our tree. We also ask <code>Seurat</code> to store some of the intermediate calculations so we don't have to do them again when we cluster with different resolutions:</p>

<pre><code>seurat &lt;- FindClusters(seurat, pc.use = 1:20, resolution = 0, algorithm = 3,
                       print.output = FALSE, save.SNN = TRUE)
</code></pre>

<p>We can now loop over a range of resolutions that we are interested in. I have only tried a few values here but if this was a real dataset you might want to try some more.</p>

<pre><code>for (res in c(0.3, 0.6, 0.9, 1.2)) {
   seurat &lt;- FindClusters(seurat, resolution = res, algorithm = 3,
                          print.output = FALSE) 
}
</code></pre>

<h2 id="getresults">Get results  </h2>

<p><code>Seurat</code> stores the cluster labels in the <code>data.info</code> slot in columns starting with <code>res.</code>. This is the only part we are interested in so let's extract just those columns.</p>

<pre><code>clusterings &lt;- seurat@data.info %&gt;% select(contains("res."))

head(clusterings)

##       res.0 res.0.3 res.0.6 res.0.9 res.1.2
## Cell1     0       1       0       0       0
## Cell2     0       1       0       0       0
## Cell3     0       1       0       0       0
## Cell4     0       1       0       0       0
## Cell5     0       1       0       0       0
## Cell6     0       1       0       0       0
</code></pre>

<p>We now know which cluster each cell was assigned to at each resolution but to build the tree we need some more information. This next function looks at two neighbouring resolutions and works out how many cells moved from a cluster in the lower resolution to each cluster in the higher resolution. These transitions are going to form the edges of our tree.</p>

<pre><code>getEdges &lt;- function(clusterings) {

    # Loop over the different resolutions
    transitions &lt;- lapply(1:(ncol(clusterings) - 1), function(i) {

        # Extract two neighbouring clusterings
        from.res &lt;- sort(colnames(clusterings))[i]
        to.res &lt;- sort(colnames(clusterings))[i + 1]

        # Get the cluster names
        from.clusters &lt;- sort(unique(clusterings[, from.res]))
        to.clusters &lt;- sort(unique(clusterings[, to.res]))

        # Get all possible combinations
        trans.df &lt;- expand.grid(FromClust = from.clusters,
                                ToClust = to.clusters)

        # Loop over the possible transitions
        trans &lt;- apply(trans.df, 1, function(x) {
            from.clust &lt;- x[1]
            to.clust &lt;- x[2]

            # Find the cells from those clusters
            is.from &lt;- clusterings[, from.res] == from.clust
            is.to &lt;- clusterings[, to.res] == to.clust

            # Count them up
            trans.count &lt;- sum(is.from &amp; is.to)

            # Get the sizes of the two clusters
            from.size &lt;- sum(is.from)
            to.size &lt;- sum(is.to)

            # Get the proportions of cells moving along this edge
            trans.prop.from &lt;- trans.count / from.size
            trans.prop.to &lt;- trans.count / to.size

            return(c(trans.count, trans.prop.from, trans.prop.to))
        })

        # Tidy up the results
        trans.df$FromRes &lt;- as.numeric(gsub("res.", "", from.res))
        trans.df$ToRes &lt;- as.numeric(gsub("res.", "", to.res))
        trans.df$TransCount &lt;- trans[1, ]
        trans.df$TransPropFrom &lt;- trans[2, ]
        trans.df$TransPropTo &lt;- trans[3, ]

        return(trans.df)
    })

    # Bind the results from the different resolutions together
    transitions &lt;- do.call("rbind", transitions)

    # Tidy everything up
    levs &lt;- sort(as.numeric(levels(transitions$ToClust)))
    transitions &lt;- transitions %&gt;%
        mutate(FromClust = factor(FromClust,
                                  levels = levs))  %&gt;%
        mutate(ToClust = factor(ToClust, levels = levs))

    return(transitions)
}

edges &lt;- getEdges(clusterings)
head(edges)

##   FromClust ToClust FromRes ToRes TransCount TransPropFrom TransPropTo
## 1         0       0     0.0   0.3        135     0.3600000           1
## 2         0       1     0.0   0.3        100     0.2666667           1
## 3         0       2     0.0   0.3         60     0.1600000           1
## 4         0       3     0.0   0.3         50     0.1333333           1
## 5         0       4     0.0   0.3         30     0.0800000           1
## 6         0       0     0.3   0.6          0     0.0000000           0
</code></pre>

<p>Some of these columns are pretty obvious but the last three could do with an explanation. <code>TransCount</code> is the number of cells that move along this edge. <code>TransPropFrom</code> is the proportion of the cells in the lower resolution cluster that have made this transition and <code>TransPropTo</code> is the proportion of cells in the higher resolution cluster that came from this edge.</p>

<p>Getting the information about the nodes of the tree is easier as these just represent the clusters. This function summarises the cluster information and converts it to long format.</p>

<pre><code>getNodes &lt;- function(clusterings) {
    nodes &lt;- clusterings %&gt;%
        gather(key = Res, value = Cluster) %&gt;%
        group_by(Res, Cluster) %&gt;%
        summarise(Size = n()) %&gt;%
        ungroup() %&gt;%
        mutate(Res = stringr::str_replace(Res, "res.", "")) %&gt;%
        mutate(Res = as.numeric(Res), Cluster = as.numeric(Cluster)) %&gt;%
        mutate(Node = paste0("R", Res, "C", Cluster)) %&gt;%
        select(Node, everything())
}

nodes &lt;- getNodes(clusterings)
head(nodes)

## # A tibble: 6 x 4
##     Node   Res Cluster  Size
##    &lt;chr&gt; &lt;dbl&gt;   &lt;dbl&gt; &lt;int&gt;
## 1   R0C0   0.0       0   375
## 2 R0.3C0   0.3       0   135
## 3 R0.3C1   0.3       1   100
## 4 R0.3C2   0.3       2    60
## 5 R0.3C3   0.3       3    50
## 6 R0.3C4   0.3       4    30
</code></pre>

<p>Each node needs a unique ID which I have made by combining the resolution and cluster number. We also record the number of cells in each cluster.</p>

<p>Now we can build the graph we will use as the starting point for our plot. Some of the possible edges between clusters will have no cells travelling along them so we filter them out. We also remove edges that correspond to a small proportion (&lt; 2%) of cells in the higher resolution cluster.</p>

<pre><code>graph &lt;- edges %&gt;%
    # Remove edges without any cell...
    filter(TransCount &gt; 0) %&gt;%
    # ...or making up only a small proportion of the new cluster
    filter(TransPropTo &gt; 0.02) %&gt;%
    # Rename the nodes
    mutate(FromNode = paste0("R", FromRes, "C", FromClust)) %&gt;%
    mutate(ToNode = paste0("R", ToRes, "C", ToClust)) %&gt;%
    # Reorder columns
    select(FromNode, ToNode, everything()) %&gt;%
    # Build a graph using igraph
    graph_from_data_frame(vertices = nodes)

print(graph)

## IGRAPH b1b93c3 DN-- 23 23 -- 
## + attr: name (v/c), Res (v/n), Cluster (v/n), Size (v/n),
## | FromClust (e/c), ToClust (e/c), FromRes (e/n), ToRes (e/n),
## | TransCount (e/n), TransPropFrom (e/n), TransPropTo (e/n)
## + edges from b1b93c3 (vertex names):
##  [1] R0C0  -&gt;R0.3C0 R0C0  -&gt;R0.3C1 R0C0  -&gt;R0.3C2 R0C0  -&gt;R0.3C3
##  [5] R0C0  -&gt;R0.3C4 R0.3C1-&gt;R0.6C0 R0.3C0-&gt;R0.6C1 R0.3C4-&gt;R0.6C1
##  [9] R0.3C0-&gt;R0.6C2 R0.3C2-&gt;R0.6C3 R0.3C3-&gt;R0.6C4 R0.6C0-&gt;R0.9C0
## [13] R0.6C2-&gt;R0.9C1 R0.6C3-&gt;R0.9C2 R0.6C1-&gt;R0.9C3 R0.6C4-&gt;R0.9C4
## [17] R0.6C1-&gt;R0.9C5 R0.9C0-&gt;R1.2C0 R0.9C1-&gt;R1.2C1 R0.9C2-&gt;R1.2C2
## [21] R0.9C3-&gt;R1.2C3 R0.9C4-&gt;R1.2C4 R0.9C5-&gt;R1.2C5
</code></pre>

<h2 id="plotthetree">Plot the tree  </h2>

<p>The last step is to pass our graph to the <a href="https://github.com/thomasp85/ggraph"><code>ggraph</code></a> library for plotting.</p>

<pre><code># Plot our graph using the `tree` layout
ggraph(graph, layout = "tree") +
    # Plot the edges, colour is the number of cells and transparency is the
    # proportion contribution to the new cluster
    geom_edge_link(arrow = arrow(length = unit(1, 'mm')),
                   end_cap = circle(3.5, "mm"), edge_width = 1,
                   aes(colour = log(TransCount), alpha = TransPropTo)) +
    # Plot the nodes, size is the number of cells
    geom_node_point(aes(colour = factor(Res),
                        size = Size)) +
    geom_node_text(aes(label = Cluster), size = 3) +
    # Adjust the scales
    scale_size(range = c(4, 15)) +
    scale_edge_colour_gradientn(colours = viridis(100)) +
    # Add legend labels
    guides(size = guide_legend(title = "Cluster Size", title.position = "top"),
           colour = guide_legend(title = "Clustering Resolution",
                                 title.position = "top"),
           edge_colour = guide_edge_colorbar(title = "Cell Count (log)",
                                             title.position = "top"),
           edge_alpha = guide_legend(title = "Cluster Prop",
                                     title.position = "top", nrow = 2)) +
    # Remove the axes as they don't really mean anything
    theme_void() +
    theme(legend.position = "bottom")
</code></pre>

<p><img src="http://lazappi.id.au/content/images/2017/07/plot-tree-1.png" alt="Building a clustering tree"></p>

<p>And here is the result! We can see that <code>Seurat</code> finds three of the clusters easily and that these don't change as the resolution increases. A fourth group contains most of the cells and is sub-divided as we increase the resolution. Interestingly, at the lowest resolution there is a small cluster which is then absorbed into one of the other branches.</p>

<p>This tree is cleaner and has fewer branches than we would be likely to see with a real dataset, but the process to create it would be the same. I have used <code>Seurat</code> as the clustering method in this example but it should be easy to adapt the process to any other method that allows you to adjust the number of clusters. I have found this visualisation useful in my analysis, particularly for looking at which clusters are very distinct and the relationships between different clusters and clusterings.</p>

<p>Good luck creating your own clustering trees!</p>

<h3 id="sessioninformation">Session information</h3>

<pre><code>devtools::session_info()

## Session info -------------------------------------------------------------

##  setting  value                       
##  version  R version 3.4.1 (2017-06-30)
##  system   x86_64, darwin15.6.0        
##  ui       RStudio (1.0.143)           
##  language (EN)                        
##  collate  en_AU.UTF-8                 
##  tz       Australia/Melbourne         
##  date     2017-07-19

## Packages -----------------------------------------------------------------

##  package        * version  date       source                           
##  AnnotationDbi    1.38.1   2017-06-01 Bioconductor                     
##  ape              4.1      2017-02-14 cran (@4.1)                      
##  assertthat       0.2.0    2017-04-11 CRAN (R 3.4.0)                   
##  backports        1.1.0    2017-05-22 CRAN (R 3.4.0)                   
##  base           * 3.4.1    2017-07-07 local                            
##  beeswarm         0.2.3    2016-04-25 CRAN (R 3.4.0)                   
##  bindr            0.1      2016-11-13 CRAN (R 3.4.0)                   
##  bindrcpp       * 0.2      2017-06-17 CRAN (R 3.4.0)                   
##  Biobase        * 2.36.2   2017-05-04 Bioconductor                     
##  BiocGenerics   * 0.22.0   2017-04-25 Bioconductor                     
##  BiocParallel     1.10.1   2017-05-03 Bioconductor                     
##  biomaRt          2.32.1   2017-06-09 Bioconductor                     
##  bit              1.1-12   2014-04-09 CRAN (R 3.4.0)                   
##  bit64            0.9-7    2017-05-08 CRAN (R 3.4.0)                   
##  bitops           1.0-6    2013-08-17 CRAN (R 3.4.0)                   
##  blob             1.1.0    2017-06-17 CRAN (R 3.4.0)                   
##  broom            0.4.2    2017-02-13 CRAN (R 3.4.0)                   
##  car              2.1-5    2017-07-04 cran (@2.1-5)                    
##  caret            6.0-76   2017-04-18 cran (@6.0-76)                   
##  caTools          1.17.1   2014-09-10 cran (@1.17.1)                   
##  cellranger       1.1.0    2016-07-27 CRAN (R 3.4.0)                   
##  checkmate        1.8.3    2017-07-03 CRAN (R 3.4.1)                   
##  class            7.3-14   2015-08-30 CRAN (R 3.4.1)                   
##  cluster          2.0.6    2017-03-10 CRAN (R 3.4.1)                   
##  codetools        0.2-15   2016-10-05 CRAN (R 3.4.1)                   
##  colorspace       1.3-2    2016-12-14 CRAN (R 3.4.0)                   
##  compiler         3.4.1    2017-07-07 local                            
##  cowplot        * 0.7.0    2016-10-28 cran (@0.7.0)                    
##  data.table       1.10.4   2017-02-01 CRAN (R 3.4.0)                   
##  datasets       * 3.4.1    2017-07-07 local                            
##  DBI              0.7      2017-06-18 CRAN (R 3.4.0)                   
##  DEoptimR         1.0-8    2016-11-19 cran (@1.0-8)                    
##  devtools         1.13.2   2017-06-02 CRAN (R 3.4.0)                   
##  digest           0.6.12   2017-01-27 CRAN (R 3.4.0)                   
##  diptest          0.75-7   2016-12-05 cran (@0.75-7)                   
##  dplyr          * 0.7.1    2017-06-22 CRAN (R 3.4.1)                   
##  edgeR            3.18.1   2017-05-06 Bioconductor                     
##  evaluate         0.10.1   2017-06-24 CRAN (R 3.4.1)                   
##  fastICA          1.2-1    2017-06-12 cran (@1.2-1)                    
##  flexmix          2.3-14   2017-04-28 cran (@2.3-14)                   
##  FNN              1.1      2013-07-31 cran (@1.1)                      
##  forcats          0.2.0    2017-01-23 CRAN (R 3.4.0)                   
##  foreach          1.4.3    2015-10-13 cran (@1.4.3)                    
##  foreign          0.8-69   2017-06-22 CRAN (R 3.4.1)                   
##  fpc              2.1-10   2015-08-14 cran (@2.1-10)                   
##  gdata            2.18.0   2017-06-06 cran (@2.18.0)                   
##  ggbeeswarm       0.5.3    2016-12-01 CRAN (R 3.4.0)                   
##  ggforce          0.1.1    2016-11-28 CRAN (R 3.4.0)                   
##  ggplot2        * 2.2.1    2016-12-30 CRAN (R 3.4.0)                   
##  ggraph         * 1.0.0    2017-02-24 CRAN (R 3.4.0)                   
##  ggrepel          0.6.5    2016-11-24 CRAN (R 3.4.0)                   
##  glue             1.1.1    2017-06-21 CRAN (R 3.4.1)                   
##  gplots           3.0.1    2016-03-30 cran (@3.0.1)                    
##  graphics       * 3.4.1    2017-07-07 local                            
##  grDevices      * 3.4.1    2017-07-07 local                            
##  grid             3.4.1    2017-07-07 local                            
##  gridExtra        2.2.1    2016-02-29 CRAN (R 3.4.0)                   
##  gtable           0.2.0    2016-02-26 CRAN (R 3.4.0)                   
##  gtools           3.5.0    2015-05-29 cran (@3.5.0)                    
##  haven            1.1.0    2017-07-09 CRAN (R 3.4.1)                   
##  hms              0.3      2016-11-22 CRAN (R 3.4.0)                   
##  htmltools        0.3.6    2017-04-28 CRAN (R 3.4.0)                   
##  httpuv           1.3.5    2017-07-04 CRAN (R 3.4.1)                   
##  httr             1.2.1    2016-07-03 CRAN (R 3.4.0)                   
##  igraph         * 1.1.1    2017-07-16 CRAN (R 3.4.1)                   
##  IRanges          2.10.2   2017-05-25 Bioconductor                     
##  irlba            2.2.1    2017-05-17 cran (@2.2.1)                    
##  iterators        1.0.8    2015-10-13 cran (@1.0.8)                    
##  jsonlite         1.5      2017-06-01 CRAN (R 3.4.0)                   
##  kernlab          0.9-25   2016-10-03 cran (@0.9-25)                   
##  KernSmooth       2.23-15  2015-06-29 CRAN (R 3.4.1)                   
##  knitr            1.16     2017-05-18 CRAN (R 3.4.1)                   
##  labeling         0.3      2014-08-23 CRAN (R 3.4.0)                   
##  lars             1.2      2013-04-24 cran (@1.2)                      
##  lattice          0.20-35  2017-03-25 CRAN (R 3.4.1)                   
##  lazyeval         0.2.0    2016-06-12 CRAN (R 3.4.0)                   
##  limma            3.32.3   2017-07-16 Bioconductor                     
##  lme4             1.1-13   2017-04-19 cran (@1.1-13)                   
##  locfit           1.5-9.1  2013-04-20 CRAN (R 3.4.0)                   
##  lubridate        1.6.0    2016-09-13 CRAN (R 3.4.0)                   
##  magrittr         1.5      2014-11-22 CRAN (R 3.4.0)                   
##  MASS             7.3-47   2017-02-26 CRAN (R 3.4.1)                   
##  Matrix           1.2-10   2017-05-03 CRAN (R 3.4.1)                   
##  MatrixModels     0.4-1    2015-08-22 cran (@0.4-1)                    
##  matrixStats      0.52.2   2017-04-14 CRAN (R 3.4.0)                   
##  mclust           5.3      2017-05-21 cran (@5.3)                      
##  memoise          1.1.0    2017-04-21 CRAN (R 3.4.0)                   
##  methods        * 3.4.1    2017-07-07 local                            
##  mgcv             1.8-17   2017-02-08 CRAN (R 3.4.1)                   
##  mime             0.5      2016-07-07 CRAN (R 3.4.0)                   
##  minqa            1.2.4    2014-10-09 cran (@1.2.4)                    
##  mixtools         1.1.0    2017-03-10 cran (@1.1.0)                    
##  mnormt           1.5-5    2016-10-15 CRAN (R 3.4.0)                   
##  ModelMetrics     1.1.0    2016-08-26 cran (@1.1.0)                    
##  modelr           0.1.0    2016-08-31 CRAN (R 3.4.0)                   
##  modeltools       0.2-21   2013-09-02 cran (@0.2-21)                   
##  munsell          0.4.3    2016-02-13 CRAN (R 3.4.0)                   
##  mvtnorm          1.0-6    2017-03-02 cran (@1.0-6)                    
##  nlme             3.1-131  2017-02-06 CRAN (R 3.4.1)                   
##  nloptr           1.0.4    2014-08-04 cran (@1.0.4)                    
##  nnet             7.3-12   2016-02-02 CRAN (R 3.4.1)                   
##  numDeriv         2016.8-1 2016-08-27 cran (@2016.8-)                  
##  parallel       * 3.4.1    2017-07-07 local                            
##  pbapply          1.3-3    2017-07-04 cran (@1.3-3)                    
##  pbkrtest         0.4-7    2017-03-15 cran (@0.4-7)                    
##  pkgconfig        2.0.1    2017-03-21 CRAN (R 3.4.0)                   
##  plyr             1.8.4    2016-06-08 CRAN (R 3.4.0)                   
##  prabclus         2.2-6    2015-01-14 cran (@2.2-6)                    
##  psych            1.7.5    2017-05-03 CRAN (R 3.4.1)                   
##  purrr          * 0.2.2.2  2017-05-11 CRAN (R 3.4.0)                   
##  quantreg         5.33     2017-04-18 cran (@5.33)                     
##  R6               2.2.2    2017-06-17 CRAN (R 3.4.0)                   
##  ranger           0.8.0    2017-06-20 cran (@0.8.0)                    
##  RColorBrewer     1.1-2    2014-12-07 CRAN (R 3.4.0)                   
##  Rcpp             0.12.12  2017-07-15 CRAN (R 3.4.1)                   
##  RCurl            1.95-4.8 2016-03-01 CRAN (R 3.4.0)                   
##  readr          * 1.1.1    2017-05-16 CRAN (R 3.4.0)                   
##  readxl           1.0.0    2017-04-18 CRAN (R 3.4.0)                   
##  reshape2         1.4.2    2016-10-22 CRAN (R 3.4.0)                   
##  rhdf5            2.20.0   2017-04-25 Bioconductor                     
##  rjson            0.2.15   2014-11-03 CRAN (R 3.4.0)                   
##  rlang            0.1.1    2017-05-18 CRAN (R 3.4.0)                   
##  rmarkdown        1.6      2017-06-15 CRAN (R 3.4.1)                   
##  robustbase       0.92-7   2016-12-09 cran (@0.92-7)                   
##  ROCR             1.0-7    2015-03-26 cran (@1.0-7)                    
##  rprojroot        1.2      2017-01-16 CRAN (R 3.4.0)                   
##  RSQLite          2.0      2017-06-19 CRAN (R 3.4.1)                   
##  Rtsne            0.13     2017-04-14 cran (@0.13)                     
##  rvest            0.3.2    2016-06-17 CRAN (R 3.4.0)                   
##  S4Vectors        0.14.3   2017-06-03 Bioconductor                     
##  scales           0.4.1    2016-11-09 CRAN (R 3.4.0)                   
##  scater         * 1.4.0    2017-04-25 Bioconductor                     
##  segmented        0.5-2.1  2017-06-14 cran (@0.5-2.1)                  
##  Seurat         * 1.4.0.16 2017-07-19 Github (satijalab/seurat@3bd092a)
##  shiny            1.0.3    2017-04-26 CRAN (R 3.4.0)                   
##  shinydashboard   0.6.1    2017-06-14 CRAN (R 3.4.0)                   
##  sn               1.5-0    2017-02-10 cran (@1.5-0)                    
##  SparseM          1.77     2017-04-23 cran (@1.77)                     
##  splatter       * 1.0.3    2017-05-27 Bioconductor                     
##  splines          3.4.1    2017-07-07 local                            
##  stats          * 3.4.1    2017-07-07 local                            
##  stats4           3.4.1    2017-07-07 local                            
##  stringi          1.1.5    2017-04-07 CRAN (R 3.4.0)                   
##  stringr          1.2.0    2017-02-18 CRAN (R 3.4.0)                   
##  survival         2.41-3   2017-04-04 CRAN (R 3.4.1)                   
##  tclust           1.2-7    2017-06-30 cran (@1.2-7)                    
##  tibble         * 1.3.3    2017-05-28 CRAN (R 3.4.0)                   
##  tidyr          * 0.6.3    2017-05-15 CRAN (R 3.4.0)                   
##  tidyverse      * 1.1.1    2017-01-27 CRAN (R 3.4.0)                   
##  tools            3.4.1    2017-07-07 local                            
##  trimcluster      0.1-2    2012-10-29 cran (@0.1-2)                    
##  tsne             0.1-3    2016-07-15 cran (@0.1-3)                    
##  tweenr           0.1.5    2016-10-10 CRAN (R 3.4.0)                   
##  tximport         1.4.0    2017-04-25 Bioconductor                     
##  udunits2         0.13     2016-11-17 CRAN (R 3.4.0)                   
##  units            0.4-5    2017-06-15 CRAN (R 3.4.0)                   
##  utils          * 3.4.1    2017-07-07 local                            
##  VGAM             1.0-3    2017-01-11 cran (@1.0-3)                    
##  vipor            0.4.5    2017-03-22 CRAN (R 3.4.0)                   
##  viridis        * 0.4.0    2017-03-27 CRAN (R 3.4.0)                   
##  viridisLite    * 0.2.0    2017-03-24 CRAN (R 3.4.0)                   
##  withr            1.0.2    2016-06-20 CRAN (R 3.4.0)                   
##  XML              3.98-1.9 2017-06-19 CRAN (R 3.4.1)                   
##  xml2             1.1.1    2017-01-24 CRAN (R 3.4.0)                   
##  xtable           1.8-2    2016-02-05 CRAN (R 3.4.0)                   
##  yaml             2.1.14   2016-11-12 CRAN (R 3.4.0)                   
##  zlibbioc         1.22.0   2017-04-25 Bioconductor
</code></pre>]]></content:encoded></item><item><title><![CDATA[PyCon AU 2016]]></title><description><![CDATA[My experience at PyCon AU 2016 and some thoughts about how it compares to a scientific conference. ]]></description><link>http://lazappi.id.au/pycon-au-2016/</link><guid isPermaLink="false">fdaf460b-b06f-4c27-b624-80614dd92c0a</guid><category><![CDATA[python]]></category><category><![CDATA[conference]]></category><category><![CDATA[thoughts]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Thu, 18 Aug 2016 08:54:43 GMT</pubDate><media:content url="http://lazappi.id.au/content/images/2016/08/logo-mel.png" medium="image"/><content:encoded><![CDATA[<img src="http://lazappi.id.au/content/images/2016/08/logo-mel.png" alt="PyCon AU 2016"><p>Over the weekend I attended <a href="https://2016.pycon-au.org/">PyCon Australia</a>. This was my first time at a purely tech conference and I couldn't help but compare it to my previous experiences at scientific conferences.</p>

<p><strong>DISCLAIMER:</strong> As I said, this was my first tech conference and my scientific conference experience is also fairly limited, so some of the comments I make might be generalisations that don't always apply.</p>

<p>PyCon started with miniconfs on Friday and continued with coding sprints on Monday and Tuesday. I didn't attend any of these, so my experience was only of the main conference on Saturday and Sunday. Here are some of the highlights for me in terms of presentations:</p>

<ul>
<li>Andrew Lonsdale - <a href="https://www.youtube.com/watch?v=PCZS9wqBUuE">Python for science, side projects and stuff!</a>
<img src="https://nzaitq-bn1306.files.1drv.com/y3ma0f7WeowCuWRLJ4V-PUiXhN7bwwuIVQ8q9KC7HAXwEVXl3R89Q234OyAi2vi3Zq80zqlcqzL664IBImd7cqK-q-utjYvpAejKL_qI59a-gHM9ICupdtW4ARAxAi4UMuaPK3ufitkgugUlnTvaarKI6w3-r6Si3h67omqtpHkD4U?width=686&amp;height=1024&amp;cropmode=none" alt="PyCon AU 2016"></li>
<li>Alexander Hogue - <a href="https://www.youtube.com/watch?v=MkSkqMvGBuo">Graphing when your Facebook friends are awake</a> - Story of discovering a hidden Facebook API and using it to track when your friends are online. Thoroughly entertaining while still providing the technical details.</li>
<li>Rachel Bunder - <a href="https://www.youtube.com/watch?v=cy5n6XAtA-w">I wish I learnt that earlier!</a> - Description of some of the slightly more advanced features available in Python. Could be a great intro for someone new to Python.</li>
<li>Russell Keith-Magee - <a href="https://www.youtube.com/watch?v=1sDyVJm3Ht0">Python All the Things</a>
<img src="https://ogk2pw-bn1306.files.1drv.com/y3mxaWYlGAnh7WFQOv00lBuARtRgDmEeFuxvyLyawQofwC4n0ytlMXaRIA1BoqX_EN-NuCoIGQrIUf9OKnbdUYuvyNQRR1V5v4vxG-KANcJpF1fKqiu4WDOugUx4DJOb9qhKfknHQHemb5nhGeNqRU0Ca8-f7Pkos5TjRMDarRtbc8?width=1024&amp;height=693&amp;cropmode=none" alt="PyCon AU 2016"></li>
<li>Sebastian Vetter - <a href="https://www.youtube.com/watch?v=bsJFMtQ5MZU">Click: A Pleasure To Write, A Pleasure To Use</a> - Click is an argument parsing library with additional features beyond argparse. Also apparently becoming the standard at Facebook (didn't learn that at the conference, but it's a fun fact).</li>
<li>Justin Warren - <a href="https://www.youtube.com/watch?v=qjTc5q7MsMg">Predicting the TripleJ Hottest 100 With Python</a> - Overview of predicting the Hottest 100 for the last few years, starting with the method used by the <a href="http://warmest100.com.au/2013/index.html">Warmest 100</a> and continuing on how to extract and process information from Instagram.</li>
<li>Jackson Fairchild - <a href="https://www.youtube.com/watch?v=Rdc06jpjVIY">Hitting the Wall and How to Get Up Again - Tackling Burnout and Strategies for Self Care</a>
<img src="https://1eskhq-bn1306.files.1drv.com/y3mheWL6ugl8rmqtz2koUvl-f10EvyMN6cMC9qesbW76eDNqiatpcuWYh4DsgVljwt-4g_m5FGAnca4ha0Iw9XsGHehfaYtVAM9FgTyoAAnLqx8k6i6gIyCV8uH-ioQUy96zYYIu51E3-2yQIyNHTBpjBf979FddTpEDy4kPtnGCps?width=1024&amp;height=701&amp;cropmode=none" alt="PyCon AU 2016"></li>
</ul>

<p>(Full schedules for <a href="https://2016.pycon-au.org/programme/schedule/saturday?_code=301">Saturday</a> and <a href="https://2016.pycon-au.org/programme/schedule/sunday?_code=301">Sunday</a> and links to videos are available on the PyCon website) </p>

<p>Overall I was really impressed by the quality of the talks. There were a couple that I thought could be improved a bit, or where I wasn't that interested in the content, but there were no flat-out bad talks like you often see in the scientific context. It was clear that the presenters had put a lot of effort into planning what they were going to say and how to make it interesting and engaging for an audience that might be new to the topic. I don't think I saw any slides that were walls of text or full of multiple plots. On the other hand there was lots of code in slides, including live snippets. I'm not usually a fan of this but in this context it makes sense, particularly as you can assume that everyone has a basic grasp of Python. There were also lots of live demos, some of which were pretty impressive, and I don't think I saw any fail.</p>

<p>What struck me as the biggest difference at PyCon compared to a scientific conference was the sense of community and the awareness of wider social issues. There was a big effort to be inclusive of all genders, sexualities, ethnic groups etc. and several of the talks touched on ethical issues or the speaker's own experience in the community. While I would hope that a scientific gathering wouldn't be discriminatory, I can't see diversity being embraced in the same way, but hopefully that will continue to improve. There was a sense of everyone being in it together and it was common for speakers to praise work that they hadn't been involved in but thought was interesting or useful. I didn't see anyone described using their titles and it seemed that someone who had learned Python in the last year was as valued as someone who had been a major contributor for the last 10 years (although there may have been power dynamics that I wasn't aware of).</p>

<p>I think that a lot of the differences come from the work/volunteer divide. While PyCon was an opportunity to network or advertise your work, the focus seemed to be on contributing to the community and the speakers were enthusiastic and keen to present. In contrast, a scientific conference is a professional opportunity. As a scientist you are judged on your ability to get a talk, which means more competition and sometimes speakers who aren't interested in presenting. Every talk is a demonstration of your worth, which makes it hard to present unfinished work and encourages people to try to fit too much in. It would be great for scientific conferences to spend more time discussing issues around the communities they represent, but to do so they might have to sacrifice speaking slots. For example, it would be great to see a talk about mental health issues like Jackson Fairchild's, but that would mean taking away a spot from someone who might need it to progress their career. Personally I think we could do with fewer talks from whichever well-known person is doing the rounds, in favour of some outside experts.</p>

<p>Overall I enjoyed my time at PyCon. It was a bit different to a scientific conference and I think there are probably things they can learn from each other. Congratulations to all the speakers and everyone involved in organising. Given that it is in Melbourne again, I hope to be back next year.</p>]]></content:encoded></item><item><title><![CDATA[Gantt charts in R]]></title><description><![CDATA[Producing a Gantt chart in R from a CSV or XLSX file using DiagrammeR.]]></description><link>http://lazappi.id.au/gantt-charts-in-r/</link><guid isPermaLink="false">253c3f9e-ea4b-49c6-9315-393159bb0850</guid><category><![CDATA[R]]></category><category><![CDATA[project management]]></category><category><![CDATA[gantt chart]]></category><category><![CDATA[Excel]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Mon, 13 Jun 2016 06:34:35 GMT</pubDate><content:encoded><![CDATA[<p>Gantt charts are a project management tool designed to visualise the tasks in a project, how long they will take and the order in which they must be completed. If you haven't seen one before, they essentially look like a modified horizontal bar chart, with time along the horizontal axis and tasks along the vertical. Each task consists of a bar whose ends mark its start and end times. Often there are also arrows indicating dependencies and a line showing the current date.</p>

<p>As part of the proposal for my PhD project I wanted to include a Gantt chart, both as a way of showing what I planned to do and as a way of keeping track of my progress. I expected there to be a simple template for Excel or Google Sheets, but there wasn't much available and what I found didn't quite fit what I wanted. Looking elsewhere didn't turn up much either. What I wanted was a tool where I could enter tasks and dates in text format and produce a relatively attractive chart that I could easily update. In the end I turned to faithful old R, which had the added advantage that I could easily incorporate the chart into <a href="http://rmarkdown.rstudio.com/">R Markdown</a> documents.</p>

<p>There are a couple of packages that can make Gantt charts in R including <a href="https://cran.r-project.org/web/packages/plotrix/index.html">plotrix</a> and <a href="https://cran.r-project.org/web/packages/plan/index.html">plan</a> but in the end I went with <a href="https://rich-iannone.github.io/DiagrammeR/">DiagrammeR</a>. The Gantt functionality of DiagrammeR depends on <a href="https://knsv.github.io/mermaid/">Mermaid</a> which has a simple, almost markdown-like syntax.</p>

<pre><code>gantt  
dateFormat  YYYY-MM-DD  
title My Gantt chart

section First section  
Task 1            :done,    des1, 2014-01-06, 2014-01-08  
Task 2            :active,  des2, 2014-01-09, 3d  
Task 3            :         des3, after des2, 5d  
Task 4            :         des4, after des3, 5d  
</code></pre>

<p>Basically each task is written as:</p>

<pre><code>Task name         :status, label, start_date, end_date  
</code></pre>

<p>Where the start and end dates can also include durations or references to other tasks.</p>

<p>While this format is easy to use I prefer to use a standard delimited format which is easier to edit and read into R. To this end I created some functions which will take a CSV or XLSX file and produce a Gantt chart.</p>

<pre><code class="language-r">library("magrittr")

# Take a data.frame containing tasks and build a Mermaid string
tasks2string &lt;- function(tasks) {

    tasks.list &lt;- split(tasks,
                        factor(tasks$Section, levels = unique(tasks$Section)))

    strings &lt;- sapply(names(tasks.list),
                      function(section) {
                          tasks.list[[section]] %&gt;%
                              dplyr::select(-Section) %&gt;%
                              tidyr::unite(Part1, Task, Priority,
                                           sep = ": ") %&gt;%
                              tidyr::unite(String, Part1, Status, Name, Start,
                                           End, sep = ", ") %&gt;%
                              magrittr::use_series("String") %&gt;%
                              paste(collapse = "\n") %&gt;%
                              gsub(" ,", "", .) # Remove empty columns
                          }
                      )

    string &lt;- ""

    for(section in names(strings)) {
        string &lt;- paste0(string, "\n",
                         "section ", section, "\n",
                         strings[section],
                         "\n")
    }

    return(string)
}

# Produce a Gantt chart from data.frame of tasks
# Adds the Mermaid header to the tasks string
buildGantt &lt;- function(tasks) {

    gantt.string &lt;- paste0("gantt", "\n",
                           "dateFormat YYYY-MM-DD", "\n",
                           "title My Gantt Chart",
                           "\n")   

    gantt.string &lt;- paste0(gantt.string, tasks2string(tasks))

    gantt &lt;- DiagrammeR::mermaid(gantt.string)

    gantt$x$config = list(ganttConfig = list(
        # Make sure the axis labels are formatted correctly
        axisFormatter = list(list(
            "%m-%y", # New date format
            htmlwidgets::JS('function(d){ return d}') # Select dates to format
        ))
    ))

    return(gantt)
}

# Read a file and return a Gantt chart
buildGanttFromFile &lt;- function(tasks.file, format = c("csv", "xlsx")) {

    format &lt;- match.arg(format)

    switch(format,
           csv = {
               tasks &lt;- read.csv(tasks.file, stringsAsFactors = FALSE)
           },
           xlsx = {
               tasks &lt;- gdata::read.xls(tasks.file)
           })

    return(buildGantt(tasks))
}
</code></pre>
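<p>With these functions defined, producing a chart is a single call. As a sketch (assuming a hypothetical <code>tasks.csv</code> whose columns match those expected by <code>tasks2string</code>: Section, Task, Priority, Status, Name, Start and End):</p>

<pre><code class="language-r"># Read the task list and render the Gantt chart
gantt &lt;- buildGanttFromFile("tasks.csv", format = "csv")

# Printing the returned htmlwidget displays the chart, e.g. in the
# RStudio viewer or an R Markdown document
gantt
</code></pre>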

<p>I can now construct my tasks by editing a CSV file and produce a Gantt chart directly from it by calling a single function. You may wonder why I have included XLSX as an input option. Surely using Excel is against the principles of data science? Firstly, I'm not that opposed to Excel (when it is used correctly), but the reason in this case is to get around one of the limitations of DiagrammeR. The Mermaid syntax allows you to define a task as starting after another task, but you can't say that a task ends before another. There are often situations where you have a hard end deadline (such as a PhD committee meeting) and you need to work backwards from that. By using Excel I can use simple formulas to calculate the dates, which are then passed to R. I could do this programmatically in R (and I might at some stage) but Excel was a quicker solution that let me get on with writing.</p>]]></content:encoded></item><item><title><![CDATA[Bioconductor 3.3 packages]]></title><description><![CDATA[Interesting new packages in Bioconductor 3.3.]]></description><link>http://lazappi.id.au/bioconductor-3-3-packages/</link><guid isPermaLink="false">c722c3c2-8c66-4736-b91b-95cb6f32554b</guid><category><![CDATA[R]]></category><category><![CDATA[bioconductor]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Thu, 05 May 2016 00:05:27 GMT</pubDate><content:encoded><![CDATA[<p>Bioconductor 3.3 has just been released. You can find the complete list of new packages (and changes to existing packages) <a href="https://bioconductor.org/news/bioc_3_3_release/">here</a>, but here are a few I thought might be interesting based on their descriptions. I might have more to say once I've had time to try a few out.</p>

<ul>
<li><strong>debrowser</strong> – Interactive plots and tables for differential expression</li>
<li><strong>DEFormats</strong> – convert between differential expression formats</li>
<li><strong>EBSEA</strong> – exon based differential expression</li>
<li><strong>EmpiricalBrownsMethod</strong> – combining dependent p-values</li>
<li><strong>Linnorm</strong> – normalisation for parametric tests, simulation of RNA-seq data</li>
<li><strong>multiClust</strong> – feature selection and clustering analysis for transcriptomic data</li>
<li><strong>RGraph2js</strong> – interactive network visualisations with D3</li>
<li><strong>tximport</strong> – import and summarise transcript-level estimates</li>
</ul>

<h2 id="singlecell">Single-cell</h2>

<p>These packages are specific to single-cell RNA-seq analysis. A couple of them I am already familiar with, particularly <strong>scater</strong>.</p>

<ul>
<li><strong>cellity</strong> - identifying low-quality cells</li>
<li><strong>cellTree</strong> - model the relationship between individual cells over time or space</li>
<li><strong>scater</strong> - tools for analysis of single-cell RNA-seq data (particularly QC)</li>
<li><strong>scde</strong> - single-cell differential expression</li>
<li><strong>scran</strong> - normalisation, cell-cycle assignment, gene detection</li>
</ul>]]></content:encoded></item><item><title><![CDATA[Extracting alignment statistics using Python]]></title><description><![CDATA[<p>Recently <a href="http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0888-1">this paper</a> by Ilicic et al. suggested a method for assessing the quality of individual cells in a single-cell RNA-seq experiment. The basic idea is to extract various biological and technical features from the reads for each cell, then use PCA with outlier detection or an SVM to</p>]]></description><link>http://lazappi.id.au/extracting-alignment-statistics-using-python/</link><guid isPermaLink="false">48b65c02-d805-4be1-8e16-b104d156472d</guid><category><![CDATA[python]]></category><category><![CDATA[alignment]]></category><category><![CDATA[statistics]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Wed, 30 Mar 2016 04:09:40 GMT</pubDate><content:encoded><![CDATA[<p>Recently <a href="http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0888-1">this paper</a> by Ilicic et al. suggested a method for assessing the quality of individual cells in a single-cell RNA-seq experiment. The basic idea is to extract various biological and technical features from the reads for each cell, then use PCA with outlier detection or an SVM to classify cells as "high" or "low" quality. There are two pieces of software associated with the paper: <code>cellity</code>, an R package that performs the classification, and <code>Celloline</code>, a Python script that performs alignment, summarisation and extraction of alignment statistics such as the number of reads aligned to exons, introns, intergenic regions etc. 
I was interested in using <code>cellity</code> but I didn't want to change my whole workflow to use the <code>Celloline</code> pipeline, so instead I decided to take the part responsible for extracting alignment statistics (available <a href="https://github.com/Teichlab/celloline/blob/master/lib/stats.py">here</a>) and convert it to a stand-alone Python script. </p>

<p>The core processing remains the same (except that I have removed read counting, which I do with <code>featureCounts</code>), but I have added a few features:</p>

<ol>
<li>Multiple files - paths to multiple alignment files can now be provided as arguments on the command line.  </li>
<li>BAM files - the script can now handle BAM files as well as SAM using <a href="https://github.com/pysam-developers/pysam">pysam</a>. It will work if the BAM is unsorted, but the output can be slightly different.  </li>
<li>Index - reading the GTF annotation file can take a significant amount of time, particularly for a single-cell experiment where there are a large number of files with relatively few reads. To limit this overhead the object holding the annotation can be pickled to disk for future use.  </li>
<li>Parallel - multiple files can now be processed in parallel using <a href="https://pythonhosted.org/joblib/">joblib</a>. This is fairly crude but it is a significant improvement, particularly when combined with a pickled index.  </li>
<li>Argument handling - now performed by <a href="https://docs.python.org/3/library/argparse.html">argparse</a>, complete with a handy help message.  </li>
<li>Logging - progress and error messages are now reported. </li>
</ol>
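<p>As a rough illustration of the index idea, caching the parsed annotation with <code>pickle</code> might look something like this (a hypothetical sketch; <code>load_annotation</code> and its arguments are illustrative names, not the actual functions in the script):</p>

<pre><code class="language-python">import os
import pickle


def load_annotation(gtf_path, index_path, parse_gtf):
    """Load a GTF annotation, caching the parsed object as a pickle.

    parse_gtf is whatever function builds the in-memory annotation;
    it is only called when no pickled index exists yet.
    """
    if os.path.exists(index_path):
        # Cache hit: skip parsing the GTF entirely
        with open(index_path, "rb") as handle:
            return pickle.load(handle)
    # Cache miss: parse once, then save the result for future runs
    annotation = parse_gtf(gtf_path)
    with open(index_path, "wb") as handle:
        pickle.dump(annotation, handle)
    return annotation
</code></pre>

<p>With many small per-cell files, paying the parsing cost once and loading the pickle thereafter makes a big difference.</p>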

<p>Putting it all together I can now extract alignment statistics from multiple BAM files in parallel with a single command:</p>

<pre><code>alignStats -o stats.csv -g annotation.gtf -i annotation.index -t bam -p 10 *.bam  
</code></pre>
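<p>The argument handling for a command like this could be reconstructed with <code>argparse</code> along these lines (a hypothetical sketch based on the flags shown above, not the exact interface of the real script):</p>

<pre><code class="language-python">import argparse


def build_parser():
    # Hypothetical reconstruction of the interface implied by the
    # example command; the real script may differ in detail.
    parser = argparse.ArgumentParser(
        description="Extract alignment statistics from SAM/BAM files.")
    parser.add_argument("inputs", nargs="+",
                        help="alignment files to process")
    parser.add_argument("-o", "--output", required=True,
                        help="output CSV file")
    parser.add_argument("-g", "--gtf",
                        help="GTF annotation file")
    parser.add_argument("-i", "--index",
                        help="pickled annotation index")
    parser.add_argument("-t", "--type", choices=["sam", "bam"],
                        default="sam", help="alignment file format")
    parser.add_argument("-p", "--parallel", type=int, default=1,
                        help="number of files to process in parallel")
    return parser
</code></pre>

<p><code>nargs="+"</code> is what lets a shell glob like <code>*.bam</code> expand into multiple positional arguments.</p>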

<p>The script is available on <a href="https://github.com/lazappi/binf-scripts/blob/master/alignStats.py">Github</a>.</p>]]></content:encoded></item><item><title><![CDATA[My Markdown thesis]]></title><description><![CDATA[<p>It's come to the stage in my Master's where I have to start thinking about writing my thesis. Apart from all the analysis I have to do before I can do that there is also the question of what I am going to use to construct the document itself.</p>

<p>For</p>]]></description><link>http://lazappi.id.au/my-markdown-thesis/</link><guid isPermaLink="false">669f202f-daf9-43b3-bfcb-0f9b371fc3af</guid><category><![CDATA[writing]]></category><category><![CDATA[markdown]]></category><category><![CDATA[thesis]]></category><category><![CDATA[latex]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Thu, 13 Aug 2015 11:05:00 GMT</pubDate><content:encoded><![CDATA[<p>It's come to the stage in my Master's where I have to start thinking about writing my thesis. Apart from all the analysis I have to do before I can do that, there is also the question of what I am going to use to construct the document itself.</p>

<p>For the last year or so I have been writing in Markdown, which is converted to TeX using <a href="http://pandoc.org">Pandoc</a> and then used to produce a PDF. I have found this a really good way to work, combining the speed and clarity of Markdown with the ability to include LaTeX directly when I need extra flexibility. I have been using the <a href="https://sbrosinski.github.io/uberdoc/">Uberdoc</a> tool to set up projects and combine multiple Markdown files, but unfortunately it's not quite flexible enough for a complex document like a thesis.</p>

<p>I wanted to be able to incorporate my own TeX, particularly so I could use John Papandriopoulos' <a href="http://jpap.org/projects.html">thesis template</a>. Ideally I would have built my own tool (probably in Python or Perl) that would manage projects, including git commits, as well as produce statistics, but time doesn't permit, so I have ended up with a Make-based solution.</p>

<p>The setup allows me to be flexible with how I set up my directory, as the whole project is searched for Markdown files which are converted to LaTeX in a build directory. The directory structure is flattened at this stage, which means I don't have to write the full path when including files. Figures are treated similarly, and there are folders for additional LaTeX files (such as styles and templates) and bibliography files. I also have a core TeX file which is used to tie everything together. The PDF is constructed using <a href="https://www.ctan.org/pkg/latexmk/?lang=en">latexmk</a> and I can use <a href="http://app.uio.no/ifi/texcount/">texcount</a> to keep track of my word count. So when I run <code>make</code> for the first time the following steps occur:</p>

<ol>
<li>The build directory is created with the necessary subdirectories.</li>
<li>The project directory is searched for Markdown files which are converted to TeX files in the build directory.</li>
<li>TeX files are copied from the template directory to the build directory.</li>
<li>All files are copied from the style directory to a style subdirectory inside the build directory.</li>
<li>All files are copied from the bibliography directory to a bibliography subdirectory inside the build directory.</li>
<li>The figures directory is searched for image files which are copied to a figures subdirectory inside the build directory.</li>
<li><code>latexmk</code> is used to build the output file in the build directory.</li>
<li>The output PDF is copied to the main directory.</li>
</ol>
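<p>Steps 1 and 2 could be sketched in Make roughly like this (a simplified, hypothetical fragment, assuming <code>pandoc</code> is on the PATH; the variable and target names are illustrative, not the ones in the real Makefile):</p>

<pre><code class="language-makefile"># Find Markdown files anywhere in the project, skipping build/
MD_FILES := $(shell find . -path ./build -prune -o -name '*.md' -print)

.PHONY: tex
tex:
	mkdir -p build
	# Flatten the directory structure so files can be included by name alone
	for f in $(MD_FILES); do \
	    pandoc "$$f" -o "build/$$(basename "$${f%.md}").tex"; \
	done
</code></pre>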

<p>It's not perfect; for example, there is a bug that means <code>make</code> needs to be run more than once when you add a new file. But it mostly does what I want and hopefully it will get me through. If you want to check it out, the code is available on <a href="https://github.com/lazappi/thesis-template">Github</a>.</p>]]></content:encoded></item><item><title><![CDATA[Open science workshops]]></title><description><![CDATA[<p>This Saturday I attended the first workshop run by <a href="https://openscienceworkshops.github.io/">Open Science Workshops</a> at the <a href="http://inspire9.com/">Inspire9</a> collaborative workspace. Open Science Workshops is a new initiative aiming to promote open source tools and techniques to the scientific community. </p>

<p>The workshop consisted of two main parts: an introduction to the basics of Github</p>]]></description><link>http://lazappi.id.au/open-science-workshops/</link><guid isPermaLink="false">52ff3831-b808-4537-b9b3-cfa8603d0cc5</guid><category><![CDATA[open science]]></category><category><![CDATA[workshop]]></category><category><![CDATA[collaboration]]></category><dc:creator><![CDATA[Luke Zappia]]></dc:creator><pubDate>Sun, 20 Jul 2014 09:20:00 GMT</pubDate><content:encoded><![CDATA[<p>This Saturday I attended the first workshop run by <a href="https://openscienceworkshops.github.io/">Open Science Workshops</a> at the <a href="http://inspire9.com/">Inspire9</a> collaborative workspace. Open Science Workshops is a new initiative aiming to promote open source tools and techniques to the scientific community. </p>

<p>The workshop consisted of two main parts: an introduction to the basics of Github (creating repositories, committing, forking, merging...) and SageMathCloud; and a series of talks:</p>

<ul>
<li>General discussion of how and why you should take an open approach to scientific research (Alex Ghitza, Pure Maths Lecturer, University of Melbourne). </li>
<li><a href="https://www.authorea.com/">Authorea</a> - An online platform for collaborative manuscript editing designed for digital publishing that combines LaTeX, Markdown, etc. with interactive visualisations, embedded IPython notebooks and built-in citation management (Andrea Bedini, Maths and Stats, University of Melbourne). </li>
<li><a href="https://scirate.com/">SciRate</a> - A social media approach to rating and sharing the papers available at arXiv.org (Jaiden Mispy).</li>
<li><a href="https://cloud.sagemath.com/">SageMathCloud</a> - A collaborative cloud platform with particular support for IPython and LaTeX, as well as a terminal. Kind of like Google Docs meets a VM. </li>
<li><a href="https://nectar.org.au/">NeCTAR</a> - A cloud facility available to Australian researchers. Also the Genomics Virtual Laboratory setup, which allows quick launching of a VM with Galaxy and other bioinformatics tools as well as IPython and RStudio (Clare Sloggett, VLSCI).</li>
<li><a href="http://software-carpentry.org/">Software Carpentry</a> - Bootcamps for training in scientific computing, including Git (work tracking), UNIX (automation), programming (modularisation) and SQL (structured data) (Scott Ritchie). </li>
<li><a href="http://elifesciences.org/">eLife</a> - A new open access life sciences journal in the UK that uses a consultative peer review process, as well as eLife Lens, an online system for viewing their papers or anything in PubMed via the <a href="http://oa-sandbox.org/">OA Sandbox</a> (Ian Mulvany). </li>
</ul>

<p>A common theme running through the talks (apart from open access) was the need to move towards 21st century tools and processes, both for collaboration and publishing. </p>

<p>Overall it was a worthwhile experience and hopefully there will be more in the future. If you are interested in the details, the <a href="https://github.com/silky/osw-material/tree/master/workshop-melb-2014">talks</a> and <a href="https://github.com/OpenScienceWorkshops/osw-material/wiki/Summary-of-the-July-2014-Melbourne-Open-Science-Workshop">agenda</a> are available on Github. </p>]]></content:encoded></item></channel></rss>