1 Introduction and installation

Graph Inference of Population Heterogeneity (griph) is an R package for the analysis of single cell RNA-sequencing data. It can be used to automatically identify different cell types or states, even in the presence of confounding sources of variance such as cell cycle stages or batch effects.

griph is currently available from GitHub (https://github.com/ppapasaikas/griph). It can be installed using:

library(devtools)
install_git("https://github.com/ppapasaikas/griph.git", subdir = "griph")

griph depends on additional packages that may have to be installed first. All packages listed under “Needed” are required; those under “Optional” are only used in the vignette and can be skipped:

## Needed :
##   Matrix, methods, igraph, QUIC, coop, corpcor, foreach, doParallel, parallel, bigmemory, RColorBrewer, Rcpp, RcppProgress, RcppArmadillo
## Optional :
##   BiocStyle, knitr, rmarkdown, testthat, Rtsne, gtools
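
Before installing griph, it can be handy to see which of the “Needed” packages are still missing. The following is a minimal sketch using base R only; the vector `needed` is simply copied from the list above, and the commented-out `install.packages` call assumes that the missing packages are available from your configured repositories, which may not hold for every package:

```r
# Packages listed under "Needed" above
needed <- c("Matrix", "methods", "igraph", "QUIC", "coop", "corpcor",
            "foreach", "doParallel", "parallel", "bigmemory",
            "RColorBrewer", "Rcpp", "RcppProgress", "RcppArmadillo")

# Names of all packages found in the current library paths
installed <- rownames(installed.packages())

# Needed packages that are not installed yet
missing_pkgs <- setdiff(needed, installed)
missing_pkgs

# Assumption: the missing packages can be obtained from your repositories
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

An empty result (`character(0)`) means that all required packages are already present.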

2 Quickstart: A sample analysis

We will use human induced pluripotent stem (iPS) cells from Tung et al. for this example (see details in the “Sample datasets” section). The dataset is included in the package, together with the iPS cell line labels:

M <- readRDS(system.file("extdata", "tung_UMIs_top10k.rds", package = "griph"))
dim(M) # genes by cells
## [1] 10000   864
trueLabel <- attr(M, "label2")
table(trueLabel)
## trueLabel
## NA19098 NA19101 NA19239 
##     288     288     288

Cell types can be identified using griph_cluster:

library(griph)
res <- griph_cluster(M, ClassAssignment = trueLabel, use.par = FALSE,
                     plot = TRUE, fsuffix = 'tung')
table(res$MEMB)
## 
##   1   2   3   4   5   6 
## 102  99  94 282  98 189
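
Here griph reports six clusters for three known cell lines, so a cross-tabulation of predicted clusters against known labels helps to see how they relate. The sketch below uses small made-up vectors (`memb` and `label` are hypothetical stand-ins for `res$MEMB` and `trueLabel`) so that it runs without griph; it assumes that both vectors list the cells in the same order:

```r
# Toy stand-ins (hypothetical values) for res$MEMB and trueLabel
memb  <- c(1, 1, 2, 2, 2, 3, 3, 3)
label <- c("A", "A", "B", "B", "A", "C", "C", "C")

# Rows: predicted clusters, columns: known labels
tab <- table(cluster = memb, label = label)
tab

# Purity: fraction of cells that carry the majority label of their cluster
purity <- sum(apply(tab, 1, max)) / sum(tab)
purity
## [1] 0.875
```

With the real objects, and under the same ordering assumption, the equivalent call would be `table(res$MEMB, trueLabel)`.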

Automatic generation of the 2D-embedding plots is switched on here (plot = TRUE). The plots are written to the current working directory as png files (the default) or as pdf files (image.format argument), and the fsuffix argument sets the suffix of the file names:

list.files("./", pattern = "tung")
## [1] "graph_tung.png" "Lvis_tung.png"

The estimated cell graph (as an igraph object) and the coordinates of the 2D-embedding (when using plot=TRUE) are returned by griph_cluster:

summary(res$GRAO) # full graph
## IGRAPH a1af978 U-W- 864 41672 -- 
## + attr: class (v/c), membership (v/n), community.size (v/n), labels
## | (v/c), weight (e/n)
head(res$plotLVis) # 2D embedding
##                         x         y
## NA19098.r1.A01 -12.422459 -12.15378
## NA19098.r1.A02 -13.228306 -11.99606
## NA19098.r1.A03  -9.097008 -12.05729
## NA19098.r1.A04 -11.683206 -15.24782
## NA19098.r1.A05 -13.078734 -12.74092
## NA19098.r1.A06 -12.846079 -14.07115
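
The returned coordinates can also be used for custom plots. The following sketch uses base graphics and made-up data: `emb` and `memb` are hypothetical stand-ins for `res$plotLVis` and `res$MEMB`. It assumes that the embedding has one row per cell with columns `x` and `y`, and that cluster memberships are in the same row order, which should be checked on the real objects:

```r
# Toy stand-ins (hypothetical values) for res$plotLVis and res$MEMB
set.seed(1)
emb  <- data.frame(x = rnorm(30), y = rnorm(30))
memb <- rep(1:3, each = 10)

# One colour per cluster, assigned to each cell
palette_cl <- grDevices::rainbow(length(unique(memb)))
cols <- palette_cl[as.integer(factor(memb))]

# Scatter plot of the embedding, coloured by cluster
plot(emb$x, emb$y, col = cols, pch = 19, xlab = "x", ylab = "y",
     main = "2D embedding coloured by cluster")
legend("topright", legend = sort(unique(memb)), col = palette_cl,
       pch = 19, title = "cluster")
```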

3 Visualizing griph results

In the graph returned by griph_cluster, cells are represented by graph vertices. The full graph (res$GRAO) contains all cells, whereas the simplified graph (res$plotGRAO) contains a representative subset of cells; it is generated by plotGraph and used for plotting. The graph can easily be plotted again, for example to illustrate known and predicted cell types separately:

g.true <- plotGraph(res, fill.type = "true", line.type = "none")