griph identifies cell types in single-cell RNA-seq datasets and can control for unwanted sources of variance (e.g. cell cycle) that may confound cell type identification. Its main function, griph_cluster, combines all the steps of cell type identification. This vignette illustrates its use on several publicly available datasets.
griph 0.1.1
Graph Inference of Population Heterogeneity (griph) is an R package for the analysis of single cell RNA-sequencing data. It can be used to automatically identify different cell types or states, even in the presence of confounding sources of variance such as cell cycle stages or batch effects.
griph is currently available from GitHub (https://github.com/ppapasaikas/griph). It can be installed using:
library(devtools)
# GitHub no longer serves the unauthenticated git:// protocol, so install via install_github
install_github("ppapasaikas/griph", subdir = "griph")
In order for griph to work, some additional packages may have to be installed. You’ll need all the ones under “Needed”. You can do without the ones under “Optional”, as they are only used in the vignette:
## Needed :
## Matrix, methods, igraph, QUIC, coop, corpcor, foreach, doParallel, parallel, bigmemory, RColorBrewer, Rcpp, RcppProgress, RcppArmadillo, Matrix
## Optional :
## BiocStyle, knitr, rmarkdown, testthat, Rtsne, gtools
We will use human induced pluripotent stem (iPS) cells from Tung et al. for this example (see details in the “Sample datasets” section). The dataset is included in the package, together with the iPS cell line labels:
M <- readRDS(system.file("extdata", "tung_UMIs_top10k.rds", package = "griph"))
dim(M) # genes by cells
## [1] 10000 864
trueLabel <- attr(M, "label2")
table(trueLabel)
## trueLabel
## NA19098 NA19101 NA19239
## 288 288 288
Cell types can be identified using griph_cluster:
library(griph)
res <- griph_cluster(M, ClassAssignment = trueLabel, use.par = FALSE,
plot = TRUE, fsuffix = 'tung')
table(res$MEMB)
##
## 1 2 3 4 5 6
## 102 99 94 282 98 189
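Since the known cell line labels are available, the predicted clusters can be compared against them with a cross-tabulation. The following is a minimal sketch in base R, not part of griph: memb and label are hypothetical toy vectors standing in for res$MEMB and trueLabel, and the per-cluster purity is just one simple summary chosen for illustration.

```r
# Hypothetical stand-ins for res$MEMB and trueLabel. With real griph output,
# replace these two vectors (assumes both follow the column order of M).
memb  <- c(1, 1, 1, 2, 2, 3, 3, 3)
label <- c("A", "A", "A", "B", "B", "C", "C", "B")

# Rows are predicted clusters, columns are known labels
tab <- table(cluster = memb, label = label)
print(tab)

# Fraction of each cluster that belongs to its most frequent known label
purity <- apply(tab, 1, max) / rowSums(tab)
print(round(purity, 2))
```

A purity close to 1 for every cluster would indicate that each predicted cluster is dominated by a single known label. Several clusters may still map to the same label.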
The automatic generation of a 2D-embedding plot (plot argument) is switched on here and will create a png (default) or pdf file (image.format argument) in the current working directory, with fsuffix appended to the file name:
list.files("./", pattern = "tung")
## [1] "graph_tung.png" "Lvis_tung.png"
The estimated cell graph (as an igraph object) and the coordinates of the 2D-embedding (when using plot=TRUE) are returned by griph_cluster:
summary(res$GRAO) # full graph
## IGRAPH a1af978 U-W- 864 41672 --
## + attr: class (v/c), membership (v/n), community.size (v/n), labels
## | (v/c), weight (e/n)
head(res$plotLVis) # 2D embedding
## x y
## NA19098.r1.A01 -12.422459 -12.15378
## NA19098.r1.A02 -13.228306 -11.99606
## NA19098.r1.A03 -9.097008 -12.05729
## NA19098.r1.A04 -11.683206 -15.24782
## NA19098.r1.A05 -13.078734 -12.74092
## NA19098.r1.A06 -12.846079 -14.07115
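Because the cell graph is a regular igraph object, it can be inspected with standard igraph functions. The sketch below uses a small toy graph as a hypothetical stand-in for res$GRAO and only assumes the membership vertex attribute and weight edge attribute listed in the summary above. The toy values are made up for illustration.

```r
library(igraph)

# Toy undirected graph standing in for res$GRAO (hypothetical data)
g <- make_ring(6)
g <- set_vertex_attr(g, "membership", value = c(1, 1, 1, 2, 2, 2))
g <- set_edge_attr(g, "weight", value = c(0.9, 0.8, 0.1, 0.7, 0.9, 0.2))

# Number of cells (vertices) and of cell-cell similarities (edges)
n_cells <- vcount(g)
n_edges <- ecount(g)

# Cluster sizes from the membership vertex attribute
memb <- vertex_attr(g, "membership")
cluster_sizes <- table(memb)

# Flag edges that connect cells assigned to different clusters
ends_mat <- ends(g, E(g), names = FALSE)
between <- memb[ends_mat[, 1]] != memb[ends_mat[, 2]]

print(cluster_sizes)
print(sum(between))
```

With real output, replacing g by res$GRAO should give the same kind of summary for all cells, assuming the attribute names match those printed by summary().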
In the graph returned by griph_cluster, cells are represented by graph vertices. All cells are contained in the full graph (res$GRAO), and a representative subset of cells in the simplified graph (res$plotGRAO), which is generated by plotGraph and used for plotting. The graph can easily be plotted again, for example to illustrate known and predicted cell types separately:
g.true <- plotGraph(res, fill.type = "true", line.type = "none")