View on GitHub

OpenCyto


A Robust BioConductor Framework for Automated Flow Data Analysis

OpenCyto Automated Gating of HVTN080

OpenCyto Automated Gating of HVTN080

We load required libraries.

suppressPackageStartupMessages(
{
  library(flowWorkspace)
  library(COMPASS)
  require(data.table)
  library(reshape2)
  library(plyr)
  })

The HVTN080 study flow cytometry data has been preprocessed (loaded in R from the raw FCS files and the FlowJo workspaces). The gating set is available here, and has been gated using OpenCyto.

We have gated the data using this OpenCyto template.

The manually gated data is available here, and contains the manual gates from the FlowJo workspaces.

The raw FCS files are available from flowrepository.org/id/FR-FCM-ZZ7U.

The data in the gating sets has already been compensated and transformed using the compensation and transformation information stored in the FlowJo workspaces.

The FlowJo workspaces for this data will be posted soon.

Below, we extract the feature information for the CD4+ and CD8+ cytokine producing cell subsets.

Loading the gated data

We can load these gating sets into R.

WORKDIR<-"./" #Set to path of local data
setwd(WORKDIR)
dataPath <- "./"
gsNames <- c("manual", "auto") 
gslist <- sapply(gsNames, function(thisGSName, dataPath){
      thisFolder <- paste("gs", thisGSName,  "clean", sep = "_")
      load_gs(file.path(dataPath, thisFolder))
    }, dataPath)    
pd <- pData(gslist[[1]])

How to reproduce the OpenCyto automated gating

If you want to regenerate the automated gating results, you can do so starting from the template. Load the gating set with the already gated data, delete all nodes, and re-run the gating. Like so:

Rm("boundary",gslist$auto)  #The boundary filter is the first gate in the GatingSet. We remove it and thus all downstream gates.

#Next load the template
hvtn080_ics<-gatingTemplate("gt_080.csv") #Load the template file. Download it from the link above and put it in the current directory.

#And now run the gating
gating(hvtn080_ics, gslist$auto) #apply the template to the data.

Some filtering for paired subjects

Next we remove unparied subjects (that only have one visit), since we want to compare pre-vaccine to post-vaccine.

pd<-data.table(pd)
toremove<-subset(ddply(pd,.(pub_id),function(x)with(x,nlevels(factor(VISITNO))==1)),V1==TRUE)$pub_id
if(length(toremove)!=0){
  pd <- subset(pd, pub_id != toremove) 
}
  pub_id<-pd[,pub_id:=factor(pub_id)]
  pd<-pd[,name:=factor(name)] 

Once we've identified those subjects, we filter the automated gating set (and the manually gated data) to keep only complete samples.

gslist <- llply(gslist, function(gs)gs[as.character(pd$name)])

Extract data matrices for further analysis.

Then we extract data matrcies from the gating set. We're focusing on the cytokine expressing CD4 and CD8 T-cell subsets.

ics_markers <- c("57", "Perforin", "Granzyme B", "IFNg", "IL2", "TNFa")
parentNodes <- c("4+", "8+")
#' ## Extract the data
#+ eval=F
data_mat_list <- sapply(gsNames, function(thisGSName){
      gs <- gslist[[thisGSName]]

      # Construct expresion from all ics markers
      #exclude Perforin altogether since it was not gated correctly
      ics_markers <- ics_markers[-2]
      this_expr <- paste(ics_markers, "+", sep = "")
      sapply(parentNodes, function(thisParent){

          this_expr <- paste(thisParent, this_expr, sep = "/")
          this_expr <- paste(this_expr, collapse = "|")
          this_expr <- as.symbol(this_expr)


            popVSmarker <- list("CD57")
            names(popVSmarker) <- paste(thisParent,"57+", sep = "/")
            res <- getData(gs,this_expr, popVSmarker)
            parent_counts <- sapply(sampleNames(gs),function(this_sample)getTotal(gs[[this_sample]], thisParent, flowJo = F))
            list(res = res, parent_counts = parent_counts)
          }, simplify = FALSE, USE.NAMES = TRUE)   
    }, simplify = FALSE, USE.NAMES = TRUE)

Use COMPASS to get the % of all non-empty combinations

pop_stats <- lapply(data_mat_list, function(data_mat){#for each gs

      countList <- lapply(names(data_mat), function(thisParent){#for each parent population
              thisData <- data_mat[[thisParent]]
              thisCounts <- thisData$parent_counts
              thisRes <- thisData$res


              comp <- COMPASSContainer(thisRes
                                      , thisCounts
                                      , pd
                                      , individual_id = "pub_id"
                                      , sample_id = "name"
                                      ) 

              boolSubsets <- UniqueCombinations(comp, F)                                   
              subset_counts <- CellCounts(comp$data, boolSubsets)

              #add samplenames and boolean combination names
              sn <- names(thisCounts)
              rownames(subset_counts) <- sn
              thisMarkers <- as.vector(colnames(thisRes[[1]]))
              popNames <- unname(unlist(lapply(boolSubsets, function(thisSub){
                    notOp <- ifelse(thisSub>0, "", "!")
                    paste(paste(notOp, thisMarkers, sep = ""), collapse = "&")
                  })))

              colnames(subset_counts) <- popNames
              subset_counts <- as.data.frame(subset_counts)
              #add IL2/IFNg (by subtracting the negated components from total)
              toSubtract <- c("IL2", "IFNg")
              patternTOExclude <- paste("!", toSubtract, sep = "")
              logicalInd <- lapply(patternTOExclude, grepl, x = popNames)
              toExclude <- Reduce(`&`,logicalInd)
              toExclude <- popNames[toExclude]
              toExclude <- rowSums(subset_counts[,toExclude])
              total <- rowSums(subset_counts)
              toAdd <- total - toExclude
              newPop <- paste(toSubtract, collapse = "|")
              subset_counts[, newPop] <- toAdd

              #prepend the parent name to each pop
              colnames(subset_counts) <- paste(thisParent, colnames(subset_counts), sep = "_")

              # compute the %
              #append parent count  
              # is this gurantee to be column-wise division? 
              subset_props <- subset_counts/thisCounts
              subset_props <- as.data.frame(subset_props)
#              subset_props[, thisParent] <- thisParentProp
              subset_props

          })  

      do.call(cbind, countList)
  })
cytokines <- c("IFNg", "IL2", "TNFa")
pop_stats <- sapply(gsNames, function(thisGSName){

      data_mat <- pop_stats[[thisGSName]]

      thisFeatureSet <- colnames(data_mat)

      patternTOExclude <- paste("!", cytokines, sep = "")
      logicalInd <- lapply(patternTOExclude, grepl, x = thisFeatureSet)
      toExclude <- logicalInd[[1]]
      for(thisInd in logicalInd[-1]){
        toExclude <- toExclude & thisInd
      }
      data_mat[, !toExclude]
    }, simplify = FALSE, USE.NAMES = TRUE)

Subtract the background (mean % of negctrl)

pd<-data.frame(pd)
features <- lapply(pop_stats, function(stats){

      #uniform the stim name for negctrl
      thisStim <- as.vector(pd[, "Stim"])
      thisStim[grepl("negctrl*", thisStim)] <- "negctrl"
      pd[, "Stim"] <- as.factor(thisStim)
      #change pop stats from wide to long
      stats <- name_rows(stats)
      stats <- melt(stats)
      # merge with pdata
      stats <- merge(stats, pd, by.x = ".rownames", by.y = "name")
#      browser()
      # average % for negctrl
      stats <- ddply(stats, .(variable,Stim,pub_id,VISITNO), summarise, value = mean(value))
      # move Stim from long to wide
      stats <- dcast(stats, variable + pub_id + VISITNO ~ Stim, value.var = "value")
      #perform background subtraction
      allStims <- levels(pd[, "Stim"])
      allStims <- allStims[!allStims%in%"negctrl"]
      for(thisStim in allStims){
        stats[, thisStim] <- stats[, thisStim] - stats[, "negctrl"]  
      }

      #move Stim back to long
      stats <- melt(stats, id.vars = c("variable", "pub_id" , "VISITNO"), variable_name = "Stim") 
      #move pop stats back to wide
      stats <- dcast(stats,  Stim + pub_id + VISITNO ~ variable, value.var = "value")

      subset(stats, Stim != "negctrl")
    })

Save the results matrix.

saveRDS(features, file.path(".", "features2.rds"))
features2 <- readRDS(file.path(dataPath, "features2.rds"))

SessionInfo

sessionInfo()
## R Under development (unstable) (2014-02-06 r64933)
## Platform: x86_64-apple-darwin13.0.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scales_0.2.3              ggplot2_0.9.3.1          
##  [3] reshape_0.8.4             plyr_1.8                 
##  [5] reshape2_1.3.0.99         Kmisc_0.5.1              
##  [7] COMPASS_1.1.5-1           pheatmap_0.7.8           
##  [9] RColorBrewer_1.0-5        mvtnorm_0.9-9997         
## [11] Rgraphviz_2.7.0           graph_1.41.2             
## [13] data.table_1.8.11         openCyto_1.1.15          
## [15] flowWorkspace_3.9.65      gridExtra_0.9.1          
## [17] ncdfFlow_2.9.19           flowViz_1.27.16          
## [19] lattice_0.20-27           flowCore_1.29.23         
## [21] flowWorkspaceData_1.99.13 knitr_1.5.22             
## 
## loaded via a namespace (and not attached):
##  [1] abind_1.4-0           Biobase_2.23.4        BiocGenerics_0.9.3   
##  [4] car_2.0-19            clue_0.3-47           cluster_1.14.4       
##  [7] coda_0.16-1           colorspace_1.2-4      compositions_1.30-2  
## [10] corpcor_1.6.6         DEoptimR_1.0-1        dichromat_2.0-0      
## [13] digest_0.6.4          energy_1.6.0          evaluate_0.5.1       
## [16] fastmatch_1.0-4       fda_2.4.0             feature_1.2.10       
## [19] flowClust_3.3.7       flowStats_3.19.10     formatR_0.10         
## [22] GFMisc_1.0            gtable_0.1.2          gtools_3.2.1         
## [25] hexbin_1.26.3         IDPmisc_1.1.17        KernSmooth_2.23-10   
## [28] ks_1.8.13             labeling_0.2          latticeExtra_0.6-26  
## [31] markdown_0.6.4        MASS_7.3-29           Matrix_1.1-2         
## [34] MCMCpack_1.3-3        munsell_0.4.2         mvoutlier_2.0.3      
## [37] nnet_7.3-7            pcaPP_1.9-49          pls_2.4-3            
## [40] proto_0.3-10          R.methodsS3_1.6.1     R.oo_1.17.0          
## [43] R.utils_1.29.8        RBGL_1.39.1           Rcpp_0.11.0.2        
## [46] rgl_0.93.996          robCompositions_1.7.0 robustbase_0.90-2    
## [49] rrcov_1.3-4           RSVGTipsDevice_1.0-4  sgeostat_1.0-25      
## [52] stats4_3.1.0          stringr_0.6.2         tensorA_0.36         
## [55] tools_3.1.0           XML_3.98-1.1          zlibbioc_1.9.0