TCC: an R package for comparing tag count data with robust normalization strategies

The R package TCC provides users with a robust and accurate framework for differential expression analysis of tag count data. Differential expression analysis of tag count data (such as RNA-seq) from high-throughput sequencing technologies is a fundamental means of studying gene expression. We recently developed a multi-step normalization method (TbT; Kadota et al., 2012) for two-group RNA-seq data with replicates. The strategy is to remove potential differentially expressed genes (DEGs) from the data before performing the normalization. We demonstrated that this DEG elimination strategy (called DEGES) is essential for obtaining a well-ranked gene list in which true DEGs are top-ranked and non-DEGs are bottom-ranked. By appropriately combining the functionalities of other packages such as edgeR, DESeq, and baySeq, TCC provides integrated analysis pipelines with improved data normalization steps.
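The DEG elimination idea can be illustrated with a minimal base-R sketch. It does not call TCC and is not the TbT procedure itself: simple library-size scaling stands in for TMM, and a crude fold-change ranking stands in for the statistical test from baySeq or edgeR. The simulated data, the 30% cutoff, and all names (scale_factors, potential_deg, group_bias) are assumptions made for illustration only.

```r
# Illustrative sketch of DEG elimination before normalization.
# NOT TCC's implementation; all names and settings here are hypothetical.
set.seed(1)
n_genes <- 1000
n_deg   <- 200                      # the first 200 genes are 4-fold higher in group 2
group   <- c(1, 1, 1, 2, 2, 2)

# Simulate negative binomial counts; every library has the same true depth,
# so ideal normalization factors would be identical across samples.
base_mean <- rexp(n_genes, rate = 1 / 200) + 10
fold      <- ifelse(seq_len(n_genes) <= n_deg, 4, 1)
mu        <- cbind(matrix(base_mean, n_genes, 3),
                   matrix(base_mean * fold, n_genes, 3))
counts    <- matrix(rnbinom(length(mu), mu = mu, size = 20), nrow = n_genes)

# Scaling factors from column totals over a chosen gene subset (mean-centred at 1).
scale_factors <- function(x, keep = rep(TRUE, nrow(x))) {
  totals <- colSums(x[keep, , drop = FALSE])
  totals / mean(totals)
}

# Step 1: normalize using all genes.
f_naive <- scale_factors(counts)

# Step 2: flag potential DEGs on the normalized data
# (a crude |log2 fold change| ranking, standing in for a proper test).
norm_counts   <- sweep(counts, 2, f_naive, "/")
m1            <- rowMeans(norm_counts[, group == 1])
m2            <- rowMeans(norm_counts[, group == 2])
log_ratio     <- log2((m2 + 1) / (m1 + 1))
potential_deg <- rank(-abs(log_ratio)) <= 0.3 * n_genes

# Step 3: recompute the factors after removing the potential DEGs.
f_deges <- scale_factors(counts, keep = !potential_deg)

# Ratio of mean factors between groups; 1 means no group-level bias.
group_bias <- function(f) mean(f[group == 2]) / mean(f[group == 1])
bias_naive <- group_bias(f_naive)
bias_deges <- group_bias(f_deges)
print(c(naive = bias_naive, deges = bias_deges))
```

In this simulation the up-regulated genes inflate the group 2 library sizes, so factors computed from all genes carry a group-level bias. Recomputing them after removing the flagged genes should bring the between-group ratio closer to 1.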

Important note! (last modified: May 27, 2013)

While the older version (ver. 1.1.3) of this package is currently available from the CRAN repository, we are now moving it from CRAN to Bioconductor. This webpage is temporary and will remain until the next release of TCC (perhaps ver. 1.2.0) is available on Bioconductor. The latest version available on this webpage is ver. 1.1.99.

Installation

To install the latest version (ver. 1.1.99) of this package, download the source file and enter the following command after starting R:

install.packages("TCC_1.1.99.tar.gz", repos = NULL, type = "source")

Note that TCC depends on several other packages. If they have not been installed in your R environment, enter the following commands first:

source("http://bioconductor.org/biocLite.R")
biocLite(c("edgeR", "baySeq", "DESeq", "ROC"))

Documentation

User's Guide (vignette), R script, Manual

Citation

This package calls key functions implemented in other packages, because our normalization procedures combine normalization methods and differential expression methods established by others. For example, the TbT normalization method (Kadota et al., 2012), which is provided by the TCC package (Sun et al., submitted), consists of the TMM normalization method (Robinson and Oshlack, 2010) implemented in the edgeR package (Robinson et al., 2010) and an empirical Bayesian method implemented in the baySeq package (Hardcastle and Kelly, 2010). Therefore, please cite the appropriate references when you publish your results.

References

baySeq (R package)
Hardcastle TJ and Kelly KA. baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics 11: 422, 2010 (PMID: 20698981)
DESeq (R package)
Anders S and Huber W. Differential expression analysis for sequence count data. Genome Biol. 11(10): R106, 2010 (PMID: 20979621)
edgeR (R package)
Robinson MD, McCarthy DJ, and Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1): 139-140, 2010 (PMID: 19910308)
NBPSeq (R package)
Di Y, Schafer DW, Cumbie JS, and Chang JH. The NBP negative binomial model for assessing differential gene expression from RNA-Seq. Stat Appl Genet Mol Biol. 10: art24, 2011
TbT (a normalization method implemented in TCC)
Kadota K, Nishiyama T, and Shimizu K. A normalization strategy for comparing tag count data. Algorithms Mol Biol. 7:5, 2012 (PMID: 22475125)
TCC (R package)
Sun J, Nishiyama T, Shimizu K, and Kadota K. TCC: an R package for comparing tag count data with robust normalization strategies. submitted
TMM (a normalization method implemented in edgeR)
Robinson MD and Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11: R25, 2010 (PMID: 20196867)
Exact test for the negative binomial distribution (implemented in edgeR)
Robinson MD and Smyth GK. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9: 321-332, 2008 (PMID: 17728317)