PERMR web page:

This page contains supplementary programs and figures for the following paper:

Wood, I.A., Visscher, P.M., Mengersen, K.L. (2007) “Classification based upon gene expression data: bias and precision of error rates”, Bioinformatics 23, 1363-1370.

These can all be downloaded in a single zip file.

Additional figures: estimates of error rate (Err) and average class error rate (Ea) for the datasets we refer to as random, Khan and Sharma. These datasets contain 100, 83 and 60 observations, respectively.

The full dataset described by Khan et al. (2001) is available at: http://home.ccr.cancer.gov/oncology/oncogenomics/. The reduced Khan dataset used here comprises all 83 of the small round blue cell tumour (SRBCT) observations, with the 2308 genes that passed the intensity requirements imposed in Khan et al. (2001). The full dataset does not contain labels for the test observations, but these are provided in supplementary files on the above site. We included labels for all observations in our reduced version, since they were needed for assessment by cross-validation.

The full dataset described by Sharma et al. (2005) contains 102 observations, including multiple measurements from some patients. It is available at http://breast-cancer-research.com/content/7/5/R634. The reduced Sharma dataset used here is a randomly selected subset of 60 observations, one per patient, which avoids the need to consider methods for aggregating results. For sample IDs 1-60, the individual observations used had the following batch labels: 4,2,6,2,10,3,6,4,6,4,10,6,12,7,11,2,8,9,11,2,2,2,9,9,10,10,14,12,12,15,11,15,8,14,13,13,13,13,14,14,16,16,16,16,15,15,15,15,7,3,1,3,1,1,1,5,5,3,1,5.
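
As an illustration of the one-per-patient reduction described above, the following is a minimal sketch in Python. It is not the procedure used to produce the reduced Sharma dataset: the record layout (pairs of patient ID and observation ID), the function name `one_per_patient` and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict

def one_per_patient(records, seed=0):
    """Pick one observation at random for each patient.

    records: iterable of (patient_id, observation_id) pairs (hypothetical layout).
    Returns a dict mapping patient_id -> chosen observation_id.
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for patient_id, observation_id in records:
        by_patient[patient_id].append(observation_id)
    # Visit patients in sorted order so the result is reproducible for a given seed.
    return {p: rng.choice(by_patient[p]) for p in sorted(by_patient)}

# Toy data (made up): three patients, two of them measured more than once.
toy = [("p1", "a"), ("p1", "b"), ("p2", "c"),
       ("p3", "d"), ("p3", "e"), ("p3", "f")]
chosen = one_per_patient(toy, seed=1)
```

Fixing the seed makes the selection repeatable, which matters when the chosen subset has to be reported, as it is in the list of batch labels above.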

Also available here:

R code to run 2-level external cross-validation on these three datasets and to perform permutation tests.

Experiment code to allow replication of the main results reported in the paper.
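
For readers unfamiliar with the terms, the sketch below shows one common reading of two-level external cross-validation with a label-permutation test. It is written in Python for illustration only and is not the R code offered on this page. The nearest-centroid classifier, the feature-ranking rule, the fold counts, the grid of feature counts and the toy data are all assumptions. The outer level estimates Err and Ea; the inner level chooses a tuning parameter (here the number of features, `m`) using training data only.

```python
import random

def folds(n, k, rng):
    """Split indices 0..n-1 into k roughly equal random folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit(X, y, m):
    """Nearest-centroid rule on the m features with the largest gap between class means."""
    classes = sorted(set(y))
    cent = {c: [sum(col) / len(col)
                for col in zip(*[x for x, t in zip(X, y) if t == c])]
            for c in classes}
    p = len(X[0])
    gap = [max(cent[c][j] for c in classes) - min(cent[c][j] for c in classes)
           for j in range(p)]
    keep = sorted(range(p), key=lambda j: -gap[j])[:m]
    return cent, keep

def predict(model, x):
    cent, keep = model
    return min(cent, key=lambda c: sum((x[j] - cent[c][j]) ** 2 for j in keep))

def cv_errors(X, y, k, rng, choose_m):
    """0/1 error per observation from k-fold CV; choose_m sees training data only."""
    n = len(y)
    wrong = [0] * n
    for test in folds(n, k, rng):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        model = fit(Xtr, ytr, choose_m(Xtr, ytr))
        for i in test:
            wrong[i] = int(predict(model, X[i]) != y[i])
    return wrong

def two_level_cv(X, y, rng, k_outer=5, k_inner=3, grid=(1, 2, 5)):
    """Outer CV estimates error; inner CV (inside each training set) picks m."""
    def choose_m(Xtr, ytr):
        def inner_err(m):
            return sum(cv_errors(Xtr, ytr, k_inner, rng, lambda a, b: m))
        return min(grid, key=inner_err)
    wrong = cv_errors(X, y, k_outer, rng, choose_m)
    err = sum(wrong) / len(y)                      # overall error rate (Err)
    classes = sorted(set(y))
    ea = sum(sum(w for w, t in zip(wrong, y) if t == c) / y.count(c)
             for c in classes) / len(classes)      # average class error rate (Ea)
    return err, ea

def permutation_p_value(X, y, rng, n_perm=20):
    """Compare the observed Err with Err obtained after shuffling the labels."""
    observed, _ = two_level_cv(X, y, rng)
    count = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        err, _ = two_level_cv(X, y_perm, rng)
        count += err <= observed
    return observed, (count + 1) / (n_perm + 1)

# Toy data (made up): 30 observations, 5 features, the first 2 informative.
rng = random.Random(0)
y = [0] * 15 + [1] * 15
X = [[rng.gauss(3.0 * t if j < 2 else 0.0, 1.0) for j in range(5)] for t in y]
err, ea = two_level_cv(X, y, rng)
observed, p_value = permutation_p_value(X, y, rng, n_perm=20)
```

The key design point is that feature ranking and the choice of `m` are repeated inside every outer training set, so the held-out observations never influence the fitted rule. In the permutation test the whole procedure is rerun on each shuffled label vector, so the reference distribution reflects the same selection steps as the observed estimate.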

References for data sets:

J. Khan, J. Wei, M. Ringner, L. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. Antonescu, C. Peterson, and P. Meltzer. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6):673–679, 2001.

P. Sharma, N. Sahni, R. Tibshirani, P. Skaane, P. Urdal, H. Berghagen, M. Jensen, L. Kristiansen, C. Moen, P. Sharma, A. Zaka, J. Arnes, T. Sauer, L. Akslen, E. Schlichting, A. Børresen-Dale, and A. Lönneborg. Early detection of breast cancer based on gene-expression patterns in peripheral blood cells. Breast Cancer Research, 7:R634–R644, 2005.

Email addresses of authors:

Ian Wood: ian.wood@maths.uq.edu.au

Peter Visscher: Peter.Visscher@qimr.edu.au

Kerrie Mengersen: k.mengersen@qut.edu.au

Maintained by Ian Wood. Last modified 15th December 2008.