AUSTRALIAN RESEARCH COUNCIL
Centre of Excellence for Mathematics
and Statistics of Complex Systems

University of Queensland site
Report on the MASCOS Workshop on Mathematics and Statistics in Genetics


University of Queensland
Friday 21 April, 2006

Description   The study of the genetics of populations and the mathematics and statistics used to describe the underlying processes have had a long association, to the extent that the names of many early practitioners are as well known in biology as in the mathematical sciences (G. H. Hardy and R. A. Fisher, to name two). Furthermore, a surprisingly large proportion of workers in mathematical population genetics have either been based in or started their careers in Australia.

This workshop had two aims.  The first was to showcase to researchers at the more biological end of the spectrum what skills are available within the mathematics and statistics community and thus to encourage cross-disciplinary collaborations.  The second was to gather together and hopefully build a greater sense of community among local statisticians and mathematicians who work on applications in genetics.

Approximately 40 people attended the Workshop. Most came from universities or research institutes in South East Queensland, although about five came from interstate.

The workshop was sponsored by the ARC Centre of Excellence for the Mathematics and Statistics of Complex Systems (MASCOS).

Invited Speakers  

  • Mark Blows (Zoology & Entomology, University of Queensland)
  • Grant Hamilton (School of Mathematical Sciences, Queensland University of Technology)
  • Liat Jones (ARC Centre for Bioinformatics, University of Queensland)
  • Jonathan Keith (Mathematics, University of Queensland)
  • Martin O'Hely (MASCOS, University of Queensland)
  • Paul Slade (Statistics, University of Adelaide)
  • Ian Wood (School of Mathematical Sciences, Queensland University of Technology)
[There were no contributed papers]

Venue   Riverview Room, Emmanuel College, St Lucia Campus, University of Queensland

Organizer   Martin O'Hely (MASCOS, University of Queensland)

Programme   

8:45am
Arrival, registration and coffee
9:10
Welcoming remarks
9:15
Liat Jones: Statistical Analysis of Microarray Data
10:00
Grant Hamilton: Bayesian estimation of recent migration rates during a range expansion
10:45
Morning tea (refreshments provided)
11:00
Mark Blows: The dimensionality of the genetic variance-covariance matrix
11:45
Paul Slade: Stochastic and computational modeling of gene genealogy: does natural selection mimic varying population size?
12:30pm
Lunch (provided)
1:45
Ian Wood: Bayesian Hierarchical Models for Meta-Analysis
2:30
Jonathan Keith: Segmenting Eukaryote Genomes with the Generalised Gibbs Sampler
3:15
Afternoon tea (refreshments provided)
3:45
Martin O'Hely: The frequency of a segregating duplicate gene
4:30
End of Workshop

Abstracts 

  • Mark Blows, speaking on joint work with Emma Hine (Zoology & Entomology, UQ) The dimensionality of the genetic variance-covariance matrix
Since the introduction of Fisher's geometric model, the number of genetically independent traits underlying a set of functionally related phenotypic traits has been recognized as an important factor influencing the response to selection. Determining the dimensionality of genetic variance-covariance (G) matrices provides an important perspective on the genetic basis of a multivariate suite of traits that is not available when univariate genetic variances and bivariate genetic correlations are interpreted in isolation. We show how the effective dimensionality of G can be established using three alternative methods: determining the dimensionality of the effect space from a multivariate general linear model (Amemiya 1985), factor-analytic modeling, and bootstrapping. A simulation study indicated that, although Amemiya's method was more sensitive to power constraints, it performed as well as or better than factor-analytic modeling in correctly identifying the original genetic dimensions at moderate to high levels of heritability. The bootstrap approach, the only one of these methods to have been adopted in the genetic and ecological literature, consistently overestimated the number of dimensions in all cases and performed less well than Amemiya's method at subspace recovery. Applied to data from transcriptional profiling experiments conducted within quantitative genetic experimental designs, these approaches have the potential to determine the number and nature of genetically independent sets of regulated genes.
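As a rough illustration of what the dimensionality of G means, the Python sketch below eigendecomposes a small symmetric matrix with the Jacobi rotation method and counts how many leading eigenvalues are needed to reach a chosen share of the total genetic variance. It is not one of the three methods compared in the talk; the matrix, the function names jacobi_eigenvalues() and effective_dimensions(), and the 90% cut-off are hypothetical choices made for illustration only.

```python
import math

def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy so the caller's matrix is unchanged
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def effective_dimensions(eigenvalues, share=0.9):
    """Number of leading eigenvalues whose sum reaches `share` of the total."""
    positive = [max(ev, 0.0) for ev in eigenvalues]  # clip tiny negative round-off
    total = sum(positive)
    running = 0.0
    for k, ev in enumerate(positive, start=1):
        running += ev
        if running >= share * total:
            return k
    return len(positive)

# Hypothetical 3-trait G matrix built from two genetic factors, so its rank is 2.
G = [[1.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]
eigs = jacobi_eigenvalues(G)
print([round(ev, 6) for ev in eigs])  # eigenvalues close to 3, 1 and 0
print(effective_dimensions(eigs))     # 2
```

Because this G was constructed from two underlying genetic factors, one eigenvalue is numerically zero and two dimensions suffice. With estimated G matrices, sampling error makes small eigenvalues nonzero, which is why formal approaches such as those compared in the talk are needed.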
  • Grant Hamilton (School of Mathematical Sciences, QUT) Bayesian estimation of recent migration rates during a range expansion
Using molecular genetic data to make demographic inferences continues to be a challenging problem.  Recent maximum likelihood and Bayesian approaches have shown that it is possible to make full use of the data. However, simplified demographic models have generally been used due to the difficulty in computing the likelihood for complex models.

Approximate Bayesian Computation (ABC) is a promising alternative in cases where likelihoods are intractable but simulation is relatively easy. Beaumont et al. (2002) recognised that a rejection sampling approach could be improved by introducing a regression adjustment. We have extended this approach into the spatial domain by estimating the parameters of a range expansion under a two-dimensional stepping-stone model. I will present two case studies illustrating the method.
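The rejection-plus-regression idea of Beaumont et al. (2002) can be sketched in Python on a toy problem. Here the demographic model is replaced by a hypothetical one-parameter simulator (the mean of 20 draws from a normal distribution with unknown mean theta) and a Uniform(-5, 5) prior; the accepted simulations are weighted with an Epanechnikov kernel and adjusted by a weighted linear regression of theta on the summary statistic. The stepping-stone range-expansion model used in the talk is not reproduced, and all function names and settings are assumptions.

```python
import random

def simulate_summary(theta, n, rng):
    """Toy simulator: the mean of n draws from Normal(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

def abc_regression(s_obs, n_sims=20000, accept_frac=0.05, n=20, seed=1):
    """ABC rejection with Epanechnikov weights and a linear regression adjustment."""
    rng = random.Random(seed)
    # Draw theta from a Uniform(-5, 5) prior and simulate a summary for each draw.
    sims = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)
        sims.append((theta, simulate_summary(theta, n, rng)))
    # Rejection step: keep the simulations whose summary is closest to s_obs.
    sims.sort(key=lambda ts: abs(ts[1] - s_obs))
    kept = sims[: int(accept_frac * n_sims)]
    delta = abs(kept[-1][1] - s_obs)  # tolerance implied by the accepted fraction
    x = [s - s_obs for _, s in kept]
    t = [theta for theta, _ in kept]
    w = [1.0 - (xi / delta) ** 2 for xi in x]  # Epanechnikov kernel weights
    # Weighted least-squares slope of theta on (s - s_obs).
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    t_bar = sum(wi * ti for wi, ti in zip(w, t)) / sw
    sxy = sum(wi * (xi - x_bar) * (ti - t_bar) for wi, xi, ti in zip(w, x, t))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    beta = sxy / sxx
    # Regression adjustment: move each accepted theta to where it "would be" at s = s_obs.
    adjusted = [ti - beta * xi for ti, xi in zip(t, x)]
    return adjusted, w

adjusted, w = abc_regression(s_obs=1.3)
post_mean = sum(wi * a for wi, a in zip(w, adjusted)) / sum(w)
print(round(post_mean, 2))
```

For this toy model the exact posterior is approximately normal with mean 1.3 and standard deviation 1/sqrt(20), about 0.22, so the weighted mean of the adjusted draws should land close to 1.3. In a realistic application the summary statistic would typically be a vector and the regression multivariate.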
  • Liat Jones (ARC Centre for Bioinformatics, UQ) Statistical Analysis of Microarray Data
Microarrays allow gene expression in a biological sample (tissue) to be measured on a genome-wide scale, and form part of the high-throughput -omics methodology (genomics, proteomics and metabonomics) that is changing the face of biological research. They are now standard tools in biology, and an ultimate goal is their use in clinical medicine for diagnosis and prognosis, in particular in cancer, where they could help guide therapeutic management.

Yet the data produced pose a real challenge for statistical analysis: the number of genes can be in the tens of thousands, while the number of samples is in the tens, or in the hundreds for the largest studies. Traditional statistical approaches no longer apply directly and must be modified to carry out the required analyses and draw sound conclusions from these experiments.

In this talk I will briefly introduce the principles of the microarray experiment and mention some of the common approaches in data analysis, assuming the data have been cleaned and preprocessed. These include cluster analysis (clustering either the genes or tissues) and supervised classification to find subsets of "marker genes". The remainder and main part of the talk will focus on our work in detecting differentially expressed genes in a given number of classes, a problem still under debate in the literature and often the major goal for a microarray study.
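When tens of thousands of genes are tested at once, multiple testing dominates: if, say, 20,000 genes were tested and none were truly differentially expressed, a 5% per-gene cut-off would still be expected to flag about 1,000 of them. One widely used correction, shown here only as a generic illustration and not necessarily the approach presented in the talk, is the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. The sketch assumes per-gene p-values have already been computed by some test; the p-values and the function name are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value meets its threshold
    return sorted(order[:k_max])  # reject every hypothesis up to that rank

# Hypothetical p-values for ten genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # [0, 1]
```

Only the two smallest p-values fall below their rank-dependent thresholds (0.005 and 0.010), so only genes 0 and 1 are declared differentially expressed, even though five genes have p < 0.05.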

  • Jonathan Keith (Maths, University of Queensland)  Segmenting Eukaryote Genomes with the Generalised Gibbs Sampler

    A surprising result to emerge from the comparison of large eukaryotic genomes is that the proportion of such genomes under purifying selection is apparently much larger than the proportion coding for proteins. Although the delineation of protein-coding elements is well advanced, the functional non-coding portion is much less well understood and is only beginning to be delineated. In this talk I will present a new method for delineating the conserved fraction of eukaryote genomes based on Bayesian sequence segmentation of pairwise whole-genome alignments. The method is applied to an alignment of the Drosophila melanogaster genome to the Drosophila simulans genome. Although these species diverged only a few million years ago, the new method was able to identify well-resolved slowly and rapidly evolving fractions. Unlike previous approaches, the method is also able to identify most of the sequences within these fractions. The results indicate that approximately 61.7% of the Drosophila melanogaster genome is in the slowly evolving fraction, approximately 2.7% is in the rapidly evolving fraction and approximately 18.3% is evolving at an intermediate rate (the remaining 17.3% consists of indels or is not aligned). Almost all (approximately 90%) of the aligned protein-coding sequence is in the slowly evolving fraction, suggesting that this fraction (which comprises the majority of the Drosophila genome) is functional. The rapidly evolving fraction is also enriched for protein-coding sequence, suggesting that this fraction may also be functional.

    Software, data, and results will shortly be made available online at http://www.uq.edu.au/~uqjkeith/.
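The basic idea of Bayesian segmentation of an alignment can be shown with a far simpler model than the one in the talk, which segments whole genomes into multiple classes and is sampled with the generalised Gibbs sampler. In this Python sketch, alignment columns are coded 0 (identical) or 1 (different), there is a single boundary k splitting the alignment into two segments, each segment's divergence rate has a Beta(1, 1) prior, and the posterior over k is computed exactly by enumeration. The coding, priors, example alignment and function names are all assumptions for illustration.

```python
import math

def log_marginal(n_same, n_diff, a=1.0, b=1.0):
    """Log marginal likelihood of a segment under a Beta(a, b) prior on its divergence rate."""
    return (math.lgamma(a + n_diff) + math.lgamma(b + n_same)
            - math.lgamma(a + b + n_same + n_diff)
            - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))

def changepoint_posterior(columns):
    """Posterior over one boundary k (segments columns[:k] and columns[k:]), uniform prior on k."""
    n = len(columns)
    prefix = [0]
    for c in columns:
        prefix.append(prefix[-1] + c)  # running count of differing columns
    logs = []
    for k in range(1, n):
        left_diff, right_diff = prefix[k], prefix[n] - prefix[k]
        logs.append(log_marginal(k - left_diff, left_diff)
                    + log_marginal(n - k - right_diff, right_diff))
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract the maximum for numerical stability
    total = sum(weights)
    return {k: wk / total for k, wk in zip(range(1, n), weights)}

# Hypothetical alignment: 40 fully conserved columns, then 40 columns where half differ.
columns = [0] * 40 + [1, 0] * 20
post = changepoint_posterior(columns)
print(max(post, key=post.get))  # 40
```

Integrating out each segment's rate, rather than fixing it, lets the data decide both where the boundary lies and how different the two regimes are.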


  • Martin O'Hely (MASCOS, University of Queensland) The frequency of a segregating duplicate gene
Suppose a duplicate copy of a gene appears at a locus that is loosely linked to the "normal" position of the gene in the genome of some organism. Natural questions include: What is the chance that the function of this gene will eventually be relocated to the new locus? How long would this take? And what would the population genetics of the organism look like while this is happening? I present a stochastic model of the situation, show that its overall behaviour is well modelled by a one-dimensional diffusion, and thereby infer answers to these questions. In particular, I show that there is a marked tendency for the population to harbour equal frequencies of the gene at the two loci.
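The general idea of checking a discrete stochastic model against a one-dimensional diffusion can be illustrated with a textbook single-locus case rather than the duplicate-gene model of the talk: a haploid Wright-Fisher population of hypothetical size N = 50 in which an allele with selection coefficient s = 0.02 starts at frequency p = 0.1. The diffusion approximation gives the fixation probability (1 - exp(-2Nsp)) / (1 - exp(-2Ns)), which the simulation should roughly reproduce. Population size, s, starting count and function names are assumptions for illustration.

```python
import math
import random

def fixes(n, i0, s, rng):
    """Haploid Wright-Fisher with selection coefficient s: True if the allele fixes."""
    i = i0
    while 0 < i < n:
        p = i / n
        p_sel = p * (1 + s) / (1 + s * p)  # deterministic effect of selection
        i = sum(1 for _ in range(n) if rng.random() < p_sel)  # binomial(n, p_sel) drift
    return i == n

def simulated_fixation(n, i0, s, reps=1000, seed=2):
    """Fraction of independent replicate populations in which the allele fixes."""
    rng = random.Random(seed)
    return sum(fixes(n, i0, s, rng) for _ in range(reps)) / reps

def diffusion_fixation(n, p0, s):
    """Diffusion approximation to the fixation probability (haploid scaling, s != 0)."""
    return (1 - math.exp(-2 * n * s * p0)) / (1 - math.exp(-2 * n * s))

n, i0, s = 50, 5, 0.02
print(simulated_fixation(n, i0, s), round(diffusion_fixation(n, i0 / n, s), 3))
```

The diffusion value here is about 0.21, roughly double the neutral value of 0.1, and the simulated fraction should agree to within Monte Carlo error of a percentage point or so.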
  • Paul Slade (University of Adelaide) Stochastic and computational modeling of gene genealogy: does natural selection mimic varying population size? 

    The coalescent process is a well-established tool of mathematical population genetics that provides a mathematical description of the genealogy of a sample of genes. Various simplifying assumptions and mathematical devices are used to obtain a continuous-time stochastic death process that yields a Markov chain amenable to computational simulation. In the presence of weak natural selection, the coalescent analogue is a branching-coalescing random graph called the ancestral selection graph. The main body of coalescent theory is restricted to a population of constant size over time. I will present an overview of a recent extension of the ancestral selection graph that allows for exponential population growth. The resulting model shows how selection and population growth interact to influence genealogical timing properties.
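The death-process view of the coalescent can be sketched in Python for the simplest case only: constant population size and no selection, so neither the ancestral selection graph nor population growth is modelled. While k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2 in coalescent time units, and each merger reduces k by one. The expected time to the most recent common ancestor of n genes is then 2(1 - 1/n), which the simulated mean should approach. Function names, sample size and replicate count are hypothetical.

```python
import random

def tmrca(n, rng):
    """Time to the most recent common ancestor of n genes under the standard coalescent."""
    t = 0.0
    k = n
    while k > 1:
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time while k lineages remain
        k -= 1                                  # two lineages merge into one
    return t

def mean_tmrca(n, reps=20000, seed=3):
    """Monte Carlo estimate of the expected TMRCA."""
    rng = random.Random(seed)
    return sum(tmrca(n, rng) for _ in range(reps)) / reps

n = 10
print(round(mean_tmrca(n), 2), 2 * (1 - 1 / n))  # simulated mean vs exact value 1.8
```

Almost half of the expected depth of the tree comes from the final merger of the last two lineages (mean waiting time 1), which is why genealogical timing is sensitive to anything, such as growth or selection, that alters the rates near the root.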

  • Ian Wood (School of Mathematical Sciences, QUT) Bayesian Hierarchical Models for Meta-Analysis 

    Meta-analysis provides a means of statistically combining the results of a number of studies. It can be applied to genetic association studies with the aim of producing clearer results than any of the individual studies. I am currently interested in meta-analysis of studies of the effects of genetic polymorphisms such as SNPs (single nucleotide polymorphisms) on quantitative phenotypic traits. These are of interest in breeding livestock and also in human genetic epidemiology.

    Here I discuss the use of a Bayesian hierarchical model to combine genetic association studies that contain significant heterogeneity. I use the example of a meta-analysis of the association between the TG5 thyroglobulin SNP (C/T) in cattle and the level of marbling in their beef. A random-effects formulation is used, with normality assumed at each level and separate variances at the estimate, study and breed levels.

    We required that the results of included studies could be represented as pairwise contrasts with corresponding standard errors. The prior distributions on the variances were of conjugate (inverse chi-squared) form and were made informative by centering them on variances estimated from the data. Results obtained using WinBUGS showed a high probability of an overall association between copies of the T allele and beef marbling.
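A much simplified version of this kind of hierarchical model can be fitted with a hand-written Gibbs sampler in place of WinBUGS. The Python sketch assumes only two levels (study estimates y_i around study effects theta_i, and study effects around an overall effect mu), a flat prior on mu, and a scaled inverse chi-squared prior on the between-study variance whose centre tau0_sq is fixed by hand rather than estimated from the data; the breed level of the talk's model is omitted. The five contrasts, standard errors and function name are made up for illustration.

```python
import random

def gibbs_meta(y, se, nu=3.0, tau0_sq=0.04, iters=6000, burn=1000, seed=4):
    """Gibbs sampler for y_i ~ N(theta_i, se_i^2), theta_i ~ N(mu, tau^2),
    flat prior on mu, scaled inverse chi-squared(nu, tau0_sq) prior on tau^2."""
    rng = random.Random(seed)
    k = len(y)
    mu, tau_sq = sum(y) / k, tau0_sq
    mu_draws = []
    for it in range(iters):
        # Study effects: precision-weighted compromise between y_i and mu.
        theta = []
        for yi, si in zip(y, se):
            prec = 1.0 / si ** 2 + 1.0 / tau_sq
            mean = (yi / si ** 2 + mu / tau_sq) / prec
            theta.append(rng.gauss(mean, (1.0 / prec) ** 0.5))
        # Overall effect: normal around the mean of the study effects (flat prior).
        mu = rng.gauss(sum(theta) / k, (tau_sq / k) ** 0.5)
        # Between-study variance: conjugate scaled inverse chi-squared update.
        ss = sum((t - mu) ** 2 for t in theta)
        chi2 = rng.gammavariate((nu + k) / 2.0, 2.0)  # chi-squared with nu + k d.f.
        tau_sq = (nu * tau0_sq + ss) / chi2
        if it >= burn:
            mu_draws.append(mu)
    return mu_draws

# Hypothetical per-allele contrasts and standard errors from five studies.
y = [0.9, 1.1, 1.0, 1.2, 0.8]
se = [0.1, 0.15, 0.1, 0.2, 0.12]
draws = gibbs_meta(y, se)
print(sum(draws) / len(draws))                 # posterior mean of mu
print(sum(d > 0 for d in draws) / len(draws))  # posterior probability that mu > 0
```

Each full conditional is conjugate, so every step is an exact draw; the last line estimates the posterior probability of a positive overall effect, the analogue of the probability of overall association reported in the abstract.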

Participants

Name | Email | Affiliation
Ellie Adamson | e.adamson | QUT
Claire Bellis | Claire.Bellis (student) | Griffith
Beben Benyamin | Beben.Benyamin | QIMR
Mark Blows | m.blows | UQ
Damien Broderick | Damien.Broderick (dpi) | Qld Govt
Michael Bulmer | m.bulmer | UQ
Alhadi Bustamam | alhadi (maths) | UQ
Zahra Cici | z.h.cici | UQ
Sudath Terrence Dammannagoda | s.dammannagoda | QUT
Gareth Evans | gevans (maths) | UQ
Michael Green | m.green | Griffith
Grant Hamilton | g.hamilton | QUT
Adam Harris | adamh (turing) | UNE
Emma Hine | s364080 (student) | UQ
Elizabeth Holliday | elizabeth_holliday (qcmhr) | UQ
Mohammad Hosseini-Nasab | Mohammad.Hosseini-Nasab (maths) | ANU
Ian Hughes | i.hughes | QUT
David Hurwood | d.hurwood | QUT
Liat Jones | liatj (maths) | UQ
Jonathan Keith | jonathan (maths) | UQ
Pauline Kwong | kwong.pauline (gmail.com) | Newcastle
Allan McRae | allan.mcrae | QIMR
Sho Nariai | sho (maths) | UQ
Harald Oey | H.Oey | Griffith
Martin O'Hely | ohely (maths) | UQ
Khaleel Petrus | petrus | USQ
Phil Pollett | pkp (maths) | UQ
Josh Ross | jvr (maths) | UQ
Asrul Sani | asani (maths) | UQ
Paul Slade | paul.slade | Adelaide
Stuart Stephen | s.stephen (imb) | UQ
Attila Szvetko | A.Szvetko | Griffith
Tianhai Tian | tian (maths) | UQ
Michael Towsey | m.towsey | QUT
Ngoc Mai Tran | ngoc.tran (studentmail) | Newcastle
Bill Whiten | W.Whiten (mailbox) | UQ
Leesa Wockner | s4077516 (student) | UQ
Ian Wood | i.wood | QUT
Hanjun Zhang | hjz (maths) | UQ

(Email addresses can be constructed from the user name given, the @ sign and the standard institutional domain name. Where a subdomain or a different domain is needed, it is indicated in parentheses.)

24 April 2006
The Centre of Excellence for Mathematics and Statistics
of Complex Systems is funded by the Australian Research
Council, with additional support from the Queensland
State Government and the University of Queensland