Report on the MASCOS Workshop on Mathematics and
Statistics in Genetics
University of Queensland
Friday 21 April, 2006
Description
The study of the genetics of populations and the
mathematics and statistics used to describe the underlying processes
have had a long association, to the extent that the names of many early
practitioners are as recognized in biology as in the mathematical
sciences (G. H. Hardy and R. A. Fisher, to name two).
Furthermore, a surprisingly large proportion of workers in mathematical
population genetics have either been based in or started their careers
in Australia.
This workshop had two aims. The first was to showcase to
researchers at the more biological end of the spectrum what skills are
available within the mathematics and statistics community and
thus to encourage cross-disciplinary collaborations. The second
was to gather together and hopefully build a greater sense of community
among local statisticians and mathematicians who work on applications
in genetics.
Approximately 40 people attended the Workshop. Most came
from universities or research institutes in South East Queensland,
although about five came from interstate.
The workshop was sponsored by the ARC Centre of Excellence for
the Mathematics and Statistics of Complex Systems (MASCOS).
Invited Speakers
- Mark Blows (Zoology & Entomology, University of
Queensland)
- Grant Hamilton (School of Mathematical Sciences, Queensland
University of Technology)
- Liat Jones (ARC Centre for Bioinformatics, University of
Queensland)
- Jonathan Keith (Mathematics, University of Queensland)
- Martin O'Hely (MASCOS, University of Queensland)
- Paul Slade (Statistics, University of Adelaide)
- Ian Wood (School of Mathematical Sciences, Queensland
University of Technology)
[There were no contributed papers]
Venue
Riverview Room, Emmanuel College, St Lucia Campus, University of Queensland
Organizer
Martin O'Hely (MASCOS, University of Queensland)
Programme
8:45am | Arrival, registration and coffee
9:10 | Welcoming remarks
9:15 | Liat Jones: Statistical Analysis of Microarray Data
10:00 | Grant Hamilton: Bayesian estimation of recent migration rates during a range expansion
10:45 | Morning tea (refreshments provided)
11:00 | Mark Blows: The dimensionality of the genetic variance-covariance matrix
11:45 | Paul Slade: Stochastic and computational modeling of gene genealogy: does natural selection mimic varying population size?
12:30pm | Lunch (provided)
1:45 | Ian Wood: Bayesian Hierarchical Models for Meta-Analysis
2:30 | Jonathan Keith: Segmenting Eukaryote Genomes with the Generalised Gibbs Sampler
3:15 | Afternoon tea (refreshments provided)
3:45 | Martin O'Hely: The frequency of a segregating duplicate gene
4:30 | End of Workshop
Abstracts
- Mark Blows, speaking on joint work with Emma Hine (Zoology & Entomology, UQ) The dimensionality of the
genetic variance-covariance matrix
Since the introduction of
Fisher's geometric model, the number of genetically independent
traits underlying a set of functionally related phenotypic traits
has been recognized as an important factor influencing the response to
selection. Determining the dimensionality of genetic variance-covariance
(G) matrices provides an important perspective on the genetic basis of
a multivariate suite of traits that is not available when univariate
genetic variances and bivariate genetic correlations are interpreted
in isolation. We show how the effective dimensionality of G can be
established using three alternative methods: the determination of the
dimensionality of the effect space from a multivariate general linear
model (Amemiya 1985), factor-analytic modeling, and bootstrapping. A
simulation study indicated that while the performance of Amemiya's
method was more sensitive to power constraints, it performed as well
as or better than factor-analytic modeling in correctly identifying the
original genetic dimensions at moderate to high levels of heritability.
The bootstrap approach, which is the only method to have been adopted
in the genetic and ecological literature, consistently overestimated
the number of dimensions in all cases, and performed less well
than Amemiya's method at subspace recovery. Applied to data from
transcriptional profiling experiments conducted within quantitative
genetic experimental designs, these approaches have the potential to
determine the number and nature of genetically independent sets of
regulated genes.
- Grant Hamilton (School of Mathematical Sciences, QUT) Bayesian estimation of recent migration
rates during a range expansion
Using molecular genetic
data to make demographic inferences continues to be a challenging
problem. Recent maximum likelihood and Bayesian approaches have
shown that it is possible to make full use of the data. However,
simplified demographic models have generally been used due to the
difficulty in computing the likelihood for complex models.
Approximate Bayesian Computation (ABC) is
a promising alternative in cases where likelihoods are intractable but
simulation is relatively easy. Beaumont et al. (2002) recognised that a
rejection sampling approach could be improved by the introduction of a
regression. We have extended this approach into the spatial
domain by estimating the parameters of a range expansion under a
two-dimensional stepping-stone model. I will present
two case studies illustrating the method.
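As a rough illustration of the rejection-sampling idea that ABC builds on, the following minimal Python sketch estimates a single rate parameter for a toy model. It is not the range-expansion model from the talk, and it omits the regression step of Beaumont et al. (2002). The simulator, the uniform prior, the summary statistic (the sample mean) and the tolerance are all assumptions chosen for the example.

```python
import random

def simulate(rate, n, rng):
    """Toy simulator (an assumption): n exponential waiting times with the given rate."""
    return [rng.expovariate(rate) for _ in range(n)]

def sample_mean(values):
    """Summary statistic used to compare simulated and observed data."""
    return sum(values) / len(values)

def abc_rejection(observed, prior_draw, n_draws, tolerance, rng):
    """Keep the prior draws whose simulated summary lies within tolerance of the observed one."""
    target = sample_mean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                       # candidate parameter from the prior
        simulated = simulate(theta, len(observed), rng)
        if abs(sample_mean(simulated) - target) <= tolerance:
            accepted.append(theta)                    # close enough: keep the candidate
    return accepted

rng = random.Random(1)                                # fixed seed so runs are repeatable
observed = simulate(2.0, 50, rng)                     # stand-in "data" with true rate 2.0
posterior = abc_rejection(
    observed,
    prior_draw=lambda r: r.uniform(0.1, 10.0),        # hypothetical flat prior on the rate
    n_draws=20000,
    tolerance=0.05,
    rng=rng,
)
print(len(posterior), sample_mean(posterior))
```

The accepted values form an approximate posterior sample. No likelihood is ever evaluated, which is why the approach suits models that are easy to simulate but hard to write down in closed form.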
- Liat Jones (ARC Centre for Bioinformatics, UQ) Statistical Analysis of Microarray Data
Microarrays allow the
measurement of gene expressions for a biological sample (tissue) on a
genome-wide scale, and form part of the high-throughput -omics
methodology which is changing the face of biological research
(genomics, proteomics and metabonomics). They are now standard tools in
biology, with the ultimate goal of use in clinical medicine for
diagnosis and prognosis, particularly in cancer, to guide
therapeutic management.
Yet the data produced pose a real challenge for statistical analysis,
since the numbers of genes can be in the tens of thousands, but the
numbers of samples are in the tens, or hundreds in the largest studies.
Traditional statistical approaches cannot be applied directly, and need to be
modified to carry out the analyses required, in order to draw sound
conclusions from these experiments.
In this talk I will briefly introduce the principles of the microarray
experiment and mention some of the common approaches in data analysis,
assuming the data have been cleaned and preprocessed. These include
cluster analysis (clustering either the genes or tissues) and
supervised classification to find subsets of "marker genes". The
remainder and main part of the talk will focus on our work in detecting
differentially expressed genes in a given number of classes, a problem
still under debate in the literature and often the major goal for a
microarray study.
- Jonathan Keith (Maths, University of Queensland) Segmenting Eukaryote Genomes with the
Generalised Gibbs Sampler
A surprising result that has emerged from the
comparison of large eukaryotic genomes is that the proportion of such
genomes under purifying selection is apparently much larger than the
proportion coding for proteins. Although the task of delineating
protein-coding elements is well advanced, the functional non-coding
portion is much less well understood, and is only beginning to be
delineated. In this talk I will present a new method for delineating
the conserved fraction of eukaryote genomes based on Bayesian sequence
segmentation of pair-wise whole-genome alignments. The method is
applied to an alignment of the Drosophila melanogaster genome to the
Drosophila simulans genome. Despite the fact that these species
diverged only a few million years ago, the new method was able to
identify well-resolved slowly and rapidly evolving fractions. The
method is also able to identify most of the sequences within these
fractions, unlike previous approaches. The results indicate that
approximately 61.7% of the Drosophila melanogaster genome is in the
slowly evolving fraction, approximately 2.7% is in the
rapidly evolving fraction and approximately 18.3% is evolving at
an intermediate rate (the remaining 17.3% is comprised of indels or is
not aligned). Almost all (approximately 90%) of the aligned
protein-coding sequence is in the slowly evolving fraction, suggesting
that this fraction (which comprises the majority of the Drosophila
genome) is functional. The rapidly evolving fraction is also enriched
for protein coding sequence, suggesting that this fraction may also be
functional.
Software, data, and results will shortly be made available online at http://www.uq.edu.au/~uqjkeith/.
- Martin O'Hely (MASCOS, University of Queensland) The frequency of a segregating duplicate
gene
Suppose a duplicate copy
of a gene appears at a locus which is loosely linked to the "normal"
position of the gene in the genome of some organism. Questions
which come to mind include: what is the chance that the function of
this gene will eventually be relocated to the new locus? how long would
this take? and how would the population genetics of the organism look
while this is happening? I present a stochastic model of the
situation, show that its overall behaviour is well-modelled by a
one-dimensional diffusion, and thereby infer answers to these
questions. In particular I show that there is a marked tendency
for the population to harbour equal frequencies of the gene at the two
loci.
- Paul Slade (University of Adelaide) Stochastic and computational modeling of
gene genealogy: does natural selection mimic varying population size?
The coalescent process is a well-established tool of mathematical
population genetics and provides a mathematical description of the
genealogy of a sample of genes. Various simplifying assumptions
and mathematical devices are utilized to render a continuous-time
stochastic death process that yields a Markov chain that is amenable to
computational simulation. In the presence of weak natural
selection the coalescent analogue is a branching-coalescing random
graph called the ancestral selection graph. The main body of
coalescent theory is restricted to consideration of a population of
constant size over time. I will present an overview of a recent
extension to the ancestral selection graph that allows for exponential
population size growth. The resulting model informs us of how
selection and population growth interact to influence genealogical
timing properties.
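The death process mentioned above can be sketched for the simplest case, a constant-size population with no selection. In the minimal Python sketch below, the function name, sample size and number of replicates are illustrative assumptions. It relies on the standard result that, while k lineages remain, the waiting time to the next coalescence is exponentially distributed with rate k(k-1)/2 in coalescent time units. It does not implement the ancestral selection graph or population growth.

```python
import random

def coalescent_waiting_times(n, rng):
    """Waiting times between successive coalescences for a sample of n genes.

    The number of ancestral lineages is a pure death process: it drops
    from k to k - 1 after an exponential time with rate k * (k - 1) / 2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times

rng = random.Random(2006)          # fixed seed so runs are repeatable
n = 10                             # sample size (illustrative)
replicates = 20000

# Time to the most recent common ancestor is the sum of the waiting times.
mean_tmrca = sum(
    sum(coalescent_waiting_times(n, rng)) for _ in range(replicates)
) / replicates

# Theoretical expectation under constant population size: 2 * (1 - 1/n).
expected_tmrca = 2 * (1 - 1 / n)
print(mean_tmrca, expected_tmrca)
```

The simulated mean should sit close to the theoretical value. Comparing such timing summaries across models is one way to ask whether selection and changing population size leave similar genealogical signatures.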
- Ian Wood (School of Mathematical Sciences, QUT) Bayesian Hierarchical Models for
Meta-Analysis
Meta-analysis provides a means of statistically combining the results
of a number of studies. It can be applied to genetic association
studies with the aim of producing clearer results than any of the
individual studies. I am currently interested in meta-analysis of
studies of the effects of genetic polymorphisms such as SNPs (single
nucleotide polymorphisms) on quantitative phenotypic traits. These are
of interest in breeding livestock and also in human genetic
epidemiology.
Here I discuss the use of a Bayesian hierarchical model to combine
genetic association studies which contain significant heterogeneity. I
use the example of a meta-analysis of the association between the TG5
Thyroglobulin SNP (C/T) in cattle and levels of marbling in their
beef. A random effects formulation is used with assumptions of
normality at each level, with different variances used at the estimate,
study and breed levels.
We required that the study results included could be represented as
pair-wise contrasts with corresponding standard errors. The prior
distributions on the variances were of a conjugate form
(1/chi-squared), and were made informative by centering them on
variances estimated from the data. Results obtained using WinBUGS
showed a high probability of overall association between copies of the
T allele and beef marbling.
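One plausible way to write down a three-level normal random effects structure of this kind is sketched below. The notation is an assumption made for illustration: $y_{bsi}$ denotes the $i$-th contrast estimate from study $s$ on breed $b$, and $\mu$ the overall association. The abstract does not specify how the reported standard errors enter the model, so they are left out here, and this should not be read as the exact model fitted in WinBUGS.

```latex
\begin{align*}
y_{bsi} \mid \theta_{bs} &\sim \mathcal{N}\!\left(\theta_{bs},\ \sigma_{\mathrm{est}}^{2}\right)
  && \text{(estimate level)} \\
\theta_{bs} \mid \mu_{b} &\sim \mathcal{N}\!\left(\mu_{b},\ \sigma_{\mathrm{study}}^{2}\right)
  && \text{(study level)} \\
\mu_{b} \mid \mu &\sim \mathcal{N}\!\left(\mu,\ \sigma_{\mathrm{breed}}^{2}\right)
  && \text{(breed level)}
\end{align*}
```

Under this reading, each of the three variances would receive its own 1/chi-squared prior, and the posterior distribution of $\mu$ summarises the evidence for an overall association.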
Participants
Name | Email | Affiliation
Ellie Adamson | e.adamson | QUT
Claire Bellis | Claire.Bellis (student) | Griffith
Beben Benyamin | Beben.Benyamin | QIMR
Mark Blows | m.blows | UQ
Damien Broderick | Damien.Broderick (dpi) | Qld Govt
Michael Bulmer | m.bulmer | UQ
Alhadi Bustamam | alhadi (maths) | UQ
Zahra Cici | z.h.cici | UQ
Sudath Terrence Dammannagoda | s.dammannagoda | QUT
Gareth Evans | gevans (maths) | UQ
Michael Green | m.green | Griffith
Grant Hamilton | g.hamilton | QUT
Adam Harris | adamh (turing) | UNE
Emma Hine | s364080 (student) | UQ
Elizabeth Holliday | elizabeth_holliday (qcmhr) | UQ
Mohammad Hosseini-Nasab | Mohammad.Hosseini-Nasab (maths) | ANU
Ian Hughes | i.hughes | QUT
David Hurwood | d.hurwood | QUT
Liat Jones | liatj (maths) | UQ
Jonathan Keith | jonathan (maths) | UQ
Pauline Kwong | kwong.pauline (gmail.com) | Newcastle
Allan McRae | allan.mcrae | QIMR
Sho Nariai | sho (maths) | UQ
Harald Oey | H.Oey | Griffith
Martin O'Hely | ohely (maths) | UQ
Khaleel Petrus | petrus | USQ
Phil Pollett | pkp (maths) | UQ
Josh Ross | jvr (maths) | UQ
Asrul Sani | asani (maths) | UQ
Paul Slade | paul.slade | Adelaide
Stuart Stephen | s.stephen (imb) | UQ
Attila Szvetko | A.Szvetko | Griffith
Tianhai Tian | tian (maths) | UQ
Michael Towsey | m.towsey | QUT
Ngoc Mai Tran | ngoc.tran (studentmail) | Newcastle
Bill Whiten | W.Whiten (mailbox) | UQ
Leesa Wockner | s4077516 (student) | UQ
Ian Wood | i.wood | QUT
Hanjun Zhang | hjz (maths) | UQ
(Email addresses can be
constructed from the entry given, the at sign, and the standard
institutional domain name. Where a subdomain is needed, it is
indicated in parentheses.)
24 April 2006