INTRODUCTION
The recent 85-million-dollar
investment by the National Science Foundation (NSF) in plant genomics
projects promises an enormous reservoir of new information and many new
tools for the rapid analysis of gene-expression patterns, for gene
tagging, and for map-based projects. The individual goals of the 23 funded projects are summarized on-line at
www.nsf.gov/bio/pubs/awards/genome98.htm. Soon, many of the projects
will have their own websites, where project scopes will be more fully
described (e.g. http://www.zmdb.iastate.edu/). A useful
introduction to the goals, new methods, and vocabulary in genomics
research is provided in a recent Update in Plant
Physiology (Bouchez and Höfte, 1998). An excellent summary
of the promise of genomics for plant biology and agriculture can be
found in the 16 papers published in the Proceedings of the
National Academy of Sciences (1998, volume 95) as the report of a
colloquium entitled "Protecting Our Food Supply: The Value of Plant
Genome Initiatives."
In this Update, I provide an introduction to the NSF-funded
projects, based on information distributed by project coordinators during an October 1998 meeting at the NSF. The projects can be divided
into four major categories: (a) cDNA sequencing and hybridization arrays for gene-expression studies; (b) knockout mutant collections and
the categorization of plant phenotypes; (c) genomic mapping with a goal of
reconciling physical and genetic maps; and (d) comparative and
evolutionary analysis of gene families or genomes.
GENE DISCOVERY VIA cDNA SEQUENCING AND EXPRESSION ANALYSIS
cDNAs provide the quickest route to gene definition. However, with
the exceptions of Arabidopsis and rice, there are few plant cDNA
sequences available in the public domain. To rectify this, many
projects will sequence cDNAs to generate EST databases (Table I). The time frames for these projects
range from 1 year (major organs of maize) to 3 years for the majority
of projects. In most cases, non-normalized libraries will be sequenced,
so the resulting EST collections will be biased toward highly and moderately expressed
genes. With a goal of 30,000 unique ESTs, the largest database proposed is for soybean (soybean growers contributed the funds to initiate a
public EST project for this crop).
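Why a non-normalized library favors abundant transcripts can be made concrete with a simple sampling model. The sketch below assumes that each EST is an independent random draw from the library and that a gene's chance of being drawn is proportional to its transcript abundance; the three abundance classes in the usage lines are hypothetical numbers chosen for illustration, not data from any funded project.

```python
def class_coverage(classes, n_ests):
    """Expected fraction of genes in each abundance class tagged by >= 1 EST.

    classes: list of (number_of_genes, relative_abundance_per_gene) tuples.
    Assumes each EST is an independent draw from a non-normalized library,
    so a gene is picked with probability proportional to its abundance.
    """
    total = sum(n_genes * abundance for n_genes, abundance in classes)
    coverage = []
    for n_genes, abundance in classes:
        p = abundance / total                    # chance one EST hits this gene
        coverage.append(1 - (1 - p) ** n_ests)   # chance of at least one hit
    return coverage


def expected_unique_genes(classes, n_ests):
    """Expected number of distinct genes recovered from n_ests random ESTs."""
    return sum(n_genes * frac
               for (n_genes, _), frac in zip(classes, class_coverage(classes, n_ests)))


# Hypothetical library: 100 very abundant, 2,000 moderate, and 20,000 rare genes.
library = [(100, 1000), (2000, 50), (20000, 1)]
print([round(c, 3) for c in class_coverage(library, 10_000)])
print(round(expected_unique_genes(library, 10_000)))
```

With these made-up numbers, 10,000 ESTs recover essentially all of the abundant class and about 90% of the moderate class, but fewer than 5% of the rare genes, which is the bias that normalization is meant to reduce.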
A critical component of all EST projects is quick and accurate data
delivery to GenBank and project-maintained websites. Bioinformatics services within each project will report details of cDNA library preparation and sampling strategies and then derive consensus sequences
and assemble overlaps (for ESTs sequenced multiple times). For many
projects, preliminary homology searching in public databases will be
part of the EST annotation; this will automatically link new
information to the Arabidopsis and rice ESTs and genomic sequencing data for those ESTs with recognizable matches. For some projects, ESTs
will also be mapped to chromosomes or to bacterial artificial chromosomes (BACs), and this information also will be available as part of the annotation.
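Assembling overlaps among ESTs that were sequenced more than once can be illustrated with a greedy suffix-prefix merge. This is a minimal sketch under strong assumptions (exact matches only, forward strand only, no handling of sequencing errors or of reads contained entirely within other reads); production EST assemblers tolerate mismatches and consider both strands, and the function names here are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 when no overlap of at least min_overlap bases exists.
    """
    min_overlap = max(1, min_overlap)
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def greedy_assemble(reads, min_overlap=20):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    olen = overlap_length(left, right, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break                      # no remaining overlaps: stop merging
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three hypothetical overlapping reads from one transcript.
reads = ["ATGGCGTACGTTAGC", "GTACGTTAGCCATGA", "TAGCCATGAAACCC"]
print(greedy_assemble(reads, min_overlap=5))
```

The merged sequence stands in for the consensus mentioned above; a real pipeline would build the consensus base by base from all reads covering each position.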
These projects will also produce and distribute new tools such as
hybridization arrays for global gene-expression studies. Widespread
adoption of these new EST hybridization arrays should allow ready
comparison between experiments and between laboratories using the same
or closely related organisms. The technology for arrays is changing
rapidly: The current low-tech methods involve ESTs spotted onto nylon
membranes, whereas the newer high-tech methods use microarrays on glass
slides. The NSF has funded one project to explore a new technology
using gold-bead-labeled hybridization probes to simplify signal
detection (PI Nina Fedoroff). In all hybridization approaches,
accurately arraying ESTs of known sequence is the most critical step.
Rumors abound of completed gene-expression studies compromised by the
realization that errors occurred during array construction. The
NSF-funded projects share a special responsibility to produce
high-quality EST arrays and guidelines for data interpretation, because
an entire community of plant biologists will become dependent on these
new tools. At least for the near future, it is unlikely that there will
be a commercial source of hybridization arrays for
plants.
An Arabidopsis virtual center (PI Pamela Green) has the largest
microarray development and service component. The new funding in
functional genomics complements the genome-sequencing effort in
Arabidopsis by providing tools to analyze gene-expression patterns. The
center will not only construct EST microarrays but will also perform
hybridizations with user-supplied samples. In addition, the co-PIs will
provide baseline data by assessing hybridization patterns to RNA from
"core" tissues. The project will also identify plant-specific genes
based on the available Arabidopsis ESTs and genomic sequence, and it
will construct a microarray grid to monitor expression of these genes
under a wide variety of conditions. Project data will be available on a
website and should set the standard for interpretation of hybridization
arrays for this plant.
A policy issue for Plant Physiology and other journals to
consider is whether gene-expression analysis of individual genes will
suffice and, if so, for how long. When will analysis of global
expression patterns be required for publication? Even if the focus of a
report is a single gene, should authors be expected to define
gene-expression differences in isogenic nonmutant and null allele
stocks or in transgenic overexpression and antisense or suppressed
examples? The desire for more robust information must be tempered by
the knowledge that the new tools will be limited in supply, at least
initially, and could be more costly than northern-blot hybridization
analysis, for example. A second question is how array hybridization
data will be published. A few examples will illustrate points in a paper,
but the greater impact comes from clever analysis of patterns and
comparisons between data sets. Papers that synthesize vast amounts of
information into comprehensible patterns will be essential to guide the
field. A public repository of hybridization data files may become as
important as public release of DNA sequence information. Should authors
be obligated to maintain accessible files of their hybridization data,
or should journals provide such information to their subscribers?
It is also important to consider that the planned hybridization tools
will be built from cDNA collections that represent moderately and
highly expressed genes very well but contain only a subset of genes
expressed at lower levels or under unique conditions. Thus, the data
are "global" but incomplete for each species. The problem is
compounded if comparisons are made between arrays constructed from
different EST pools. If two species are compared using arrays that are
75% complete (100% complete for high-expression genes and 50%
complete for low-expression genes), there might be only a 25% overlap
of the low-expression classes. Ultimately, all of the transcripts from
a species could be defined and a comprehensive hybridization array
devised. There is certainly room for new projects to produce arrays
drawing on deeper collections of ESTs from specific developmental
stages or physiological treatments to ensure better coverage of the
rare transcript class.
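The 25% figure can be reproduced under two assumptions that the example leaves implicit: the high- and low-expression classes contain equal numbers of genes (so 0.5 × 100% + 0.5 × 50% = 75% overall), and each array's EST pool captures an independent, random half of the low-expression class (so a given low-expression gene is on both arrays with probability 0.5 × 0.5 = 0.25). The sketch below computes both quantities and adds a small Monte Carlo check; the function names are hypothetical.

```python
import random


def array_completeness(frac_high, cov_high, cov_low):
    """Overall fraction of a transcriptome represented on an array."""
    return frac_high * cov_high + (1 - frac_high) * cov_low


def expected_shared_fraction(cov_a, cov_b):
    """Expected fraction of a gene class present on BOTH arrays, assuming each
    array's EST pool samples that class independently and at random."""
    return cov_a * cov_b


def simulated_shared_fraction(n_genes, cov_a, cov_b, seed=1):
    """Monte Carlo check: draw two random subsets of one gene class and
    measure the fraction of the class found in both."""
    rng = random.Random(seed)
    genes = range(n_genes)
    on_a = set(rng.sample(genes, round(cov_a * n_genes)))
    on_b = set(rng.sample(genes, round(cov_b * n_genes)))
    return len(on_a & on_b) / n_genes


print(array_completeness(0.5, 1.0, 0.5))        # overall completeness
print(expected_shared_fraction(0.5, 0.5))       # low-expression overlap
print(simulated_shared_fraction(10_000, 0.5, 0.5))
```

If the two EST pools were biased in the same direction (for example, both drawn from similar tissues), the overlap would be larger than this independent-sampling estimate; if they were drawn from very different tissues, it could be smaller.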
MUTATIONAL APPROACH TO UNDERSTANDING GENE FUNCTION
The cDNA approach to gene discovery suffers from the limitation
that it may be nearly impossible to recover a cDNA representing each
gene. Random mutagenesis automatically provides a "normalized" approach to gene discovery and is therefore a second route to functional genomics. In this approach, the consequences of eliminating or altering the contribution of a single gene are explored. Several projects propose generating and sharing collections of mutagenized plants, as well as reporting phenotypic analysis of individual mutants
at project websites. For dicots, the main tool will be T-DNA insertion
mutants, with the most effort in Arabidopsis. For example, the analysis
of osmotic responses (PI Hans Bohnert) will generate up to 200,000 T-DNA-tagged lines. An independent resource of the same magnitude will
be generated by PI Pamela Green and her collaborators. The latter
project will sequence the T-DNA/Arabidopsis junction fragments for many
of the insertion events, facilitating the identification of mutations
in specific genes.
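How close a collection of this size comes to saturation can be estimated with the usual binomial argument: if insertions land independently and uniformly, each one misses a target of length L in a genome of size G with probability 1 − L/G, so n insertions hit it at least once with probability 1 − (1 − L/G)^n. The sketch below assumes one insertion per line, a hypothetical 2-kb gene, and a genome size of about 125 Mb for Arabidopsis; T-DNA integration is not perfectly random, so these numbers are only indicative.

```python
import math


def prob_gene_hit(n_lines, target_bp, genome_bp, inserts_per_line=1.0):
    """Probability that at least one insertion falls in a target of target_bp.

    Assumes insertions land independently and uniformly across the genome,
    so each insertion misses the target with probability 1 - f.
    """
    f = target_bp / genome_bp
    n_inserts = n_lines * inserts_per_line
    return 1 - (1 - f) ** n_inserts


def lines_needed(prob, target_bp, genome_bp, inserts_per_line=1.0):
    """Smallest number of lines that hits the target with probability >= prob."""
    f = target_bp / genome_bp
    return math.ceil(math.log(1 - prob) / (math.log(1 - f) * inserts_per_line))


# Assumed values for illustration: a 2-kb gene in a ~125-Mb genome.
print(round(prob_gene_hit(200_000, 2_000, 125e6), 3))
print(lines_needed(0.95, 2_000, 125e6))
```

Under these assumptions, 200,000 lines give roughly a 96% chance of an insertion in a typical 2-kb gene, and pooling two independent collections of that size would push the probability above 99%; shorter genes remain harder to hit.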
Transposons are the tools of choice for insertional mutagenesis in maize. In the
first strategy, PI Hugo Dooner exploits the tendency of Ac
to transpose to closely linked sites. With local hopping, the project
can perform saturation mutagenesis from several locations and perfect
PCR techniques for rapid recovery and analysis of large collections of
insertions. A second goal is to produce transgenic maize lines with
genetically engineered Ac elements at diverse locations.
This collection of lines will facilitate future gene-discovery and
mutagenesis efforts using the methods developed in the first phase of
the project.
Two projects will use Mu elements as transposon tags; these
elements are thought to transpose preferentially into genes but randomly with regard to chromosomes. Using standard Mutator lines with
many copies of mobile Mu elements, the Cold Spring Harbor Laboratory collaborative project (PI Rob Martienssen) will grow and
self more than 40,000 Mutator plants. This will generate a public
mutant collection that should contain more than 1 million insertions
and therefore many different mutations of most genes. Mutants of
interest will be identified by PCR screening using DNA samples prepared
from the original plants. One primer reads out of the highly conserved
Mu termini and a second primer is gene specific. PCR
screening will be provided as a service, and collaborators will receive
seed packets (selfed progeny) of plants identified as carrying the
mutation. This project is a public version of the TUSC (Trait
Utility System for Corn) gene-discovery method developed at Pioneer
Hi-Bred International (Bensen et al., 1995).
A second Mu-tagging project will generate fewer mutations
but will create an immortalized collection of the mutations in
Escherichia coli libraries. Genetically engineered
RescueMu elements will be used in a gene-tagging and
-sequencing project. RescueMu elements contain pBluescript;
therefore, when total maize DNA is transformed into E. coli,
only the tagged genes are maintained. PI Virginia Walbot and workers at
six other maize genetics laboratories will grow fields of 2304 plants
(a 48-row × 48-column grid) and collect leaf punches along each
row and each column. Each pool of leaf samples will produce a single
RescueMu clone library; the 48-row and 48-column libraries
from one grid of plants fit into a 96-well plate. Because the diversity
of plasmids in a single library is low (approximately 50-500), only a
dozen PCR cycles using a Mu readout and a gene-target primer
are required to detect a product band on an agarose gel. Germinal
insertions are distinguished from somatic events by amplification of
the same fragment in both a row and a column library; the two positive
results also define the plant that contained the mutation of interest.
The fee for receiving selfed seed from that plant is submission of 1 kb
of DNA sequence adjacent to the RescueMu insertion in a
target gene. In addition to user-generated genomic sequences, the
project will sequence approximately 150,000 RescueMu-maize
junctions to aid in the discovery of genes that are not defined by
ESTs.
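The row-and-column logic of the RescueMu grid can be sketched as a small decoding step. The 48 × 48 grid and the 96 libraries per plate come from the text; the plate layout (row libraries in wells A1 to D12, column libraries in E1 to H12) and the function names are hypothetical, and real PCR results would also need controls for false positives and negatives.

```python
from itertools import product

GRID_SIZE = 48   # 48 rows x 48 columns = 2,304 plants per field grid


def plate_well(pool_kind, index):
    """Map a row or column library to a well of a 96-well (8 x 12) plate.

    Hypothetical layout: row libraries fill wells A1-D12 and column libraries
    fill wells E1-H12, so the 96 libraries from one grid fit on one plate.
    """
    if pool_kind not in ("row", "col"):
        raise ValueError("pool_kind must be 'row' or 'col'")
    if not 0 <= index < GRID_SIZE:
        raise ValueError(f"pool index out of range: {index}")
    well = index + (0 if pool_kind == "row" else GRID_SIZE)
    return f"{'ABCDEFGH'[well // 12]}{well % 12 + 1}"


def candidate_plants(row_hits, col_hits):
    """Plants (row, col) that could carry a germinal insertion.

    row_hits / col_hits: 0-based indices of the row and column libraries that
    gave the expected band with one Mu readout + gene-target primer pair.
    A germinal insertion is present in every cell, so it should appear in both
    its row and its column library; a band found only among rows (or only
    among columns) has no partner and is treated as a somatic event.
    """
    if not row_hits or not col_hits:
        return []
    # k positive rows x m positive columns give k*m candidates; only a single
    # row/column pair identifies one plant unambiguously.
    return sorted(product(sorted(row_hits), sorted(col_hits)))


print(plate_well("row", 12), plate_well("col", 30))
print(candidate_plants({12}, {30}))
```

When more than one row and one column light up for the same primer pair, the candidate list grows multiplicatively, so in practice each candidate plant would be confirmed individually before seed is distributed.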
Phenotypic analysis of public mutant collections will require an
unprecedented effort by geneticists, molecular biologists, and
physiologists to record plant traits in a common format. A first pass
at plant description will be made by the project teams, and these data,
including photographs, will be publicly available. At least for maize
and Arabidopsis, uniform scoring criteria and a data sheet for
recording information will be required if searchable databases are to
be developed. In addition, the Arabidopsis project headed by Pamela
Green and the Cold Spring Harbor Laboratory Mutator project intend to
incorporate additional, more detailed phenotypic information into their
databases; this information will be supplied by community members who
receive seed. For this system to work, users must agree to contribute
to project goals by providing complete and timely information in a
usable format. The incentive to do so is that materials will be
supplied only to cooperative collaborators.
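A common data sheet amounts to a record type with controlled vocabularies, so that entries from different laboratories can be searched together. The sketch below is purely illustrative: the field names, organ terms, and severity scale are hypothetical placeholders, since the scoring criteria themselves were still to be agreed on.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies; the shared scoring criteria called for
# in the text had not yet been defined.
ORGANS = {"seedling", "leaf", "stem", "root", "tassel", "ear", "flower", "seed"}
SEVERITY = ("wild type", "mild", "moderate", "severe", "lethal")


@dataclass
class PhenotypeRecord:
    stock_id: str                  # seed stock or line identifier
    organ: str                     # must come from ORGANS
    trait: str                     # trait name, e.g. "leaf width"
    severity: str                  # must come from SEVERITY
    submitted_by: str              # contributing laboratory
    photos: list = field(default_factory=list)   # image file names

    def __post_init__(self):
        # Reject entries that would not be searchable under the shared vocabulary.
        if self.organ not in ORGANS:
            raise ValueError(f"unknown organ term: {self.organ!r}")
        if self.severity not in SEVERITY:
            raise ValueError(f"unknown severity term: {self.severity!r}")


def search(records, organ=None, min_severity=None):
    """Return records for an organ at or above a given severity level."""
    floor = SEVERITY.index(min_severity) if min_severity else 0
    return [r for r in records
            if (organ is None or r.organ == organ)
            and SEVERITY.index(r.severity) >= floor]
```

Validating terms at entry time is the design choice that matters here: free-text descriptions ("pale leaves", "chlorotic", "yellowish") cannot be searched reliably once they are in the database.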
PLACING GENES AND OTHER MARKERS ON CHROMOSOMAL MAPS
Gene discovery and expression analysis as described in the
previous two sections will have an immediate impact on the way many
plant biologists design experiments. Map construction is a more
abstract goal for investigators with a single-gene focus; however, for
deeper insight into genome structure and evolution, and for defining
candidate genes at quantitative trait loci, maps are invaluable guides
(McCouch, 1998). The first round of NSF plant genome projects includes
many mapping projects with diverse long-term goals but similar
experimental strategies. Larger projects are using a combination of
marker-assisted mapping strategies to place cDNAs, simple sequence
repeats, and other DNA segments onto chromosome locations and BACs.
Within these projects and in all of the smaller projects, novel
technologies will be tested to repackage large genomes into more
manageable units.
BACs have thus far been the most stable method for propagating large
segments of plant chromosomes. The Clemson Center managed by PI Rod A. Wing will continue to be instrumental in generating and characterizing
BAC libraries. Its materials and new BAC libraries prepared by
individual projects will be essential for integrating physical and
genetic maps. BAC fingerprinting underlies much of the physical
mapping. With this technique, the restriction-fragment patterns of
thousands of BACs are compared, and those with shared patterns are
assembled into contigs. By hybridization, DNA markers can be placed on
a BAC and even more finely mapped within a BAC. If such markers are
also mapped to the chromosomes, then BACs can be assembled along the
chromosome map. Sequencing the ends of BACs and mapping these to
chromosomes is another approach for placing the BACs onto existing
genetic maps.
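The idea behind fingerprint-based contig assembly can be sketched as follows: two BACs whose restriction-fragment patterns share enough bands are assumed to overlap, and chains of overlapping BACs are grouped into contigs. Dedicated software such as FPC instead scores overlaps with a probability-based (Sulston) score; the fixed band-sharing threshold, the sizing tolerance, and the example fingerprints below are hypothetical simplifications.

```python
def shared_bands(a, b, tol):
    """Count fragments of a that match a distinct fragment of b within tol bp."""
    unused = sorted(b)
    count = 0
    for size in sorted(a):
        for k, other in enumerate(unused):
            if abs(size - other) <= tol:
                del unused[k]          # each band in b can be matched only once
                count += 1
                break
    return count


def build_contigs(fingerprints, tol=7, min_fraction=0.5):
    """Group BACs into contigs by single linkage on shared restriction bands.

    fingerprints: dict BAC name -> list of fragment sizes (bp).
    Two BACs are linked if the shared bands make up at least min_fraction of
    the smaller fingerprint; linked BACs are merged with union-find.
    """
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            smaller = min(len(fingerprints[a]), len(fingerprints[b]))
            shared = shared_bands(fingerprints[a], fingerprints[b], tol)
            if smaller and shared / smaller >= min_fraction:
                parent[find(a)] = find(b)

    contigs = {}
    for name in names:
        contigs.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in contigs.values())


# Hypothetical fingerprints (fragment sizes in bp) for four BACs.
bacs = {
    "BAC_A": [500, 1200, 2300, 4100],
    "BAC_B": [1203, 2298, 4100, 6000],
    "BAC_C": [3300, 6005, 7500, 9000],
    "BAC_D": [3300, 7502, 9004, 11000],
}
print(build_contigs(bacs))
```

Single linkage is what lets a contig grow beyond any one pair of clones, but it also means one false overlap can join two unrelated contigs, which is why markers placed by hybridization or BAC-end mapping are used as independent checks.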
As summarized in Table II, physical maps
based on BACs are proposed for rice, sorghum, and soybean; two or more
projects will generate information for these species. For legumes and
grasses, the extent of synteny within each group will be evaluated as
maps are assembled for multiple species. If the order of genes and other markers is preserved between species, many types of chromosome walking are greatly facilitated (Gale and Devos, 1998). In addition, the genes within the syntenic block are likely to be homologs, allowing
accurate comparisons of gene changes. Because the entire genome
sequence of Arabidopsis will be known within a few years, synteny with
other plants, even distantly related plants, may be recognizable. To
test this idea, maps for tomato (PI Steven Tanksley) and Medicago
truncatula (PI Douglas Cook) will be cross-referenced to
Arabidopsis.
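One simple way to ask whether marker order is preserved between two maps is to list the shared markers in the order of one species and find the longest subsequence whose positions also increase in the other species (a longest increasing subsequence). The sketch below is a simplification under stated assumptions: it handles one chromosome or linkage group at a time, ignores inverted blocks, and the marker names and positions in the usage lines are hypothetical.

```python
from bisect import bisect_left


def longest_collinear_run(pos_a, pos_b):
    """Largest set of shared markers whose order agrees in two maps.

    pos_a, pos_b: dict marker -> position (cM or bp) on one chromosome or
    linkage group in species A and species B. Markers are taken in species A
    order; a longest strictly increasing subsequence of their species B
    positions is a maximal collinear (syntenic) run. Inverted blocks, which
    would appear as decreasing runs, are not detected by this sketch.
    """
    shared = sorted(set(pos_a) & set(pos_b), key=lambda m: pos_a[m])
    tails = []                 # tails[k]: smallest B position ending a run of length k+1
    tail_idx = []              # index in `shared` of that run's last marker
    prev = [None] * len(shared)
    for i, marker in enumerate(shared):
        y = pos_b[marker]
        k = bisect_left(tails, y)
        if k == len(tails):
            tails.append(y)
            tail_idx.append(i)
        else:
            tails[k] = y
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else None
    run, i = [], (tail_idx[-1] if tail_idx else None)
    while i is not None:       # walk back through predecessors
        run.append(shared[i])
        i = prev[i]
    return run[::-1]


# Hypothetical positions of five shared markers in two species.
species_a = {"m1": 1, "m2": 2, "m3": 3, "m4": 4, "m5": 5}
species_b = {"m1": 10, "m2": 20, "m3": 50, "m4": 30, "m5": 40}
print(longest_collinear_run(species_a, species_b))
```

Markers left out of the returned run (here the one that moved) are the candidates for rearrangements, or for paralogs mistaken for orthologs, and would need closer inspection.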
Mapping genes and DNA sequences is a fundamental activity in genetics.
Scorable phenotypes (alternative alleles at the visible or DNA-sequence
level) are required for traditional mapping, and resolution depends on
the number of recombinant chromosomes examined. Several physical
approaches to circumvent the requirement for polymorphisms and
recombination have been funded. Whole chromosomes can be isolated in a foreign background, as
exemplified by the introgression of individual maize chromosomes into oat
lines. An existing panel of seven viable oat lines carrying individual
maize chromosomes will be expanded to include all 10 maize chromosomes
(PI Ronald L. Phillips). Maize probes that hybridize in situ to only
one oat addition line are placed on that chromosome. A second
approach to placing maize genes involves analyzing quantitative
hybridization differences. DNA samples from monosomics (2N−1),
trisomics (2N+1), deletion stocks, and translocation stocks (enriched
representation of parts of chromosome arms) will be arrayed onto nylon
filters (PI Virginia Walbot).
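The dosage logic behind quantitative hybridization mapping is simple arithmetic. A probe on the chromosome that is missing one copy in a monosomic stock should give about 1/2 of the euploid signal (one copy instead of two), and about 3/2 in a trisomic stock (three copies instead of two), while probes elsewhere should stay near 1. The sketch below assigns a probe to the chromosome whose aneuploid stocks all show the expected shift; the tolerance, the normalization to a control probe, and the example ratios are assumptions, and real filter hybridization data are considerably noisier.

```python
# Expected signal in an aneuploid stock relative to a euploid (2N) control,
# for a probe located on the altered chromosome: 1 copy vs 2, or 3 copies vs 2.
EXPECTED_RATIO = {"monosomic": 0.5, "trisomic": 1.5}


def assign_chromosome(ratios, tolerance=0.2):
    """Chromosomes whose aneuploid stocks all show the expected dosage shift.

    ratios: dict (stock_type, chromosome) -> normalized signal ratio
    (aneuploid / euploid) for one probe. Probes located elsewhere in the
    genome should stay near a ratio of 1.0 in that stock.
    """
    support = {}
    for (stock_type, chromosome), ratio in ratios.items():
        fits = abs(ratio - EXPECTED_RATIO[stock_type]) <= tolerance
        support.setdefault(chromosome, []).append(fits)
    return sorted(chrom for chrom, fits in support.items() if all(fits))


# Hypothetical normalized ratios for one probe across several stocks.
probe = {("monosomic", 1): 1.02, ("trisomic", 1): 0.97,
         ("monosomic", 6): 0.55, ("trisomic", 6): 1.43,
         ("trisomic", 9): 1.05}
print(assign_chromosome(probe))
```

The same reasoning extends, in principle, to deletion and translocation stocks, where the expected ratio depends on which chromosome-arm segment is missing or duplicated.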
Sophisticated repackaging of plant chromosomes will be attempted using
tricks from mammalian somatic cell genetics. The goal of this research
is to prepare stable mammalian cell lines carrying approximately 1% of
a plant genome. These cell lines would be the starting material for
mapping or sequencing projects. Plant chromosome segments, generated by
radiation or cre-lox-mediated recombination, will be fused
onto rodent chromosomes. DNA prepared from these lines and in situ
hybridization can be used to map unknown genes to segments within
chromosomes (PIs Z. Renee Sung and Edward H. Coe).
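Mapping with such a panel of cell lines works much like radiation hybrid mapping: each line retains a different subset of plant segments, so every mapped segment has a characteristic presence/absence pattern across the panel, and an unknown gene is placed with the segment whose pattern it matches. The sketch below scores that match as simple concordance; the threshold, the function names, and the ten-line patterns in the usage lines are hypothetical.

```python
def concordance(a, b):
    """Fraction of cell lines in which two presence/absence patterns agree."""
    if len(a) != len(b) or not a:
        raise ValueError("patterns must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def place_gene(gene_pattern, segment_patterns, min_concordance=0.85):
    """Rank mapped segments by how well their retention pattern across the
    cell-line panel matches the unknown gene's pattern (1 = detected)."""
    scored = [(concordance(gene_pattern, pattern), name)
              for name, pattern in segment_patterns.items()]
    return [(name, score) for score, name in sorted(scored, reverse=True)
            if score >= min_concordance]


# Hypothetical retention patterns across a panel of ten cell lines.
segments = {
    "seg1": (1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
    "seg2": (0, 1, 0, 0, 1, 0, 1, 0, 0, 1),
    "seg3": (0, 0, 1, 0, 0, 0, 1, 1, 0, 0),
}
unknown = (1, 0, 0, 1, 0, 1, 0, 0, 1, 1)
print(place_gene(unknown, segments))
```

Allowing less than perfect concordance tolerates occasional scoring errors, but the panel must be large enough that unrelated segments rarely match by chance.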
Centromeres are required for orderly chromosome behavior during mitosis
and meiosis, and yet we know little about the DNA required to organize
a centromere. PI Daphne Preuss and her collaborators will use tetrad
analysis in Arabidopsis and Chlamydomonas reinhardtii to map
centromere regions. The elements will be sequenced and then tested for
function in artificial chromosomes. For researchers interested in
introducing multiple new traits (stacking) into transgenic plants,
building a custom chromosome may ultimately provide the best route.
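The classical principle behind centromere mapping by tetrad analysis can be written as a short calculation. When a marker is scored against a reference marker tightly linked to a centromere, a tetratype tetrad reflects a crossover between the marker and its own centromere; such a tetrad carries two recombinant chromatids out of four, so the marker-centromere distance is roughly 50 × (tetratype frequency) cM. The sketch below assumes this setup, ignores multiple crossovers, and uses hypothetical marker counts; it is not a description of the project's actual analysis.

```python
def centromere_distance_cm(n_tetratype, n_total):
    """Approximate marker-to-centromere distance in cM from tetrad counts.

    Assumes the marker is scored against a reference marker tightly linked to
    a centromere, so a tetratype tetrad reflects one crossover between the
    marker and its own centromere. Such a tetrad carries 2 recombinant
    chromatids out of 4, hence 50 x the tetratype frequency. Multiple
    crossovers are ignored, so larger distances are underestimated.
    """
    if n_total <= 0 or not 0 <= n_tetratype <= n_total:
        raise ValueError("need 0 <= n_tetratype <= n_total and n_total > 0")
    return 50.0 * n_tetratype / n_total


def closest_to_centromere(counts):
    """counts: dict marker -> (n_tetratype, n_total) for markers on one chromosome."""
    return min(counts, key=lambda m: centromere_distance_cm(*counts[m]))


# Hypothetical counts for four markers along one chromosome.
markers = {"m1": (40, 100), "m2": (12, 100), "m3": (2, 100), "m4": (18, 100)}
print({m: centromere_distance_cm(*c) for m, c in markers.items()})
print(closest_to_centromere(markers))
```

Scanning markers along a chromosome in this way brackets the centromere between the markers with the lowest tetratype frequencies, defining the region to be sequenced.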
The physical placement of chromosomes in vivo, monitored using
fluorescent hybridization or protein tags, will be explored in two
projects. PI W. Zacheus Cande will develop a physical map of marker
positions along individual maize chromosome arms. This reality check of
where markers are located in vivo will be useful in verifying maps
developed from BAC contigs. PI Eric Lam will develop a set of
fluorescent beacons to triangulate chromosome positions in Arabidopsis
and tobacco, and will then map how these positions change as a function
of the cell cycle and the state of cellular differentiation. Both of
these projects will provide new kinds of information about chromosome
behavior, packing, and interaction with nuclear landmarks that
transcend our current conception of genomics efforts.
COMPARATIVE AND EVOLUTIONARY APPROACHES TO GENE FUNCTION
It is a paradox of current molecular analysis that we can
simultaneously appreciate that each species is distinguished by subtle
and profound differences from all other taxa while using sequence
matches to argue for identity of function. At least for some genes, the
sequence differences must hold the explanation for the obvious
species-specific differences used in classification. But which changes
are important in speciation? Have structural or regulatory genes been
more likely to acquire novel functions? Have coding regions or
regulatory regions been primarily responsible for morphological and
functional differences? Is there one pattern, e.g. novel promoters in
regulatory genes, or will each trait examined reflect a unique suite of
alterations? New analytical methods, variously termed horizontal
genomics or phylogenomics, seek answers to these questions (Table
III). These methods require large
DNA-sequence data sets in which the same gene is sequenced from many
accessions. Furthermore, the specimens used must be phylogenetically
instructive examples, and investigators must know the impact of allelic
variation of the chosen genes on specific traits.
In the first round of NSF plant genome projects, the nature of
divergence between maize and teosinte will be examined by assessing diversity within each group and allelic variation along chromosomes 1 and 3 using 50 to 100 markers (PI John F. Doebley). The pace of change
in regulatory regions and exons and in noncoding regions can be
assessed, as well as the pattern of change along the chromosomes. This
project is likely to contribute new statistical approaches to data
evaluation and simulation of models. With the new analytical framework,
data generated from the EST and mapping projects in other plants could
be used in future projects directed at understanding the fixation of
particular traits within or between genera.
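Diversity within a group is often summarized as nucleotide diversity (π), the average number of pairwise differences per site among aligned sequences from that group. The sketch below shows this one conventional statistic, not necessarily the one the project will use; it assumes gap-free, unambiguous alignments, and the example sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Nucleotide diversity (pi): mean pairwise differences per site.

    aligned: equal-length DNA sequences from one group (e.g. maize or teosinte
    accessions at one locus). This sketch assumes gap-free, unambiguous
    alignments; real data need explicit rules for gaps and missing bases.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and equally long")
    pairs = list(combinations(aligned, 2))
    differences = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return differences / (len(pairs) * length)


# Hypothetical alignments of one short locus from two groups.
group_1 = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
group_2 = ["ACGTACGTAC", "TCGTACGAAC", "ACCTACGTAG"]
print(nucleotide_diversity(group_1), nucleotide_diversity(group_2))
```

Comparing π for the same locus in maize and teosinte, and doing so separately for exons, regulatory regions, and other noncoding sequence, is one way to ask where diversity has been lost during domestication.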
Complementing the narrow focus of the maize-teosinte approach, two
projects will compare genes expressed under osmotic stress (PI Hans
Bohnert) and genes involved in cellulose synthesis (PI Deborah P. Delmer) across diverse species. These species were picked for the
analyses because previous work highlighted special features of their
biology. Cogent evolutionary arguments will likely await the
intercalation of data from key genes from a more representative set of
flowering plants. On the other hand, the genes and regulatory regions
found in common in the diverse plants examined could highlight the
backbone of conserved features. In these projects, mutants will be used
to test the role of individual genes, and this analysis will strengthen
the interpretation of conserved roles across broad taxonomic
categories.