SSSAJ Journal of Natural Resources and Life Sciences Education

Published online 12 March 2007
Published in Soil Sci Soc Am J 71:592-600 (2007)
DOI: 10.2136/sssaj2006.0125
© 2007 Soil Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA

MOLECULAR-BASED APPROACHES TO SOIL MICROBIOLOGY

DNA Sequencing: Strategies for Soil Microbiology

Michele D. Zwolinski*

Weber State Univ., 2506 University Cir., Ogden, UT 84408

* Corresponding author (mzwolinski{at}weber.edu).


    ABSTRACT
Sequencing of DNA is a powerful tool for gathering information about organisms and their environments. The 16S rRNA gene has been the preferred gene target for describing soil microbial diversity and for establishing phylogenetic relationships between unknown and uncultivated microorganisms. As sequencing technologies improve and computing power increases, however, longer and more accurate sequence data are becoming available. It is now possible to generate complete genome sequences for individual organisms and even to collect whole-environment genome, or metagenomic, sequence information. The genomes of isolated soil microorganisms have been used to describe their physiology, ecology, and evolution and have led to important discoveries in medicine and industry. Soil metagenomic libraries contain too much information to sequence completely at this time, but they can be mined for novel, and potentially useful, microbial processes and can be used to compare genetic diversity between habitats.

Abbreviations: bp, base pairs • ORFs, open reading frames • PCR, polymerase chain reaction • rRNA, ribosomal ribonucleic acid


    INTRODUCTION
Soil is one of the most diverse habitats on Earth (Gans et al., 2005). GenBank, the largest repository of genetic sequence information, returns >430 000 entries in a search for the word soil. This collection of genetic information reflects the enormous biodiversity of soil and includes gene sequences from the major domains of life, Eukarya, Bacteria, and Archaea, and from viruses. These data have been accumulating for approximately 40 yr due to the advent of, advances in, and widespread use of DNA sequencing for the study of soil organisms.

Understanding the biodiversity of soil is difficult because of the heterogeneity that exists on both local and geological scales (Horner-Devine et al., 2003; Becker et al., 2006; Fierer and Jackson, 2006; Kang and Mills, 2006). No two soils are the same. Texture, temperature, salinity, pH, contaminants, and other characteristics allow a unique microbial community to develop in each soil environment. Molecular techniques including DNA sequencing have been vital in characterizing the vast diversity of soil organisms and in understanding the interactions between organisms and their environments.

Nucleic acid sequencing technology has existed since the 1960s (Sanger et al., 1965; Olsen et al., 1986). The efficiency and accuracy of the technology have increased significantly, and environmental microbiologists have quickly adapted these advances for exploring the diversity, ecology, and evolution of natural microbial populations (e.g., Olsen et al., 1986; Woese, 1987). It is not surprising that the rate of GenBank entries for environmental microorganisms, especially those that are not represented by cultivated specimens, has increased dramatically in the past decade (Rappé and Giovannoni, 2003). Compared with an organism's phenotype, its genetic information is more reliable, easier to interpret, and more useful for inferring evolutionary relationships (Woese, 1987). Sequences can be aligned and compared with other sequences to identify organisms, or can be searched for functional or novel genes. Microbial ecologists examine DNA or RNA extracted from environments to monitor community-level changes, measure microbial diversity, identify key organisms, and discover new microbial processes. The 16S ribosomal RNA (rRNA) gene sequences of Bacteria and Archaea, and the 18S rRNA of Eukarya, have been the primary tools for identifying the populations inhabiting soil and for monitoring microbial community dynamics following natural or anthropogenic environmental changes. As sequencing technologies continue to advance, however, microbial and environmental genomes (metagenomics) will add to the understanding of microbial communities and their functions.


    APPLICATIONS OF THE 16S rRNA GENE IN SOIL MICROBIOLOGY
The 16S rRNA gene has been the basis for defining microbial community diversity for several decades. The 16S rRNA is a ribonucleic acid molecule found in the ribosome. It has a structural role in the ribosome and a functional role in protein synthesis. The sequence of the rRNA molecule is highly conserved between organisms, but some regions change more rapidly than others. These characteristics make the rRNA molecule and its gene sequence a useful tool for comparing relatedness between two or more organisms and estimating the rate of species divergence (Woese, 1987). Closely related organisms will have fewer differences in the gene sequence than less related organisms. Other genes, such as functional genes or the intergenic spacer region, are often sequenced or used for denaturing gradient gel electrophoresis or terminal restriction fragment length polymorphism, but they usually serve to compare community structure between environments or treatments or to confirm 16S rRNA-based phylogeny. The 16S rRNA gene has remained the dominant tool for environmental microbiology because of the enormous number of 16S rRNA gene sequences that have accumulated in public databases. The Ribosomal Database Project (RDP) now contains 243 909 aligned rRNA sequences (Cole et al., 2005).

Comparing sequences to the available databases is a useful first step toward characterizing an unknown sequence, but it must be done with caution (Forney et al., 2004). Database searches perform only a quick sequence alignment, which can bias the similarity value. Full-length 16S rRNA sequences with >97% sequence identity are often considered to be from the same species (Stackebrandt and Goebel, 1994). However, organisms with nearly identical 16S rRNA genes have been found to differ significantly in the sequences of other genes or in phenotypic characteristics (Forney et al., 2004). Identification of microbial species based on 16S rRNA should therefore be corroborated with genetic and phenotypic evidence (Stackebrandt and Goebel, 1994).
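The >97% rule of thumb rests on a simple percent-identity calculation. The sketch below assumes the two sequences have already been aligned to the same length, with gaps written as '-' or '.'; skipping gap columns is one convention among several, and the function names are hypothetical rather than part of any database tool. As cautioned above, a positive result is only a screening flag, not a species identification.

```python
GAP_CHARS = {"-", "."}


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped, so identity is
    computed only over positions that both sequences cover.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # ignore gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no overlapping positions to compare")
    return 100.0 * matches / compared


def same_species_candidate(seq_a: str, seq_b: str, threshold: float = 97.0) -> bool:
    """Hypothetical screening helper: identity above the threshold suggests,
    but does not establish, membership in the same species."""
    return percent_identity(seq_a, seq_b) > threshold


# Usage: two short, already aligned fragments that differ at one of ten sites.
print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))        # 90.0
print(same_species_candidate("ACGTACGTAC", "ACGTACGTTC"))  # False
```

Real 16S rRNA comparisons use full-length (about 1500 bp) aligned sequences, where a single difference changes identity by well under 1%.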

Often the sequences found in soil clone libraries, or from organisms isolated from soil, are only poorly related to any previously sequenced organisms. Many of the clones from soil DNA libraries that are collected in Genbank or the RDP are classified as "unidentified bacterium" or "uncultured environmental clone" (Rappé and Giovannoni, 2003). It is often possible, however, to infer the phylogenetic lineage of sequences that cannot be identified to the species level. This requires careful alignment of the unknown sequences to a collection of similar and dissimilar sequences. The aligned sequences can then be compared using one or more algorithms to determine their relatedness and to construct a phylogenetic tree that illustrates the difference between species calculated as their divergence from a common ancestor (Felsenstein, 1988; Miyamoto and Cracraft, 1991; Ward et al., 1992).
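The tree-building step can be illustrated with UPGMA (unweighted pair group method with arithmetic mean), one of the simplest distance-based clustering methods. It is used here only for brevity and is not presented as the procedure of the works cited above; it assumes a roughly constant rate of change across lineages, an assumption that methods such as neighbor-joining or maximum likelihood relax. The taxon names and distances in the usage line are hypothetical (for example, 100 minus percent identity).

```python
from itertools import combinations


def upgma(labels, matrix):
    """Cluster taxa with UPGMA and return a Newick string with branch lengths.

    labels: taxon names; matrix: symmetric list-of-lists of pairwise distances.
    """
    # Distances between current clusters, keyed by an unordered pair of cluster ids.
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    newick = {i: name for i, name in enumerate(labels)}  # subtree text per cluster
    size = {i: 1 for i in newick}                         # taxa per cluster
    height = {i: 0.0 for i in newick}                     # node height (leaves at 0)
    next_id = len(labels)

    while len(newick) > 1:
        # Merge the two closest clusters.
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        h = dist[pair] / 2.0  # new node sits halfway between the merged clusters
        newick[next_id] = (f"({newick[i]}:{h - height[i]:.4g},"
                           f"{newick[j]}:{h - height[j]:.4g})")
        size[next_id] = size[i] + size[j]
        height[next_id] = h
        # Distance from the new cluster = size-weighted mean of the merged ones.
        for k in list(newick):
            if k in (i, j, next_id):
                continue
            dist[frozenset((next_id, k))] = (
                size[i] * dist[frozenset((i, k))] + size[j] * dist[frozenset((j, k))]
            ) / size[next_id]
        # Drop the merged clusters and every distance that involves them.
        for c in (i, j):
            del newick[c], size[c], height[c]
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        next_id += 1

    return next(iter(newick.values())) + ";"


# Hypothetical distances: A and B are close relatives, C is more distant.
print(upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
# (C:3,(A:1,B:1):2);
```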

Clone libraries of the 16S rRNA gene can be used to estimate community composition and diversity. Because a clone library contains a collection of 16S rRNA genes from the polymerase chain reaction (PCR) product amplified from a sample, ideally the frequency of sequences within a clone library will reflect the frequency of the sequence (and thus the organism) in the environment. Biases in DNA extraction and PCR make this a problematic assumption, but clone frequencies can still provide ecologically valuable data. Sorting the clones into groups can streamline the sequencing process. Clones can be grouped by their restriction fragment patterns into operational taxonomic units (OTUs), and then representatives of each OTU can be sequenced. Further, the frequency of unique OTUs can be counted and analyzed with ecological measures like the Shannon diversity index and evenness measures (Begon et al., 1990), and species richness between communities can be compared using rarefaction curves and other tools (Hughes et al., 2001; Lunn et al., 2004). Because it is now possible to generate large sequence data sets, however, new statistical tools for analyzing DNA clone libraries directly have been developed that assign clone sequences from a microbial community into OTUs based on their sequences and then estimate species richness and diversity (Schloss and Handelsman, 2005b). Because the OTUs are calculated from the sequence data rather than from electrophoretically separated fragments, the diversity estimates are more accurate. Schloss and Handelsman (2005b) used these tools to calculate the species richness within two soil clone libraries and the Sargasso Sea metagenome (Venter et al., 2004) and concluded that the soil communities were drastically undersampled. They estimated that >10 000 sequences would be needed to sample these soil microbial communities accurately.
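Once clones are assigned to OTUs, the diversity measures named above reduce to short calculations on OTU counts. The sketch below assumes OTU assignment has already been done (by restriction pattern or by sequence) and computes the Shannon index H' = -Σ p_i ln p_i, Pielou's evenness J' = H'/ln S, and points on a rarefaction curve using Hurlbert's expected-species formula. It is an illustration with a hypothetical 10-clone library, not the implementation of the statistical tools cited above.

```python
import math
from collections import Counter


def otu_counts(assignments):
    """Collapse a list of per-clone OTU labels into OTU abundance counts."""
    return list(Counter(assignments).values())


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTU abundances."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def evenness(counts):
    """Pielou's evenness J' = H' / ln(S), S = number of observed OTUs.

    Returns 0.0 when only one OTU is present (J' is undefined there).
    """
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


def rarefy(counts, n):
    """Expected number of OTUs in a random subsample of n clones
    (Hurlbert's rarefaction formula, sampling without replacement)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the library size")
    denom = math.comb(total, n)
    # math.comb returns 0 when n exceeds total - c, i.e. the OTU is always sampled.
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)


# Usage with a hypothetical 10-clone library already sorted into OTUs.
counts = otu_counts(["A", "A", "A", "B", "B", "C", "D", "D", "D", "D"])
print(round(shannon(counts), 3), round(evenness(counts), 3))
# Points on a rarefaction curve: expected OTUs at increasing sample sizes.
print([round(rarefy(counts, n), 2) for n in range(1, sum(counts) + 1)])
```

A rarefaction curve that is still rising steeply at the full library size is the signature of undersampling described above for soil.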

One of the most common goals of soil microbiology is to determine what influences microbial community composition or activity. Beyond measuring species richness and diversity, several statistical tools have been designed to compare communities between samples based on shared OTU composition (Schloss and Handelsman, 2006a) and to measure treatment effects on microbial communities based on sequence alignments (Singleton et al., 2001; Schloss et al., 2004) and phylogenetic trees (Schloss and Handelsman, 2006b). These tools can be used to determine what environmental factors influence microbial community composition or activity.
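The simplest way to express "shared OTU composition" between two samples is an incidence-based similarity index. The sketch below computes Jaccard and Sørensen similarities on hypothetical OTU labels. These indices ignore abundance and do not correct for undersampling, which is precisely what the statistical tools cited above are designed to handle, so this is only an illustration of the idea.

```python
def jaccard(otus_a, otus_b):
    """Jaccard similarity: OTUs shared by both samples / all distinct OTUs."""
    a, b = set(otus_a), set(otus_b)
    union = a | b
    if not union:
        raise ValueError("both samples are empty")
    return len(a & b) / len(union)


def sorensen(otus_a, otus_b):
    """Sorensen similarity: 2 * shared OTUs / (OTUs in A + OTUs in B)."""
    a, b = set(otus_a), set(otus_b)
    if not a and not b:
        raise ValueError("both samples are empty")
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical OTU lists from a control and a treated soil sample.
control = ["OTU1", "OTU2", "OTU3"]
treated = ["OTU2", "OTU3", "OTU4"]
print(jaccard(control, treated))  # 0.5
print(sorensen(control, treated))
```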

Biogeography is one microbial ecology issue that is being addressed using statistical analysis of clone libraries. Horner-Devine et al. (2003) compiled sequence-based microbial community data from several studies to demonstrate how soil microbial ecology is influenced by habitat, disturbances, dispersal, speciation, and extinction. They demonstrated that ecological principles used for macroorganisms can sometimes be applied to microorganisms. Further, several studies have indicated that microorganisms are not randomly distributed in the environment (Cho and Tiedje, 2000; Martiny et al., 2006). For example, analysis of a salt marsh sediment microbial community DNA library showed significant community change across a geographical distance and that this change was probably due to environmental heterogeneity (Horner-Devine et al., 2004).

Microbial diversity and community change can also be detected using functional genes like nifH, a gene involved with N2 fixation (Deslippe and Egger, 2006; Izquierdo and Nusslein, 2006). Function-specific genes can provide information about community activity and can be used to support 16S rRNA-based phylogeny. Further, organisms involved with vital community functions sometimes exist in low abundance relative to other soil organisms and may not be detected within a 16S rRNA clone library. A functional gene library can provide increased resolution of these important organisms.

Soil DNA clone libraries have greatly expanded the database of known microbial diversity. Even soil clone libraries, however, underestimate the true diversity of soil habitats, and our understanding of microbial communities within soil can be biased by technological difficulties. In a recent review, Janssen (2006) compiled data from 32 soil 16S rRNA clone libraries from different locations to draw general conclusions about soil microbial communities. Not surprisingly, he found that soil clone libraries contain far more diversity than is found when organisms are cultured from soil. Further, the sequences that are frequently found in soil clone libraries do not reflect the frequency of organisms that are often cultivated from soil. For example, the genera Bacillus and Clostridium were infrequently found in the clone libraries although they are frequently cultivated from soil. This discrepancy may be due to a culture bias, because these organisms are easily cultivated while other organisms are not, or these organisms may be excluded from the DNA libraries because of difficulties in lysing resistant spores. In this case, the soil 16S rRNA libraries may not portray the true diversity of the soil because only the physiologically active members of the communities are detected, not the inactive organisms or spores.

Janssen (2006) also observed that relatively few of the phyla found in the soil clone libraries had representative isolates in the American Type Culture Collection, a repository for microbial isolates. Thus the soil isolates available for culture-based study are limited to only a few phyla. Of the 52 defined bacterial phyla, only 26 have cultured representatives (Rappé and Giovannoni, 2003). Janssen (2006) found 32 phyla represented in the soil clone libraries, but 92% of the clones were representative of only nine phyla, and only two, the Proteobacteria and the Acidobacteria, were found in all of the clone libraries. Within phyla, many of the soil clones could not be identified to specific genera, primarily because of the vast amounts of phylogenetic and phenotypic diversity that remains unknown. This disparity between what is found in clone libraries and what is available in culture collections highlights the need to continue efforts to cultivate organisms. Although this is a challenge, progress is possible, as demonstrated by the cultivation of members of the Acidobacteria (Joseph et al., 2003; Davis et al., 2005).


    WHOLE GENOME SEQUENCING AND METAGENOMICS
Advances in sequencing speed, decreases in cost, and improvements in computing power are allowing researchers to investigate environmental microbial diversity, activity, and ecology. Although 16S rRNA gene sequencing is a high-resolution tool for identifying microorganisms (Kent and Triplett, 2002; Lynch et al., 2004), improved sequencing technology and data analysis now allow an even more detailed view into microbial genomes and microbial communities (Rappé and Giovannoni, 2003; Forney et al., 2004). A goal of genomics studies is to understand the dynamic relationships between microorganisms and their environments (DeLong, 2004; Keller and Zengler, 2004; Lynch et al., 2004; Tyson et al., 2004). Genomic studies also have potential applications to medicine, ecology, energy production, biosecurity, and industry (Nelson, 2003; Celestino et al., 2004; Fraser, 2004; Keller and Zengler, 2004; Daniel, 2005; Genome Management Information System, 2005).

The data gathered from such studies are being collected into databases that can be searched for genes. Many institutions and agencies worldwide support microbial genome sequencing. Of particular interest to environmental microbiologists are The Institute for Genomic Research, the U.S. Department of Energy's (DOE) Joint Genome Institute, and the J. Craig Venter Institute's Joint Technology Center. These organizations support microbial genome sequencing projects and metagenomic studies of unique environments such as thermophilic habitats and soil. They promote the use of high-throughput sequencing technologies to generate sequence data, provide sequencing services to researchers, and offer training in genomic methods (The Institute for Genomic Research, 2005; U.S. Department of Energy, 2005a; J. Craig Venter Institute, 2006). The DOE has a genome program specifically targeted to deciphering the microbial genomes from nonpathogenic microorganisms and microbial environments, the Microbial Genome Project (MGP; Genome Management Information System, 2005; U.S. Department of Energy, 2005b). The goal of the MGP is to identify novel microbial processes that may provide insights into environmental processes or microbial evolution or that may lead to new technologies for industry, energy production, or bioremediation.

Whole Genome Sequencing
The first whole genome sequence published was from the microorganism Haemophilus influenzae (Fleischmann et al., 1995). As of November 2006, the Genomes OnLine Database (GOLD), a database of sequencing projects worldwide (Liolios et al., 2006), held 460 completed and published genomes and 998 ongoing Bacterial, 56 Archaeal, and 631 Eukaryotic sequencing projects (www.genomesonline.org [verified 19 Dec. 2006]). Table 1 presents a list of completed soil microorganism genomes found in the GOLD database. Whole microbial genomes are sequenced using a shotgun cloning approach. The DNA is extracted from a pure culture, fragmented, and inserted into plasmid vectors, and the clones are sequenced. The result is a large collection of overlapping sequences (Venter et al., 1996) that must be assembled into a contiguous sequence, after which the genes, operons, and promoters must be annotated. Genome information, however, can be difficult to interpret. Sequenced regions may be identified as open reading frames (ORFs), but many of these encode genes with as-yet-unidentified functions. No completed genome has more than 80% of its genes identified and described (Galperin, 2004); this includes well-studied organisms like Escherichia coli.
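The assembly of overlapping shotgun sequences into a contiguous sequence can be sketched as a greedy overlap-merge loop: repeatedly join the two reads with the longest suffix-prefix overlap. This toy version assumes error-free reads from a single strand, no repeats, no read contained inside another, and a hypothetical minimum overlap of three bases. Production assemblers must also handle sequencing errors, repeats, reverse-complement reads, and paired-end information, and this is not the procedure of any study cited here.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b,
    provided it is at least min_len bases; otherwise 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a matching suffix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # leftmost match gives the longest overlap
        start += 1


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by largest overlap until no overlaps remain.

    Returns the resulting contigs (a single string if everything joined).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # nothing left to join
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical overlapping reads drawn from the sequence ATGCGTACGTTAGC.
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"]))  # ['ATGCGTACGTTAGC']
```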


Table 1. Completed whole genome sequences from soil microorganisms selected from the Genomes Online Database (www.genomesonline.org).

 
Until recently, most of the microorganisms chosen for genome sequencing were of medical importance (DeLong, 2002; Hugenholtz, 2002). Completed genomes of important pathogens from soil include Bacillus anthracis (Read et al., 2003), Streptococcus pyogenes (Green et al., 2005), and Listeria monocytogenes (Nelson et al., 2004). These genomes are benefiting medical science and have been used to identify novel virulence factors, antibiotic resistance genes, and vaccine targets (Rappuoli, 2001; Hughes, 2003; Miesel et al., 2003; Fraser, 2004; Serruto et al., 2004). Pathogens, however, are not the only source of genomic advances in medicine. Genomes from soil microorganisms, especially actinomycetes (order Actinomycetales) such as the Streptomyces, have revealed new antibiotic, antitumor, and antifungal agents (Bentley et al., 2002; Donadio et al., 2002; Ikeda et al., 2003; Paradkar et al., 2003; Keller and Zengler, 2004).

Although medical applications are important, other sequencing efforts have emphasized collecting genome sequences from throughout the Bacteria and Archaea to gain insights into the evolution of microorganisms (DeLong, 2002; Hugenholtz, 2002; Harris et al., 2003; Konstantinidis and Tiedje, 2005). Comparisons of microbial genomes allow researchers to deduce how selected functions have evolved in both closely and distantly related microorganisms, and they can be used to predict genomic regions subject to lateral gene transfer (Hugenholtz, 2002; Gogarten and Townsend, 2005). Whole genome sequences can reveal differences in the genomes of organisms, even those with highly similar 16S rRNA gene sequences (DeLong, 2002; Nelson, 2003). Comparisons of microbial phylogenies based on whole genome sequences are limited to the organisms whose genomes have been completed (Hugenholtz, 2002). Although several hundred microbial genome sequences are available, these sequences represent only a few microbial phyla and primarily include organisms that can be cultivated.

Genome sequences from microorganisms have also been mined for useful industrial applications or to better understand environmental processes. One example of a soil microorganism that has been sequenced is Methylococcus capsulatus (Bath) (Ward et al., 2004). The metabolic genes within the 3.3-megabase (Mb) genome of this methylotrophic bacterium were studied to predict how methanol oxidation contributes to the central metabolic pathways. Genes for methane oxidation, methanol oxidation, N2 fixation, the tricarboxylic acid cycle, Cu sensing, formaldehyde regulation, and many other functions were identified. Ward et al. (2004) also observed that M. capsulatus has novel and diverse electron transport proteins and hypothesized that these may allow survival in environments with changing O2 potentials. This example illustrates the importance of generating and using whole genome sequences of cultivated microorganisms to deduce the environmental significance of microorganisms.

Metagenomics
Metagenomics, also called microbial ecogenomics or environmental genomics (Handelsman, 2004), is the analysis of all of the microbial genomes from an environment. It requires neither prior cultivation of the organisms present nor prior knowledge of the community inhabitants or target sequences. Metagenomic libraries are created by shotgun cloning DNA fragments from an environmental sample. The size of the cloned fragments depends on the needs of the study (Riesenfeld et al., 2004b). In general, small-fragment libraries provide better coverage of the metagenome because harsh DNA extraction protocols, which may shear DNA but will lyse more cells, can be used. Large-fragment DNA libraries (100–200 kilobases [kb]) have been used to identify multigene pathways (Riesenfeld et al., 2004b).

Gene-based metagenomic studies aim to determine the distribution of sequences within a library, to reconstruct microbial genomes, or to compare microbial populations from distinct habitats (Handelsman, 2004; Riesenfeld et al., 2004b; Schloss and Handelsman, 2005a). Phylogenetic comparison of functional gene sequences, rRNA, or other phylogenetic marker genes has been used to demonstrate a link between phylogeny and specific functions and to compare gene diversity between habitats (Liles et al., 2003; Handelsman, 2004; Riesenfeld et al., 2004a, 2004b; Tyson et al., 2004; Venter et al., 2004). Functional metagenomic studies look for genes that code for microbial processes within the metagenomic library. In addition, because the clones can contain large genomic inserts, they may contain operons and promoters that allow the genes on the insert to be expressed in E. coli or other suitable host strains (Rondon et al., 2000; Handelsman, 2004; Martinez et al., 2004; Riesenfeld et al., 2004a) and screened for specific functions and novel bioproducts.

Gene-Based Metagenomics
Currently, gene-based metagenomic studies that catalog all of the genes from an environment are possible for low-diversity environments. A genomic study of the microbial diversity of an acid mine drainage biofilm was the first to predict the biogeochemical roles of organisms in a microbial ecosystem based on a metagenomic survey (Tyson et al., 2004). The species composition of the biofilm had previously been identified using 16S rRNA gene sequencing, but the role of each member could only be hypothesized (Bond et al., 2000). For the metagenomic study, Tyson et al. used random shotgun sequencing of relatively small inserts (3.2 kb) from the biofilm DNA and sequenced 76.2 million base pairs (bp). The sequences were assembled into genome scaffolds for different organisms based on the guanine + cytosine content of the sequence fragments. Because of the low diversity, the researchers were able to reconstruct the majority of the genomes of the two dominant organisms, Leptospirillum group II and Ferroplasma group II, and partial genomes for the few other populations in the system. Overall, they reassembled 85% of the biofilm genome. Within the genome data, they were able to decipher the metabolic pathways for C and N2 fixation, identify membrane transport systems, and determine Fe-oxidation mechanisms, and they were able to assign these processes to specific populations within the biofilm. These insights were used to create new hypotheses about the ecology of the biofilm (Tyson et al., 2004). In more complex environments, finding the "keystone" species, organisms of low abundance with vital functions, will be difficult. Tyson et al. (2004), however, were able to attribute N2 fixation for the whole biofilm to a minor community member, Leptospirillum group III, by identifying the N2 fixation gene nifH within this organism's genome. Further support for the information gained in this metagenomics study was obtained through a proteomics analysis of the biofilm (Ram et al., 2005), which linked 49% of the ORFs from the five dominant genomes to at least one peptide sequence. Together, these studies demonstrated many of the tools that will be useful for exploring the genomes of other environments, including the importance of linking genomic and proteomic analyses.
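Binning by guanine + cytosine content works because genomes from different populations often differ in their overall GC fraction. The sketch below computes the GC fraction of each fragment and sorts fragments into bins between cutoff values. The fragment names, sequences, and the 40% cutoff are hypothetical, and this is a much-simplified stand-in for the scaffold-binning procedure of the cited study, which was practical mainly because the biofilm contained so few populations.

```python
from bisect import bisect_right


def gc_content(seq):
    """Fraction of G + C among the unambiguous A/C/G/T bases in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def bin_by_gc(fragments, edges):
    """Group named fragments into GC bins.

    edges: ascending GC-fraction cutoffs; k cutoffs give k + 1 bins, and a
    fragment goes into bin i, where i = number of cutoffs <= its GC fraction.
    """
    bins = {i: [] for i in range(len(edges) + 1)}
    for name, seq in fragments.items():
        bins[bisect_right(edges, gc_content(seq))].append(name)
    return bins


# Hypothetical fragments and a single hypothetical cutoff at 40% GC.
fragments = {"f1": "GGCCGGCCAT", "f2": "ATATATGCAT", "f3": "ATGCATGCGC"}
print(bin_by_gc(fragments, [0.40]))  # {0: ['f2'], 1: ['f1', 'f3']}
```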

Metagenomic studies of more diverse environments have also led to important discoveries. Even though the Sargasso Sea is considered an oligotrophic environment, its metagenome was too large to link many organisms with their functions (Venter et al., 2004). Within the genomic data that were collected, however, the researchers were able to estimate microbial diversity, find genes for known and new functions, and identify the functions of some of the new genes based on similarity to genes of other organisms (Venter et al., 2004). They sequenced about 10^9 bases and found 1.2 million new genes, double the number of all previously identified genes. They found 794 061 proteins with no known function and 69 718 proteins involved with energy transduction, including 782 rhodopsin-like photoreceptors (Venter et al., 2004), demonstrating the importance of primary production in this environment. The amount of data generated in metagenomics studies is daunting, but large data sets add to our knowledge of microbial diversity and community structure and create a powerful database of information, even if most of the genes within that database cannot be identified.

Comparative metagenomics involves comparing metagenomic libraries from different habitats to assess the importance of functional diversity without linking the functions to specific organisms (Schloss and Handelsman, 2005a; Tringe et al., 2005). This strategy was used to compare functional genes from a Minnesota agricultural soil, a marine habitat with whale carcasses, and the Sargasso Sea metagenomic survey (Tringe et al., 2005). Each environment had obvious characteristics that differentiated it from the others: the soil contained abundant C from plant debris, the main C source in the marine whale-fall environment was animal based because of the decomposing whale carcasses, and the Sargasso Sea was supported by photosynthetic bacterioplankton. Not surprisingly, the metagenome libraries from each environment had site-specific gene patterns that corresponded well with the activities expected in those habitats. The soil metagenome had genes for decomposing cellulose from plant material, while the Sargasso Sea had a large diversity of bacteriorhodopsins. Whereas the marine habitats had genes for Na exporters, the soil had more genes for K channels because K was a dominant cation in the soil. Other environmental parameters like energy production and population density were consistent with the types of genes found in each habitat (Tringe et al., 2005). These researchers demonstrated how metagenomic libraries could be used to generate high-resolution environmental fingerprints, to compare distinct habitat conditions, and to predict energy sources in an environment.
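The core of a gene-centric comparison is normalizing gene counts by library size so that metagenomes of different sizes can be compared category by category. The sketch below computes a log2 ratio of relative abundances with a pseudocount for categories missing from one library. The category names and counts are hypothetical, not data from the study discussed above, and the ratio is a simple descriptive score with no significance test, not the statistical procedure of that study.

```python
import math


def log2_enrichment(counts_a, counts_b, pseudocount=1):
    """Log2 ratio of each gene category's relative abundance in metagenome A vs B.

    Positive values mean the category is relatively more common in A. The
    pseudocount keeps categories absent from one library from dividing by zero.
    """
    categories = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + pseudocount * len(categories)
    total_b = sum(counts_b.values()) + pseudocount * len(categories)
    return {
        cat: math.log2(
            ((counts_a.get(cat, 0) + pseudocount) / total_a)
            / ((counts_b.get(cat, 0) + pseudocount) / total_b)
        )
        for cat in categories
    }


# Hypothetical gene-category counts for a soil and a marine metagenome.
soil = {"cellulase": 30, "K_channel": 20, "Na_exporter": 10}
marine = {"cellulase": 5, "K_channel": 10, "Na_exporter": 45}
for cat, score in sorted(log2_enrichment(soil, marine).items(), key=lambda kv: -kv[1]):
    print(f"{cat:12s} {score:+.2f}")
```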

A further application of genomic information is to analyze environmental genomes with microarrays (Cho and Tiedje, 2001; Torsvik and Øvreås, 2002; Zhou, 2003; Xu, 2006). Microarrays are small glass slides or chips onto which hundreds to thousands of individual gene sequences are spotted as probes. The microarray is hybridized with fluorescently labeled genomic or metagenomic DNA, and a fluorescent signal is detected wherever the labeled DNA binds a complementary probe. The technology for microarray use is still developing (Xu, 2006), but the potential applications make it a promising tool for understanding microbial communities. Analysis of soil metagenomes with microarrays has included microarrays constructed of 16S rRNA sequences (Neufeld et al., 2006), functional genes (Wu et al., 2001; Rhee et al., 2004), and whole-community genomes (Wu et al., 2004). An example of a practical application of microarray technology is the detection of the biodegradative potential of a soil environment. Rhee et al. (2004) hybridized DNA from a soil microcosm to an array of 1662 probes, each 50 bp long, that they developed from the genes in the University of Minnesota Biocatalyst/Biodegradation Database (umbbd.msi.umn.edu/ [verified 19 Dec. 2006]). They were able to detect a change in the microbial community when the microcosms were incubated with naphthalene and could identify the naphthalene-degrading genes that were used in the microcosm. Microarrays could also be developed to detect specific groups of organisms, monitor gene expression or distribution, detect mutations, monitor population shifts, or compare one microbial community with another (Cho and Tiedje, 2001; Torsvik and Øvreås, 2002; Sebat et al., 2003; Zhou, 2003; Xu, 2006).
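Probe-target matching on an array can be approximated in silico by counting mismatches between each probe and the best-matching stretch of either strand of a target sequence. The sketch below is a simplification: real hybridization signals depend on duplex stability, mismatch position, labeling, and wash stringency, none of which are modeled. The probe names, sequences, and two-mismatch cutoff are hypothetical; the point is only to show how a probe set designed from a gene database can report which genes are present in a sample.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def min_mismatches(probe, target):
    """Fewest mismatches over all ungapped placements of probe on either
    strand of target; None if the target is shorter than the probe."""
    probe = probe.upper()
    best = None
    for strand in (target.upper(), reverse_complement(target)):
        for i in range(len(strand) - len(probe) + 1):
            window = strand[i:i + len(probe)]
            mm = sum(p != t for p, t in zip(probe, window))
            if best is None or mm < best:
                best = mm
    return best


def probe_hits(probes, target, max_mismatches=2):
    """Names of probes that 'hybridize' under a simple mismatch cutoff."""
    hits = []
    for name, seq in probes.items():
        mm = min_mismatches(seq, target)
        if mm is not None and mm <= max_mismatches:
            hits.append(name)
    return hits


# Hypothetical target and probes; p2 matches the opposite strand exactly.
target = "AAACCCGGGTTTACGT"
probes = {"p1": "CCCGGG", "p2": "ACGTAAA", "p3": "GGGGGGGG"}
print(probe_hits(probes, target))  # ['p1', 'p2']
```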

Soil metagenomic studies have been especially useful for identifying genes from uncultivated microbes like the Acidobacteria (Liles et al., 2003; Quaiser et al., 2003) and the Archaea (Quaiser et al., 2002; Treusch et al., 2004). Based on 16S rRNA gene surveys of soil, Acidobacteria routinely account for 20 to 30% of the soil diversity; however, there are only a few cultured representatives of this phylum (Liles et al., 2003; Galperin, 2004; Handelsman, 2004; Davis et al., 2005). Liles et al. (2003) created a large bacterial artificial chromosome (BAC) clone metagenomic library from an agricultural soil and screened the library for acidobacterial 16S rRNA gene inserts. On one of these clones they also identified 20 ORFs related to genes for cell cycling, cell division, DNA excision repair, folic acid biosynthesis, and ABC transporters for amino acids. A similar strategy successfully linked physiological genes with Archaeal rRNA sequences from a soil metagenome library (Quaiser et al., 2002; Treusch et al., 2004). Even though only about 1% of the soil community was Archaea, these researchers identified three clones from nonthermophilic Crenarchaeota by screening the library for Archaeal genes. Although they could not identify the organisms that were detected, they were able to determine that these soil organisms were significantly distinct from marine Crenarchaeota (Treusch et al., 2004). Most of the genes identified with these terrestrial Crenarchaeota were for housekeeping functions or had unknown functions, but they were unique to this group of organisms. Metagenomics is proving useful for understanding the physiology and environmental role of uncultivated microorganisms and may provide some new strategies for culturing these organisms.

Functional Metagenomics
Soil is considered one of the most challenging, and potentially most rewarding, environments for metagenomic studies. Because soil microbial diversity is immense (Gans et al., 2005) and the vast majority of it remains uncultured, the soil metagenome may be too large to sequence completely in the near future. It is often assumed, however, that this diversity holds vast untapped resources of microbial processes with significant scientific, practical, or commercial potential. Companies such as Diversa Corporation (San Diego, CA) specifically mine microbial genomes from unique environments for economically viable enzymes, biochemical pathways, agricultural products, new pharmaceuticals, and other products (Handelsman, 2004; Keller and Zengler, 2004; Schloss and Handelsman, 2005a).

Rondon et al. (2000) were the first to retrieve large fragments of DNA from soil for metagenomics. They created a BAC library of 3648 clones containing approximately 100 Mbp of DNA, with an average insert size of 27 kb. Instead of sequencing the entire library, which would have been very costly in time and money, the researchers screened the clones for enzyme activities. Twelve BAC clones in this library expressed DNase, lipase, amylase, hemolytic, or antibacterial activity in an E. coli host. Their work demonstrated that the soil metagenome could be a source of novel microbial products without cultivating or identifying the organisms responsible. To find these processes, however, they had to look for specific functions with selective media and screen many clones (Rondon et al., 2000).
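As a quick consistency check, treating 27 kb as the insert size of every clone (a simplification, since 27 kb is an average), the library size and insert length reported for Rondon et al. (2000) reproduce the stated total of cloned DNA:

```latex
3648\ \text{clones} \times 27\ \text{kb clone}^{-1} = 98\,496\ \text{kb} \approx 98.5\ \text{Mbp} \approx 100\ \text{Mbp}
```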

Deciding what activities to look for and how many clones to screen are significant decisions that may depend on several factors including the size of the fragment inserts and the activities of interest. Advances in the availability of vectors (Voget et al., 2003; Martinez et al., 2004) and host strains (Courtois et al., 2003; Martinez et al., 2004; Williamson et al., 2005) are increasing the potential for finding novel bioproducts and catalytic enzymes from soil metagenomes (Table 2). Other promising applications that may come from soil metagenomic libraries are antitumor agents (Pettit, 2004) and novel biodegradative pathways for xenobiotics (Eyers et al., 2004).
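How insert size drives the number of clones to screen can be sketched with the classic Clarke-Carbon library-coverage relation, N = ln(1 - P) / ln(1 - f), where f is the chance that one random clone carries a given gene and P is the desired probability of recovering it at least once. This is a swapped-in textbook estimate, not a method described in this review. The sketch assumes random, uniform fragmentation and adds a hypothetical `rel_abundance` factor for an organism contributing only part of the community DNA; the 5-Mbp genome size is likewise an assumption.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, prob: float = 0.99,
                  rel_abundance: float = 1.0) -> int:
    """Clarke-Carbon style estimate of how many clones to screen.

    f is the chance that a single random clone carries a given gene, taken as
    rel_abundance * insert_bp / genome_bp. This assumes random, uniform
    fragmentation and that the organism contributes the fraction rel_abundance
    of the cloned DNA (both simplifications for a real soil library).
    N = ln(1 - prob) / ln(1 - f) clones give probability prob of at least one hit.
    """
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    f = rel_abundance * insert_bp / genome_bp
    if not 0 < f < 1:
        raise ValueError("per-clone hit probability must lie strictly between 0 and 1")
    # log1p(-x) computes ln(1 - x) accurately when x is small
    return math.ceil(math.log1p(-prob) / math.log1p(-f))


if __name__ == "__main__":
    # Hypothetical 5-Mbp genome; 27-kb inserts as reported by Rondon et al. (2000).
    print(clones_needed(5e6, 27e3))                      # 851
    print(clones_needed(5e6, 27e3, rel_abundance=0.01))  # roughly 85,000
```

Under these assumptions, a gene from an organism making up 1% of the community requires about 100 times more clones than one from a pure culture, which is one reason function-based screens of soil libraries must examine so many clones.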


Table 2. Examples of novel biocatalysts and bioproducts that have been detected in soil metagenomic libraries.

 
The future of soil metagenomics research will involve combining techniques that link microorganisms to their functions or that monitor these functions in the environment. Techniques that link microbial community structure and function by labeling or quantifying target genes, such as real-time PCR, stable isotope probing, and fluorescent in situ hybridization, will benefit from the new functional and phylogenetic gene targets obtained from metagenomic libraries (Wellington et al., 2003). Combining these technologies will help address many elusive questions in soil microbiology, including determining the role of lateral gene transfer in microbial evolution, monitoring the dynamic interactions between environmental influences and microbial community change, finding patterns in microbial processes, understanding the importance of species diversity and functional redundancy in soil, and predicting biogeochemical cycles (Torsvik and Øvreås, 2002; Wellington et al., 2003; DeLong, 2004; Handelsman, 2004; Streit and Schmitz, 2004).


    CHALLENGES
Although large sequencing studies are technically feasible and potentially invaluable, they still face significant challenges. Foremost among them will be evaluating the tremendous amount of information generated (Nelson, 2003). Identifying gene functions in a well-studied, easily cultivated microorganism is relatively easy compared with the daunting task of understanding the genomics of uncultivated microorganisms or of whole environmental genomes. Even with 150,000 sequence reads, only 1% of a soil metagenome could be assembled into contiguous sequences (Schloss and Handelsman, 2005a; Tringe et al., 2005). Tringe et al. (2005) estimated that 2 to 5 billion bp of sequence would be needed to cover the metagenome of a Minnesota soil completely. In mixed microbial communities, it will be difficult to separate individual genomes, resolve microheterogeneity within samples, and organize, assemble, and annotate the genomes (Nelson, 2003; Galperin, 2004). Even relatively low-biomass microbial communities contain vast amounts of previously unseen genetic information (Tyson et al., 2004; Venter et al., 2004; Schloss and Handelsman, 2005a), which therefore cannot be identified from known gene or protein sequences. Most of the ORFs found in metagenomic studies have no homologs in the available databases (Schloss and Handelsman, 2005a). In addition, incorrectly assembled contigs and chimeric inserts can complicate interpretation of the data (DeLong, 2004; Schloss and Handelsman, 2005a).
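The scale of the Tringe et al. (2005) estimate is easier to grasp when converted to read counts. The back-of-the-envelope sketch below assumes a hypothetical average read length of 700 bp; that figure is an assumption for illustration, not a value given in this review.

```python
def reads_needed(total_bp: int, read_len_bp: int) -> int:
    """Number of reads needed to produce total_bp of raw sequence (ceiling division)."""
    return -(-total_bp // read_len_bp)


READ_LEN_BP = 700          # hypothetical average read length (an assumption)
READS_IN_TEXT = 150_000    # read count quoted in the text

if __name__ == "__main__":
    for total_bp in (2_000_000_000, 5_000_000_000):  # Tringe et al. (2005) range
        n = reads_needed(total_bp, READ_LEN_BP)
        print(f"{total_bp / 1e9:.0f} Gbp -> {n:,} reads "
              f"({n / READS_IN_TEXT:.0f}x the {READS_IN_TEXT:,} reads quoted)")
```

With 700-bp reads this works out to roughly 2.9 to 7.1 million reads, about 19 to 48 times the 150,000 reads quoted; longer or shorter assumed reads scale the answer proportionally.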

Metagenomic studies, like all DNA-based analyses of microbial communities, are subject to significant biases. A metagenome is a snapshot of the microbial community genome at a specific point in time and space (Schloss and Handelsman, 2005a). Because metagenomic studies are based on DNA extracted from the environment, they cannot by themselves describe how or when the genes found are expressed; proteomic analyses or specific assays for microbial activity are also needed. The most abundant DNA recovered from any environment will probably come from the most abundant or most readily lysed cells, not necessarily from the most environmentally important or interesting organisms. Because some metagenomic studies require large DNA fragments, the lysis procedures used are relatively gentle and may bias the genome collection against lysis-resistant organisms. Streptomycetes, a group of soil microorganisms likely to produce bioactive molecules, may not lyse under detergent-based protocols; thus these organisms, although abundant and important, may be largely excluded from soil metagenomic libraries. An additional technical challenge for bioprospecting studies is the limited number of vector and host combinations available for expressing environmental genes (Pettit, 2004). Genes extracted from the environment may not be expressed in common hosts such as E. coli. Eukaryotic host strains may be better suited for finding genes of pharmaceutical interest but may also be more difficult to manipulate in the laboratory. Because most of these technical challenges are difficult to avoid with current technology, they should be recognized as confounding factors in metagenomic studies.
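A toy calculation makes the lysis bias concrete. Assuming, purely for illustration, that each group's share of extracted DNA is proportional to its true abundance multiplied by its lysis efficiency, and that cells of all groups carry equal amounts of DNA, a group that is common in the soil but lyses poorly can shrink to a few percent of the library. All abundances and efficiencies below are hypothetical.

```python
def observed_fractions(abundance: dict[str, float],
                       lysis_eff: dict[str, float]) -> dict[str, float]:
    """Share of extracted DNA contributed by each group.

    Assumes (hypothetically) that DNA yield is proportional to true abundance
    times lysis efficiency, with equal DNA content per cell across groups.
    """
    yields = {group: abundance[group] * lysis_eff[group] for group in abundance}
    total = sum(yields.values())
    return {group: y / total for group, y in yields.items()}


if __name__ == "__main__":
    true_share = {"streptomycetes": 0.20, "other bacteria": 0.80}  # hypothetical
    efficiency = {"streptomycetes": 0.10, "other bacteria": 0.90}  # hypothetical
    for group, share in observed_fractions(true_share, efficiency).items():
        print(f"{group}: {share:.1%}")  # streptomycetes: 2.7%, other bacteria: 97.3%
```

With these invented values, streptomycetes fall from 20% of cells to about 2.7% of the extracted DNA; the point is the direction and size of the distortion, not the specific numbers.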

A final challenge for microbial ecologists is the growing need for researchers with interdisciplinary training, support for multidisciplinary collaborations, and shared facilities and equipment for collaborative genomic research (DeLong, 2004). Educators at undergraduate and graduate institutions should develop cross-disciplinary programs that emphasize the core concepts of biology, chemistry, and ecology and introduce students to bioinformatics. A true understanding of soil microbiology will require a systems-based approach (Buckley, 2005) that uncovers the interactions among genes, proteins, small molecules, organelles, and environmental factors, and it will require researchers with experience in many fields (DeLong, 2004; Buckley, 2005).


    NOTES
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Received for publication March 21, 2006.


    REFERENCES