 
        
        
          Proceedings of the 18th International Conference on Soil Mechanics and Geotechnical Engineering, Paris 2013
        
        
          6 GENE SEQUENCE DATA ANALYSIS
        
        
          Sequence data is usually provided in a text file in FASTA
          format, where there is a description line followed by the
          sequence of nucleotides reported as single-letter codes
          (A, G, C, T). In a geoenvironmental context, the purpose of
          sequencing a gene is usually to identify the species from which
          the sequence came. This is done by comparison with open-access
          databases such as GenBank, the EMBL nucleotide sequence
          database, or the DNA Data Bank of Japan. These databases are
          maintained by public bodies in the USA, Europe and Japan
          collaborating as the International Nucleotide Sequence Database
          Collaboration.
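As an illustration of the FASTA layout described above, the following is a minimal sketch of a reader, assuming the common convention that each description line begins with `>` and that the sequence may wrap over several lines. The function name `parse_fasta` and the two example records are hypothetical and are not taken from any of the databases named above.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each description line (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated and normalised to upper case.
            records[header] += line.upper()
    return records


# Hypothetical records, invented purely for illustration.
example = """>clone_01 hypothetical 16S rRNA gene fragment
AGAGTTTGATCCTGGCTCAG
GATGAACGCT
>clone_02 hypothetical 16S rRNA gene fragment
agagtttgatcatggctcag
""".splitlines()

records = parse_fasta(example)
for description, sequence in records.items():
    print(description, len(sequence))
```

In practice an established library would normally be preferred over a hand-written reader; the sketch is only meant to make the file structure concrete.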
        
        
          Sequences obtained from samples can be compared with
        
        
          sequences in the database using a variety of free, public domain
        
        
          software. BLAST (Basic Local Alignment Search Tool) makes
        
        
          pair-wise comparisons with sequences in the chosen database
        
        
          and reports the statistically most significant matches.
        
        
          SEQMATCH, available from the Ribosomal Database Project,
          performs a similar function,
        
        
          and readily allows the user to restrict the quality of sequences to
        
        
          which matches are reported (e.g. type species, isolates, long
        
        
          read lengths, “good” quality).
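To give a feel for what a pair-wise local comparison involves, the sketch below computes a Smith-Waterman local alignment score by dynamic programming. This is not the BLAST algorithm itself: BLAST is a faster heuristic that approximates this kind of local alignment and adds significance statistics, which are omitted here. The scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions only.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical reads sharing the short motif "ACGT".
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A higher score indicates a longer or better-conserved shared region; database search tools rank candidate matches on a statistically normalised version of such scores.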
        
        
          CLASSIFIER, which is also available from the Ribosomal
        
        
          Database Project, is a naïve Bayesian classifier that can place
        
        
          bacterial 16S rRNA sequences within Bergey’s Taxonomic
        
        
          Outline of the Prokaryotes (Wang et al. 2007). It is easy to use,
        
        
          and can be used for classifying single rRNA gene sequences or
        
        
          for the analysis of libraries of thousands of sequences.
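The idea behind a word-based naïve Bayesian classifier can be sketched as follows. This is a toy illustration and not the RDP CLASSIFIER implementation: the word length `k=4`, the add-0.5 smoothing, the taxon labels and the training sequences are all assumptions chosen to keep the example small, and no confidence estimate is computed.

```python
import math
from collections import defaultdict


def kmers(seq, k=4):
    """Return the set of overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(labelled, k=4):
    """Count, per taxon, how many training sequences contain each word."""
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for taxon, seq in labelled:
        sizes[taxon] += 1
        for word in kmers(seq, k):
            counts[taxon][word] += 1
    return counts, sizes


def classify(seq, counts, sizes, k=4):
    """Return the taxon with the highest summed log word probability."""
    best_taxon, best_score = None, -math.inf
    for taxon, n_seqs in sizes.items():
        # Smoothed estimate of P(word | taxon); smoothing is an assumption.
        score = sum(
            math.log((counts[taxon].get(word, 0) + 0.5) / (n_seqs + 1))
            for word in kmers(seq, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Hypothetical training set: (taxon label, sequence).
training = [
    ("GenusA", "ACGTACGTACGTACGT"),
    ("GenusA", "ACGTACGTTCGTACGT"),
    ("GenusB", "GGGCCCGGGCCCGGGC"),
    ("GenusB", "GGGCCCGGACCCGGGC"),
]
counts, sizes = train(training)
print(classify("ACGTACGTACGA", counts, sizes))
```

The "naïve" assumption is that words occur independently of one another, which is why the per-word log probabilities can simply be summed.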
        
        
          For some types of analysis it may be necessary to align
        
        
          sequences from the same gene of different species prior to
        
        
          detailed analysis. An alignment is a way of arranging gene
        
        
          sequences to identify regions of similarity that indicate
        
        
          functional, structural, or evolutionary relationships between the
        
        
          sequences (Mount, 2004). There is a variety of open-access
        
        
          software available for aligning gene sequences, two of the more
        
        
          popular of which are ClustalW (Cluster Analysis) and
          MUSCLE (MUltiple Sequence Comparison by Log-Expectation),
          both of which are available from the European Bioinformatics
          Institute website (amongst other sources). Phylogenetic
          relationships between the aligned sequences can be displayed as
          phylogenetic trees using software such as TreeView (Page, 1996),
          or organised into “operational taxonomic units” (OTUs) using
          software such as MOTHUR (mothur.org/; Schloss et al., 2009).
          In this context an OTU is a
        
        
          grouping defined by sequence similarity, which can be set by
        
        
          the user to correspond roughly with phylum, class, order,
          family, genus or species, as appropriate. Rarefaction analysis
        
        
          (which can also be undertaken by MOTHUR) can characterize
        
        
          the diversity of a clone library using either rarefaction curves or
        
        
          a numerical indicator such as the Shannon Index (Krebs, 1999).
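The two diversity summaries mentioned above can be sketched numerically. The Shannon Index is computed here as H' = -Σ p_i ln p_i over the OTU proportions p_i, and a point on a rarefaction curve is computed as the expected number of OTUs in a random subsample of n sequences drawn without replacement. The OTU counts are hypothetical, and this analytical form is not necessarily how MOTHUR performs the calculation.

```python
import math
from math import comb


def shannon_index(otu_counts):
    """Shannon Index H' using natural logarithms."""
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total)
                for c in otu_counts if c > 0)


def rarefied_richness(otu_counts, n):
    """Expected number of OTUs observed in a subsample of n sequences."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the library size")
    # For each OTU, 1 minus the probability that it is missed entirely.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in otu_counts)


# Hypothetical clone library: number of clones assigned to each OTU.
library = [50, 30, 15, 5]
print(round(shannon_index(library), 3))
for n in (10, 25, 50, 100):
    print(n, round(rarefied_richness(library, n), 2))
```

Plotting the expected richness against n gives the rarefaction curve; a curve that has flattened suggests that further sequencing of the same library would reveal few additional OTUs.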
        
        
          Next generation sequencing can produce 2-3 orders of
        
        
          magnitude more data than traditional approaches based on
        
        
          cloning and sequencing. Thus, while the basic stages in analysis
        
        
          are similar to the traditional approach, the task of applying it to
        
        
          many thousands of sequences in parallel usually requires the use
        
        
          of different software. The RDP (described above) has a
        
        
          pyrosequencing pipeline that “processes and converts the data to
        
        
          formats suitable for common ecological and statistical
        
        
          packages”. Similarly, QIIME (Quantitative Insights Into
        
        
          Microbial Ecology) is an open source software package for
        
        
          analysing high-throughput amplicon sequencing data, such as
        
        
          16S rRNA gene sequences.
        
        
          7 DISCUSSION AND CONCLUSIONS
        
        
          Microbes can be expected to impact most if not all processes
        
        
          occurring in the geo-environment, and geotechnical engineers
        
        
          should be aware of the potential for harnessing microbial
        
        
          metabolism to bring about desired aims. PCR-based
          methodologies permit the detection of the microbes present and
          of how they change with changing conditions. PCR is relatively
          easy to use in an engineering setting and the availability of
          reagents in kit form (along with detailed protocols) means that
          the barriers to adoption are reasonably low. However, this is a
          rapidly moving field and the advent of high-throughput deep
          sequencing technologies has led to the development of
          ‘metagenomics’ and ‘metatranscriptomics’, which investigate
          the composite genetic potential of an ecological niche.
          Instrumentation and sample analysis costs are still relatively
          high but are likely to fall as capacity increases and technology
          improves. The
        
        
          sheer volume of data generated poses a significant challenge in
        
        
          terms of bioinformatics and fully exploiting these technologies
        
        
          will require multidisciplinary collaborations between engineers,
        
        
          molecular biologists and informaticians.
        
        
          8 REFERENCES
        
        
          Acinas, S.G. et al. 2004. Fine-scale phylogenetic architecture of a
          complex bacterial community. Nature 430(6999), 551-554.

          Acinas, S.G. et al. 2005. PCR-Induced Sequence Artifacts and Bias:
          Insights from Comparison of Two 16S rRNA Clone Libraries
          Constructed from the Same Sample. Appl. Environ. Microbiol.
          71(12), 8966-8969.

          Borneman, J. & Triplett, E.W. 1997. Molecular microbial diversity in
          soils from eastern Amazonia: evidence for unusual microorganisms
          and microbial population shifts associated with deforestation. Appl.
          Environ. Microbiol. 63, 2647-2653.

          Burke, I.T. et al. 2012. Biogeochemical reduction processes in a
          hyper-alkaline affected leachate soil profile. Geomicrobiology
          Journal 29(9), 769-779.

          Cardinale, M. et al. 2004. Comparison of different primer sets for use
          in automated ribosomal intergenic spacer analysis of complex
          bacterial communities. Appl. Environ. Microbiol. 70, 6147-6156.

          Islam, F.S. et al. 2004. Role of metal-reducing bacteria in arsenic
          release from Bengal delta sediments. Nature 430(6995), 68-71.

          Krebs, C.J. 1999. Ecological Methodology. Addison-Wesley
          Educational Publishers Inc, Menlo Park, CA.

          Marchetti, A. et al. 2012. Comparative metatranscriptomics identifies
          molecular bases for the physiological responses of phytoplankton to
          varying iron availability. PNAS /pnas.1118408109

          Metzker, M.L. 2010. Sequencing technologies - the next generation.
          Nature Reviews Genetics 11, 31-46.

          Mount, D.W. 2004. Bioinformatics: Sequence and Genome Analysis.
          Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

          Page, R.D.M. 1996. TREEVIEW: An application to display
          phylogenetic trees on personal computers. Computer Applications
          in the Biosciences 12, 357-358.

          Polz, M.F. & Cavanaugh, C.M. 1998. Bias in Template-to-Product
          Ratios in Multitemplate PCR. Appl. Environ. Microbiol. 64(10),
          3724-3730.

          Promega 2012. GoTaq® DNA Polymerase Protocol. Last accessed
          4th December 2012.

          Qiu, X., Wu, L. et al. 2001. Evaluation of PCR-Generated Chimeras,
          Mutations, and Heteroduplexes with 16S rRNA Gene-Based
          Cloning. Appl. Environ. Microbiol. 67(2), 880-887.

          Roche 2011a. FastStart Taq DNA Polymerase dNTPpack: Version 7.
          Last accessed 4th December 2012.

          Roche 2011b. 454 Sequencing System Guidelines for Amplicon
          Experimental Design. Last accessed 10th December 2012.

          Sunar, N.M. et al. 2009. Enumeration of salmonella in compost
          material by a non-culture based method. Sardinia 2009: 12th Int.
          Waste Management and Landfill Symp., 1005-1006.

          Wang, Q. et al. 2007. Naive Bayesian classifier for rapid assignment
          of rRNA sequences into the new bacterial taxonomy. Appl. Environ.
          Microbiol. 73, 5261-5267.

          Wang, Z. et al. 2009. RNA-Seq: a revolutionary tool for
          transcriptomics. Nature Reviews Genetics 10, 57-63.