Actes du colloque


Proceedings of the 18th International Conference on Soil Mechanics and Geotechnical Engineering, Paris 2013

6 GENE SEQUENCE DATA ANALYSIS

Sequence data is usually provided in a text file in FASTA format, in which a description line is followed by the sequence of nucleotides reported as single-letter codes (A, G, C, T). In a geoenvironmental context, the purpose of sequencing a gene is usually to identify the species from which the sequence came. This is done by comparison with open-access databases such as GenBank (http://www.ncbi.nlm.nih.gov/genbank/), the EMBL nucleotide sequence database (http://www.ebi.ac.uk/embl/), or the DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/). These databases are maintained by public bodies in the USA, Europe and Japan collaborating as the International Nucleotide Sequence Database Collaboration (http://www.insdc.org/).
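As an illustration of the FASTA layout described above, the following minimal Python sketch reads description lines (which begin with ">") and joins the sequence lines that follow each of them. The function name `parse_fasta` and the two example records are hypothetical and are not taken from any database; real files may also contain ambiguity codes (such as N) that this sketch simply passes through.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {description: sequence} for lines of FASTA-formatted text."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A description line starts a new record.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated and normalised to upper case.
            records[header] += line.upper()
    return records


# Hypothetical example data (not real database entries).
example = """>clone_01 hypothetical 16S rRNA gene fragment
AGAGTTTGATCCTGGCTCAG
GATGAACGCTGGCGGC
>clone_02 hypothetical 16S rRNA gene fragment
agagtttgatcatggctcag
"""

records = parse_fasta(example.splitlines())
for description, sequence in records.items():
    print(description, len(sequence))
```

A dictionary keyed on the description line is used here only for brevity; for large libraries a streaming parser from an established package would probably be preferable.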

Sequences obtained from samples can be compared with sequences in the database using a variety of free, public domain software. BLAST (Basic Local Alignment Search Tool) makes pair-wise comparisons with sequences in the chosen database and reports the statistically most significant matches. SEQMATCH, available from the Ribosomal Database Project (http://rdp.cme.msu.edu/index.jsp), performs a similar function, and readily allows the user to restrict the matches reported to sequences of a chosen quality (e.g. type species, isolates, long read lengths, “good” quality).
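The idea of a pair-wise local comparison can be sketched with the classical Smith-Waterman dynamic programme. This is not how BLAST itself is implemented (BLAST uses a much faster seeded heuristic and reports statistical significance rather than raw scores); the sketch below only shows what a local alignment score is and how database sequences might be ranked by it. The scoring values, the function names and the three "reference" sequences are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_matches(query: str, database: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (name, score) pairs sorted from best to worst score."""
    scored = [(name, smith_waterman_score(query, seq))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical reference sequences, for illustration only.
database = {
    "ref_A": "GGACGTACGTGG",
    "ref_B": "TTTTGGGG",
    "ref_C": "ACGTAC",
}
ranking = rank_matches("ACGTACGT", database)
print(ranking)
```

Raw scores of this kind depend on sequence length and on the scoring scheme, which is one reason why tools such as BLAST report significance statistics instead.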

CLASSIFIER, which is also available from the Ribosomal Database Project, is a naïve Bayesian classifier that can place bacterial 16S rRNA sequences within Bergey’s Taxonomic Outline of the Prokaryotes (Wang et al. 2007). It is easy to use, and can be used for classifying single rRNA gene sequences or for the analysis of libraries of thousands of sequences.
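The principle of a naïve Bayesian sequence classifier can be sketched as follows. The sketch splits each sequence into overlapping "words", estimates how often each word occurs in the training sequences of each genus, and assigns a query to the genus with the highest summed log-probability. The smoothing used is our reading of the word-frequency estimates in Wang et al. (2007) and should be checked against that paper; the word length of 4 (the published classifier uses 8-base words), the genus names and the training sequences are all assumptions made to keep the example small, and the bootstrap confidence estimate of the real tool is omitted.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

K = 4  # toy word length (assumption); the published classifier uses 8


def words(seq: str, k: int = K) -> Set[str]:
    """Set of overlapping k-base words present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training: List[Tuple[str, str]]):
    """Count word occurrences overall and per genus from (genus, sequence) pairs."""
    n_total = len(training)
    word_count = defaultdict(int)   # sequences containing each word
    genus_size = defaultdict(int)   # training sequences per genus
    genus_word = defaultdict(lambda: defaultdict(int))  # per-genus word counts
    for genus, seq in training:
        genus_size[genus] += 1
        for w in words(seq):
            word_count[w] += 1
            genus_word[genus][w] += 1
    return n_total, word_count, genus_size, genus_word


def classify(seq: str, model) -> Tuple[str, Dict[str, float]]:
    """Return the best-scoring genus and the log-scores of all genera."""
    n_total, word_count, genus_size, genus_word = model
    scores: Dict[str, float] = {}
    for genus, m_total in genus_size.items():
        log_p = 0.0
        for w in words(seq):
            # Word prior over the whole training set, smoothed to avoid zeros.
            prior = (word_count.get(w, 0) + 0.5) / (n_total + 1)
            # Conditional probability of seeing the word in this genus.
            log_p += math.log((genus_word[genus].get(w, 0) + prior) / (m_total + 1))
        scores[genus] = log_p
    return max(scores, key=scores.get), scores


# Hypothetical training data, for illustration only.
training = [
    ("GenusA", "ACGTACGTACGT"),
    ("GenusA", "ACGTACGAACGT"),
    ("GenusB", "TTGGCCAATTGG"),
    ("GenusB", "TTGGCCTATTGG"),
]
model = train(training)
genus, scores = classify("ACGTACGTAC", model)
print(genus, scores)
```

The "naïve" assumption is visible in the inner loop: word probabilities are multiplied (added as logarithms) as if words occurred independently of one another.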

For some types of analysis it may be necessary to align sequences from the same gene of different species prior to detailed analysis. An alignment is a way of arranging gene sequences to identify regions of similarity that indicate functional, structural, or evolutionary relationships between the sequences (Mount, 2004). A variety of open-access software is available for aligning gene sequences; two of the more popular packages are ClustalW (Cluster Analysis) and MUSCLE (MUltiple Sequence Comparison by Log-Expectation), both of which are available from the European Bioinformatics Institute website (amongst other sources). Phylogenetic relationships between the aligned sequences can be displayed as phylogenetic trees using software such as TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html; Page, 1996), or organised into “operational taxonomic units” (OTUs) using software such as MOTHUR (http://www.mothur.org/; Schloss et al., 2009). In this context an OTU is a grouping defined by sequence similarity, which can be set by the user to correspond roughly with phylum, class, order, family, genus or species, as appropriate. Rarefaction analysis (which can also be undertaken by MOTHUR) can characterize the diversity of a clone library using either rarefaction curves or a numerical indicator such as the Shannon Index (Krebs, 1999).
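The grouping of aligned sequences into OTUs and the two diversity measures mentioned above can be sketched in a few lines of Python. The clustering shown is a simple greedy, seed-based scheme chosen for brevity and is not the algorithm used by MOTHUR; the similarity threshold, the function names and the five aligned sequences are assumptions for illustration. The Shannon Index is computed as H' = -Σ p_i ln p_i, where p_i is the proportion of sequences in OTU i, and the rarefaction function returns the expected number of OTUs in a random subsample of n sequences drawn without replacement.

```python
import math
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(aligned: List[str], threshold: float) -> List[List[str]]:
    """Put each sequence in the first OTU whose seed it matches at >= threshold."""
    otus: List[List[str]] = []
    for seq in aligned:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])  # no match found: the sequence seeds a new OTU
    return otus


def shannon_index(counts: List[int]) -> float:
    """Shannon Index H' = -sum(p_i * ln p_i) over OTU abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefied_richness(counts: List[int], n: int) -> float:
    """Expected number of OTUs in a subsample of n sequences (no replacement)."""
    total = sum(counts)
    return sum(1 - math.comb(total - c, n) / math.comb(total, n) for c in counts)


# Hypothetical aligned sequences, for illustration only.
aligned = ["ACGTACGTAC", "ACGTACGTAT", "TTGGCCAATT", "TTGGCCAATT", "ACGTACGTAC"]
otus = greedy_otus(aligned, threshold=0.9)
counts = [len(otu) for otu in otus]
print(counts, shannon_index(counts))

# One point on a rarefaction curve per subsample size.
curve = [rarefied_richness(counts, n) for n in range(1, sum(counts) + 1)]
print(curve)
```

Plotting `curve` against subsample size gives a rarefaction curve; a curve that is still rising steeply at the full library size suggests that further sequencing would reveal additional OTUs.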

Next generation sequencing can produce 2-3 orders of magnitude more data than traditional approaches based on cloning and sequencing. Thus, while the basic stages in analysis are similar to the traditional approach, the task of applying it to many thousands of sequences in parallel usually requires the use of different software. The Ribosomal Database Project (described above) has a pyrosequencing pipeline that “processes and converts the data to formats suitable for common ecological and statistical packages”. Similarly, QIIME (Quantitative Insights Into Microbial Ecology) is an open source software package for analysing high-throughput amplicon sequencing data, such as 16S rRNA gene sequences (http://qiime.org/).

7 DISCUSSION AND CONCLUSIONS

Microbes can be expected to impact most if not all processes occurring in the geo-environment, and geotechnical engineers should be aware of the potential for harnessing microbial metabolism to bring about desired aims. PCR-based methodologies permit the detection of the microbes present and of how they change with changing conditions. PCR is relatively easy to use in an engineering setting, and the availability of reagents in kit form (along with detailed protocols) means that the barriers to adoption are reasonably low. However, this is a rapidly moving field, and the advent of high-throughput deep sequencing technologies has led to the development of ‘metagenomics’ and ‘metatranscriptomics’, which investigate the composite genetic potential of an ecological niche. Instrumentation and sample analysis costs are still relatively high but are likely to fall as capacity increases and technology improves. The sheer volume of data generated poses a significant bioinformatics challenge, and fully exploiting these technologies will require multidisciplinary collaborations between engineers, molecular biologists and informaticians.

8 REFERENCES

Acinas, S.G. et al. 2004. Fine-scale phylogenetic architecture of a complex bacterial community. Nature 430(6999), 551-554.

Acinas, S.G. et al. 2005. PCR-induced sequence artifacts and bias: insights from comparison of two 16S rRNA clone libraries constructed from the same sample. Appl. Environ. Microbiol. 71(12), 8966-8969.

Borneman, J. & Triplett, E.W. 1997. Molecular microbial diversity in soils from eastern Amazonia: evidence for unusual microorganisms and microbial population shifts associated with deforestation. Appl. Environ. Microbiol. 63, 2647-2653.

Burke, I.T. et al. 2012. Biogeochemical reduction processes in a hyper-alkaline affected leachate soil profile. Geomicrobiology Journal 29(9), 769-779.

Cardinale, M. et al. 2004. Comparison of different primer sets for use in automated ribosomal intergenic spacer analysis of complex bacterial communities. Appl. Environ. Microbiol. 70, 6147-6156.

Krebs, C.J. 1999. Ecological Methodology. Addison-Wesley Educational Publishers Inc, Menlo Park, CA.

Islam, F.S. et al. 2004. Role of metal-reducing bacteria in arsenic release from Bengal delta sediments. Nature 430(6995), 68-71.

Marchetti, A. et al. 2012. Comparative metatranscriptomics identifies molecular bases for the physiological responses of phytoplankton to varying iron availability. PNAS. www.pnas.org/cgi/doi/10.1073/pnas.1118408109

Metzker, M.L. 2010. Sequencing technologies - the next generation. Nature Reviews Genetics 11, 31-46.

Mount, D.W. 2004. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Polz, M.F. & Cavanaugh, C.M. 1998. Bias in template-to-product ratios in multitemplate PCR. Appl. Environ. Microbiol. 64(10), 3724-3730.

Promega 2012. GoTaq® DNA Polymerase Protocol. http://www.promega.com/. Last accessed 4th December 2012.

Qiu, X., Wu, L. et al. 2001. Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16S rRNA gene-based cloning. Appl. Environ. Microbiol. 67(2), 880-887.

Roche 2011a. FastStart Taq DNA Polymerase dNTPpack: Version 7. https://cssportal.roche.com/. Last accessed 4th December 2012.

Roche 2011b. 454 Sequencing System Guidelines for Amplicon Experimental Design. http://my454.com/. Last accessed 10-12-12.

Sunar, N.M. et al. 2009. Enumeration of Salmonella in compost material by a non-culture based method. Sardinia 2009: 12th Int. Waste Management and Landfill Symp., 1005-1006.

Page, R.D.M. 1996. TREEVIEW: An application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences 12, 357-358.

Wang, Q. et al. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261-5267.

Wang, Z. et al. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10, 57-63.

Actes du colloque - Volume 4 - page 426
