Discovery of rare mutations in populations: TILLING by sequencing. Helen Tsai et al., Plant Physiology.
Discovery of rare mutations in populations requires methods, such as TILLING, for processing and analyzing many individuals in parallel. Previous TILLING protocols employed enzymatic or physical discrimination of heteroduplexed from homoduplexed target DNA. Using mutant populations of rice (Oryza sativa) and wheat (Triticum durum and T. aestivum), we developed a method based on Illumina sequencing of target genes amplified from multi-dimensionally pooled templates representing 768 individuals per experiment. Parallel processing of sequencing libraries was aided by unique tracer sequences and barcodes allowing flexibility in the number and pooling arrangement of targeted genes, species, and pooling scheme. Sequencing reads were processed and aligned to the reference to identify possible single nucleotide changes, which were then evaluated for frequency, sequencing quality, intersection pattern in pools, and statistical relevance to produce a Bayesian score with an associated confidence threshold. Discovery was robust both in rice and wheat using either 2D or 3D pooling schemes. The method compared favorably to other molecular and computational approaches, providing high sensitivity and specificity.
Reference: Tsai et al., Discovery of rare mutations in populations: TILLING by sequencing, Plant Physiology.
For more information on TILLING at the UCD Genome Center, go to the TILLING site.
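The "intersection pattern in pools" mentioned in the abstract can be illustrated with a small sketch of 2D pool deconvolution. This is not the published pooling layout: the 24 x 32 grid, the pool names (`R0`, `C0`, ...) and the function names are assumptions chosen only so that the grid holds 768 individuals.

```python
from itertools import product

# Hypothetical layout: 24 row pools x 32 column pools = 768 individuals.
ROWS, COLS = 24, 32


def pools_for(individual: int) -> tuple[str, str]:
    """Return the (row pool, column pool) holding a 0-based individual index."""
    row, col = divmod(individual, COLS)
    return f"R{row}", f"C{col}"


def candidates(positive_pools: set[str]) -> list[int]:
    """Return individuals whose row pool AND column pool both show the variant."""
    rows = [int(p[1:]) for p in positive_pools if p.startswith("R")]
    cols = [int(p[1:]) for p in positive_pools if p.startswith("C")]
    # Every (row, column) intersection is a candidate carrier.
    return sorted(r * COLS + c for r, c in product(rows, cols))


if __name__ == "__main__":
    print(pools_for(100))
    print(candidates({"R3", "C4"}))
```

A variant seen in exactly one row pool and one column pool points to a single individual. A variant seen in only one dimension yields no candidate in this sketch, which is one way a pooling design can help separate real mutations from sequencing noise.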
Statistical Mutation Calling from Sequenced Overlapping DNA Pools in TILLING Experiments. Victor Missirian et al., BMC Bioinformatics.
Background: TILLING (Targeting Induced Local Lesions IN Genomes) is an efficient reverse genetics approach for detecting induced mutations in pools of individuals. Combined with the high throughput of next-generation sequencing technologies and the resolving power of overlapping pool design, TILLING provides an efficient and economical platform for functional genomics across thousands of organisms.
Results: We propose a probabilistic method for calling TILLING-induced mutations, and their carriers, from high-throughput sequencing data of overlapping population pools, where each individual occurs in two pools. We assign a probability score to each sequence position by applying Bayes’ Theorem to a simplified binomial model of sequencing error and expected mutations, taking into account the coverage level. We test the performance of our method on variable-quality, high-throughput sequences from wheat and rice mutagenized populations.
Conclusions: We show that our method effectively discovers mutations in large populations, with a sensitivity of 92.5% and a specificity of 99.8%. It also outperforms existing SNP detection methods in detecting real mutations, especially at higher levels of coverage variability across sequenced pools and in lower-quality short-read sequence data.
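To make the idea of "Bayes' Theorem applied to a simplified binomial model" concrete, here is a minimal sketch. It is not the CAMBa implementation: the error rate, the prior, the assumption of one heterozygous diploid carrier per pool, and the independence of the two pools are all illustrative assumptions, and the function names are hypothetical.

```python
from math import exp, log


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)


def log_likelihood_ratio(k: int, n: int, p_mut: float, p_err: float) -> float:
    """log[Binom(k; n, p_mut) / Binom(k; n, p_err)].

    The binomial coefficient is identical in both terms and cancels.
    """
    return k * log(p_mut / p_err) + (n - k) * log((1 - p_mut) / (1 - p_err))


def mutation_posterior(pool_counts, pool_size, error_rate=0.005, prior=1e-3):
    """Posterior probability that a position carries a real mutation.

    pool_counts: (variant_reads, coverage) for each pool containing the
                 candidate individual (two pools in a 2D design).
    pool_size:   diploid individuals per pool (assumed equal across pools).
    """
    # Assumption: one heterozygous carrier contributes 1 of 2 * pool_size alleles.
    mut_freq = 1.0 / (2 * pool_size)
    # Expected variant-read fraction if the mutation is real, allowing for errors.
    p_mut = mut_freq * (1 - error_rate) + (1 - mut_freq) * error_rate
    log_odds = log(prior / (1 - prior))
    for k, n in pool_counts:
        # Assumption: pools are independent, so their evidence adds in log space.
        log_odds += log_likelihood_ratio(k, n, p_mut, error_rate)
    return sigmoid(log_odds)


if __name__ == "__main__":
    print(mutation_posterior([(40, 2000), (38, 2000)], pool_size=32))
    print(mutation_posterior([(10, 2000), (9, 2000)], pool_size=32))
```

Because coverage `n` enters the likelihood directly, the same variant fraction is more convincing at high coverage than at low coverage, which is the sense in which a model of this kind is coverage aware.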
Methods and Downloads
Wet bench protocols
- Library construction from amplicons. The method works for any DNA input (e.g. genomic DNA or ds-cDNA). It uses barcoded adapters and AMPure SPRI beads for size selection (No agarose gel!).
- Barcoded adapters. See our videos on barcoding here. We make available a list of tested 5-letter barcoded adapters. We order our oligonucleotides from Life Tech with “desalted” purity. Keep your adapter oligo master stocks frozen at -80 °C and small working aliquots at 4 °C. We found that the Illumina HiSeq does not handle lanes containing mixed libraries with only 4 barcodes well, so we use at least 16 barcodes per lane. If you want to design your own adapters, we make available a program to generate the barcodes. Note that these barcoded adapters differ from those specified by Illumina. We also provide bioinformatics tools to process barcoded reads.
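For readers who want to design their own 5-letter barcodes, the sketch below shows one common approach: greedily picking sequences that differ from each other at a minimum number of positions. It is not the barcode generator offered for download above, and the minimum distance and homopolymer limit are assumed design criteria rather than the ones used for the tested adapter list.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_barcodes(length=5, min_distance=2, max_homopolymer=2):
    """Greedily select barcodes that are mutually at least min_distance apart.

    Sequences with a run longer than max_homopolymer identical bases
    (e.g. 'AAA' when max_homopolymer=2) are skipped.
    """
    chosen = []
    for letters in product("ACGT", repeat=length):
        code = "".join(letters)
        if any(base * (max_homopolymer + 1) in code for base in "ACGT"):
            continue
        if all(hamming(code, other) >= min_distance for other in chosen):
            chosen.append(code)
    return chosen


if __name__ == "__main__":
    barcodes = generate_barcodes()
    # Take the first 16, in line with the "at least 16 per lane" practice.
    print(barcodes[:16])
```

With a minimum distance of 2, a single sequencing error in a barcode cannot turn it into another valid barcode. Raising `min_distance` to 3 would also allow single errors to be corrected, at the cost of a smaller set.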
The TILLING pipeline
The TILLING pipeline consists of a series of scripts for TILLING analysis. The pipeline takes the following inputs:
- the raw Illumina sequence (sequence text files) from pooled amplicons (the target genes)
- a reference database, i.e. the FASTA sequence files (TILLING fragment, CDS, genomic) for each target gene
The pipeline filters the reads using general quality criteria, aligns them to the reference, evaluates sequence changes as possible mutations, and outputs a table of putative mutations with their predicted effects on the target gene and a probability score. It combines publicly available programs such as BWA with our own CAMBa. Operating it requires a UNIX workstation and good computational expertise. We regret that we cannot assist you in operating this pipeline beyond providing the ReadMe file with specifications and instructions. The pipeline was designed and implemented by Victor Missirian, Kathie Ngo and Meric Lieberman. Pipeline download.
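As an illustration of what filtering reads with general quality criteria can look like, here is a minimal sketch. It is not taken from the pipeline: the input is assumed to be four-line FASTQ with Phred+33 quality encoding, and the thresholds and function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_fastq(lines, min_mean_quality=20.0, min_length=30):
    """Yield (header, sequence, quality) for reads passing both thresholds.

    lines: an iterable of FASTQ lines, four lines per read.
    """
    # Group the line stream into consecutive 4-line records.
    records = zip(*[iter(lines)] * 4)
    for header, seq, _plus, qual in records:
        seq, qual = seq.strip(), qual.strip()
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_quality:
            yield header.strip(), seq, qual


if __name__ == "__main__":
    example = [
        "@read1", "ACGT" * 10, "+", "I" * 40,  # 'I' encodes Phred 40
        "@read2", "ACGT" * 10, "+", "#" * 40,  # '#' encodes Phred 2
    ]
    for header, seq, qual in filter_fastq(example):
        print(header, len(seq))
```

Reads that pass a filter of this kind would then go to the aligner. The ReadMe shipped with the pipeline remains the authority on which criteria are actually applied.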
Coverage Aware Mutation calling using Bayesian analysis (CAMBa, pronounced Samba) was written by Victor Missirian, Luca Comai and Vladimir Filkov. The publication describing it appeared in BMC Bioinformatics. If you downloaded the pipeline above, it includes CAMBa. If you want just the CAMBa program, it can be downloaded here.
The raw reads are available at this NCBI SRA site.
This work was supported by NSF Plant Genome award DBI-0822383, TRPGR: Efficient identification of induced mutations in crop species by ultra-high-throughput DNA sequencing.