User:Suspencewl/DNA Nanoball Sequencing

From Wikipedia, the free encyclopedia

DNA nanoball sequencing is a high-throughput sequencing technology used to determine the entire genomic sequence of an organism. The company Complete Genomics uses this technology to sequence samples submitted by researchers from a range of projects. The method uses rolling circle replication to amplify small fragments of genomic DNA into DNA nanoballs. Fluorescent probes bind to complementary DNA and are ligated to anchor sequences that are hybridized to known sequences on the DNA template. The color of the fluorescence is then analyzed to determine the base[1].

Workflow for DNA nanoball sequencing

This method allows a large number of DNA nanoballs to be sequenced per run, at lower reagent cost than other next generation sequencing platforms[2]. However, only short stretches of DNA are read from each nanoball, which makes mapping the short reads to a reference genome difficult[3]. The technology has been used in several genome sequencing projects and is scheduled to be used in many more[4].

Procedure

DNA nanoball sequencing involves isolating the DNA to be sequenced, shearing it into small fragments of 400 – 500 base pairs (bp), ligating adapter sequences to the fragments, and circularizing them. The circular fragments are copied by rolling circle replication, producing many single-stranded copies of each fragment joined head to tail in one long strand, which is compacted into a DNA nanoball. The nanoballs are then adsorbed onto a sequencing flow cell. Unchained sequencing reactions interrogate specific nucleotide positions in each nanoball by ligating fluorescent probes to the DNA, and the color of the fluorescence at each interrogated position is recorded by a high resolution camera. Bioinformatics software analyzes the fluorescence, makes a base call, and maps the 35 bp mate-pair reads to a reference genome. The genome is then assembled and polymorphisms are called[1].

DNA Isolation, Fragmentation, and Size Capture

Figure 2. Polyacrylamide gel electrophoresis (PAGE) gel of a sonicated DNA smear (left) and size selection by gel extraction (right).

Cells from the tissue to be sequenced are lysed, and a protease is added to digest proteins and prevent degradation of the DNA. The DNA is then extracted from the cell lysate, usually with one of several commercially available DNA extraction kits. The DNA, often hundreds of millions of base pairs long, is sonicated: exposure to intense sound waves breaks it at random positions. The resulting lengths are visualized by polyacrylamide gel electrophoresis (PAGE). Because sonication breaks the DNA at random positions, the fragments appear on the PAGE gel as a smear spanning a range of lengths. Bioinformatic mapping of the sequencing reads is most efficient when the sample DNA falls within a narrow length range[5], so the band of the desired length is cut out of the gel and the DNA is recovered with a PAGE gel extraction kit. After these steps, the DNA sample is pure and its length lies within a narrow range (typically 400 – 500 bp)[3].
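The need for size selection can be illustrated with a small Python simulation. It is a sketch under a simplifying assumption (breakpoints drawn uniformly at random along one molecule, which real sonication only approximates), and the function names `sonicate` and `size_select` are hypothetical. Only fragments that fall inside the excised band are kept.

```python
import random


def sonicate(length: int, n_breaks: int, rng: random.Random) -> list[int]:
    """Break one molecule of `length` bp at `n_breaks` distinct random positions.

    Assumption: breakpoints are uniform along the molecule. Returns the
    lengths of the resulting fragments in order along the molecule.
    """
    cuts = sorted(rng.sample(range(1, length), n_breaks))
    edges = [0] + cuts + [length]
    return [end - start for start, end in zip(edges, edges[1:])]


def size_select(fragments: list[int], low: int = 400, high: int = 500) -> list[int]:
    """Keep only fragments whose length falls inside the excised gel band."""
    return [n for n in fragments if low <= n <= high]


if __name__ == "__main__":
    rng = random.Random(42)
    fragments = sonicate(length=1_000_000, n_breaks=2_000, rng=rng)
    kept = size_select(fragments)
    print(f"{len(fragments)} fragments; {len(kept)} fall in the 400-500 bp band")
```

In this uniform-breakage model the mean fragment length is close to 500 bp, yet most fragments still fall outside the 400 – 500 bp band, which is why the band has to be excised.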

Attaching Adapter Sequences

DNA Nanoball Adapter Ligation

Adapters must be attached to the DNA so that regions of known sequence flank the unknown DNA. These known sequences are later hybridized to anchors for the sequencing reactions. In the first round of adapter ligation, right and left adapters (Ad1) are attached to the right and left ends of the fragmented DNA (figure), and DNA with both adapters bound is PCR amplified. The adapter sequences are then modified to carry complementary single-stranded ends, which anneal to each other and circularize the DNA. The DNA is then methylated to protect it from the type IIS restriction enzyme used in the next step. One recognition site in the right adapter is left unmethylated, so the AcuI restriction enzyme recognizes it and cleaves the DNA 13 bp to the right of the right adapter, producing linear double-stranded DNA.

A second round of right and left adapters (Ad2) is ligated onto the ends of the linear DNA, and all DNA with both adapters bound is PCR amplified. The Ad2 sequences are modified so that they anneal to each other and circularize the DNA. The DNA is methylated again, this time leaving a recognition site unmethylated in the left Ad1 adapter. AcuI is applied again and cleaves the DNA 13 bp to the left of Ad1, giving a linear DNA fragment.

A third round of right and left adapters (Ad3) is ligated to the ends of the linear DNA, PCR amplified, and modified so that the ends anneal and circularize the DNA. The type III restriction enzyme EcoP15I is then added; it cleaves the DNA 26 bp to the left of Ad3 and 26 bp to the right of Ad2[6]. This removes a large segment of the DNA and linearizes it once again. A fourth round of right and left adapters (Ad4) is ligated to the DNA, PCR amplified, and modified so that the ends anneal to form the completed circular DNA template[3].
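Each circularize-and-cut round relies on one idea: every recognition site except one is protected by methylation, so the type IIS enzyme opens the circle at a single, predictable position outside its own recognition sequence. The toy Python model below captures only that logic. It follows the top strand alone and ignores the staggered double-stranded cut; marking protected sites by their start index is an illustrative convention, and `find_sites` and `linearize` are hypothetical helpers. CTGAAG is the AcuI recognition sequence, and the 13 bp offset simply mirrors the figure given in the text.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return the start index of every occurrence of `site` in `seq`."""
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def linearize(circle: str, site: str, offset: int, methylated: set[int]) -> str:
    """Open a circular template at its single unprotected type IIS site.

    `circle` is the top strand read once around the circle. Sites whose start
    index is in `methylated` are treated as blocked by methylation (a modelling
    convention), and sites spanning the end of the string are ignored for
    simplicity. The cut falls `offset` bases past the end of the recognition
    site and may wrap around the circle. Returns the linear top strand.
    """
    active = [s for s in find_sites(circle, site) if s not in methylated]
    if len(active) != 1:
        raise ValueError(f"expected one unprotected site, found {len(active)}")
    cut = (active[0] + len(site) + offset) % len(circle)
    return circle[cut:] + circle[:cut]


if __name__ == "__main__":
    ACUI_SITE = "CTGAAG"  # AcuI recognition sequence
    circle = ACUI_SITE + "TTTTT" + ACUI_SITE + "A" * 20
    # Protect the first site; only the second (start index 11) can be cut.
    print(linearize(circle, ACUI_SITE, offset=13, methylated={0}))
```

Leaving both sites unprotected raises an error in this sketch, reflecting why the protocol methylates all but one site before each cut.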

Rolling Circle Replication

Rolling Circle Replication To Create DNA Nanoball

Once a completed circular template containing the sample DNA separated by four unique adapter sequences has been generated, the full sequence must be amplified into a long string of DNA for the subsequent sequencing reactions. This is done by rolling circle replication. Phi29 DNA polymerase is added to the DNA solution, binds to the circular template, and copies it. The newly synthesized strand is displaced from the circular template, producing a long single strand made of many head-to-tail copies of the template[7]. The four adapters contain palindromic sequences that hybridize, causing the single strand to fold onto itself into a tight ball of DNA approximately 300 nanometers (nm) across. This keeps the nanoballs separate from each other and reduces tangling between different single-stranded DNA molecules[3].
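A minimal Python sketch of the copying step, assuming an idealized polymerase that never pauses or makes errors. Written 5' to 3', the displaced strand is a head-to-tail repeat of the reverse complement of the circular template. The function names and the copy count in the test are illustrative, not values from the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle(circle: str, copies: int) -> str:
    """Return the displaced strand after `copies` trips around a circular template.

    Each trip adds one reverse-complement copy of the circle, so the product
    is a head-to-tail repeat (concatemer). Where the first copy begins depends
    on where synthesis is primed; here it starts at the start of the string.
    """
    return reverse_complement(circle) * copies


def unit_count(product: str, circle: str) -> int:
    """Count the complete template copies in a concatemer, checking it is clean."""
    unit = reverse_complement(circle)
    n, remainder = divmod(len(product), len(unit))
    if remainder or product != unit * n:
        raise ValueError("product is not a clean head-to-tail repeat of the circle")
    return n
```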

DNA Nanoball Microarray

DNA nanoball Array

The DNA nanoballs must be attached to a microarray flow cell for sequencing. The flow cell is a 25 mm by 75 mm silicon wafer coated with silicon dioxide, titanium, hexamethyldisilazane (HMDS), and a photoresist material. An ordered array of 248 nm pits is etched through the photoresist and HMDS, and the pits are then coated with aminosilane. When the DNA nanoballs are added to the flow cell, they bind to the aminosilane[8]. Because the nanoballs bind only to the aminosilane, they attach to the flow cell in a highly ordered pattern, which allows a very high density of DNA nanoballs to be sequenced[3].
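To see why an ordered array helps, consider a textbook Poisson model of random loading (an illustrative assumption, not a model taken from the cited sources): if the surface is divided into nanoball-sized sites and molecules land independently, the fraction of sites holding exactly one molecule is λe^(-λ), where λ is the mean number of molecules per site. The function names below are hypothetical.

```python
import math


def single_occupancy(mean_per_site: float) -> float:
    """Fraction of sites holding exactly one molecule under random (Poisson) loading.

    Assumption: molecules land independently, `mean_per_site` per site on
    average, so P(exactly one) = lambda * exp(-lambda).
    """
    return mean_per_site * math.exp(-mean_per_site)


def best_random_loading(max_mean: float = 5.0, steps: int = 500) -> tuple[float, float]:
    """Scan loading densities; return (fraction singly occupied, mean) at the best one."""
    return max(
        (single_occupancy(max_mean * k / steps), max_mean * k / steps)
        for k in range(1, steps + 1)
    )
```

Even at the best loading (λ = 1), only about 37% (1/e) of sites hold exactly one molecule in this model. An ordered array in which each pit admits at most one nanoball is not bound by that limit, which illustrates why supplying an excess of nanoballs can fill most pits.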

Unchained Sequencing by Ligation

Unchained Ligation Sequencing

Once the DNA nanoballs are arrayed on a flow cell, the nucleotide sequences between the adapter sequences must be determined. First, an oligonucleotide anchor complementary to either the right or left end of one of the adapters is added to the flow cell. Next, a pool of four 10-mer DNA probes is added to the flow cell together with T4 DNA ligase. The probes are degenerate at every position except one, the interrogation position (for example, position 1, next to the anchor; see figure). At the interrogation position, each probe carries a defined base with its own fluorophore: an "A" with a red fluorophore, a "C" with a yellow fluorophore, a "G" with a green fluorophore, or a "T" with a blue fluorophore. Only the probe whose interrogation base is complementary to the template base at that position binds. T4 DNA ligase joins the bound probe to the anchor, unbound probes are washed away, and the fluorescence is measured. The probe/anchor complex is then removed from the DNA nanoball and a fresh anchor is added, together with a new pool of probes that queries a different position (for example, position 2 next to the anchor; see figure). The matching probe hybridizes, is ligated and rinsed, and the fluorescence is read. This is repeated for all 10 interrogation positions next to the anchor. Once all 10 positions are recorded, a different anchor that binds to a different adapter is used, and the process is repeated to identify the 10 nucleotides next to that adapter. Eventually, the 10 nucleotides to the left and to the right of each adapter are identified[3].
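The read-out logic for one anchor can be sketched as a Python simulation. It assumes perfect hybridization and ligation, uses the probe colors given above, and treats any template base other than A, C, G or T as unreadable; all function names are hypothetical. Because every cycle starts from a fresh anchor, an unreadable position produces a single no-call without disturbing its neighbors.

```python
from typing import Optional

# Fluorophore on the interrogation base of each probe, per the text.
PROBE_COLOR = {"A": "red", "C": "yellow", "G": "green", "T": "blue"}
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def interrogate(template: str, position: int) -> Optional[str]:
    """Return the color seen after one anchor/probe/ligation cycle.

    `template` lists the template bases next to the anchor, position 1 being
    adjacent to it. Only the probe whose interrogation base pairs with the
    template base at `position` is ligated; an unreadable base gives None.
    """
    probe_base = PAIR.get(template[position - 1])
    return PROBE_COLOR.get(probe_base) if probe_base else None


def read_positions(template: str, positions: int = 10) -> list[Optional[str]]:
    """Run one independent cycle per position; nothing carries over between cycles."""
    return [interrogate(template, p) for p in range(1, positions + 1)]


def decode(colors: list[Optional[str]]) -> str:
    """Translate colors back to template bases, writing 'N' where there was no signal."""
    color_to_probe = {color: base for base, color in PROBE_COLOR.items()}
    return "".join(PAIR[color_to_probe[c]] if c else "N" for c in colors)
```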

Imaging

After each probe ligation step, the flow cell is imaged to determine which probe ligated to each DNA nanoball. An arc lamp illuminates the flow cell with specific wavelengths of light that excite the fluorophores, causing them to fluoresce. The fluorescence color of each DNA nanoball is captured by a high resolution CCD camera. The image is then processed to remove background signal and measure the intensity of each spot. The color of each DNA nanoball corresponds to the base at the interrogation position, and a computer records the base and its position. After 80 images (8 rounds of 10 interrogation positions each), the identities of 70 nucleotides per DNA nanoball are known, since some positions are read more than once[3].
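A minimal Python sketch of the per-spot base call, assuming one intensity per color channel for each nanoball and a simple brightest-channel rule after background subtraction. The names and the threshold are hypothetical, and a production pipeline would likely use more elaborate normalization and quality scoring. The constants at the end restate the image count from the text.

```python
CHANNELS = ("red", "yellow", "green", "blue")
# Base carried by the probe in each channel; the template base is its complement.
CHANNEL_BASE = {"red": "A", "yellow": "C", "green": "G", "blue": "T"}


def call_base(spot: dict[str, float], background: dict[str, float],
              min_signal: float = 0.0) -> str:
    """Call the probe base at one nanoball from its four channel intensities.

    Background is subtracted per channel, then the brightest corrected channel
    wins. If even that channel is not above `min_signal`, return 'N' (no call).
    """
    corrected = {ch: spot[ch] - background[ch] for ch in CHANNELS}
    brightest = max(CHANNELS, key=lambda ch: corrected[ch])
    if corrected[brightest] <= min_signal:
        return "N"
    return CHANNEL_BASE[brightest]


# Bookkeeping from the text: 8 anchor rounds x 10 interrogation positions.
ROUNDS, POSITIONS_PER_ROUND = 8, 10
IMAGING_CYCLES = ROUNDS * POSITIONS_PER_ROUND  # 80 images
DISTINCT_BASES = 70                            # as reported in the text
OVERLAPPING_READS = IMAGING_CYCLES - DISTINCT_BASES
```

Background subtraction matters in this sketch: a spot can be brightest in one channel in raw terms yet be called from another once each channel's background is removed.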

Genome Assembly

Figure. 35 bp mate-pair reads (red) mapped to a reference genome (dark blue)

In generating the circular template, a large segment of the original 400 – 500 bp fragment was replaced with the Ad4 adapter. The 70 bp that are sequenced are therefore the first 35 bp and the last 35 bp of the original fragment, so the sequence is obtained for two 35 bp reads separated by about 330 – 430 bp. These 35 bp reads are compared with a reference genome using bioinformatics software and assigned to a genetic locus[9]. Because a very large number of nanoballs is sequenced in parallel, each reference position is covered by many reads, and in this way the complete genome of the sample is assembled. Any single base pair discrepancy between the sequenced reads and the reference is noted as a possible single nucleotide polymorphism (SNP). In addition, once one 35 bp read of a mate pair is mapped, the algorithm calculates where its mate should map; a discrepancy between the expected and observed positions of the mates can reveal insertions and deletions (indels)[3].
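The mapping logic can be sketched in Python on a toy scale. The sketch assumes both mates come from the same strand with the left mate upstream, uses a brute-force scan that tolerates up to two mismatches (real aligners use indexed search), and takes the expected 330 – 430 bp gap from the text; all function names are hypothetical. A mismatch in a uniquely mapped read is reported as a candidate SNP, and a mate gap outside the expected range as a possible indel.

```python
import random
from typing import Optional


def map_read(read: str, reference: str, max_mismatches: int = 2) -> list[tuple[int, list[int]]]:
    """Slide `read` along `reference`; return (start, mismatch offsets) for each hit."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = [i for i, (a, b) in enumerate(zip(read, window)) if a != b]
        if len(mismatches) <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def candidate_snps(read: str, reference: str) -> list[int]:
    """Reference positions where a uniquely mapped read disagrees with the reference."""
    hits = map_read(read, reference)
    if len(hits) != 1:
        return []
    start, mismatches = hits[0]
    return [start + i for i in mismatches]


def classify_pair(left: str, right: str, reference: str,
                  min_gap: int = 330, max_gap: int = 430) -> tuple[str, Optional[int]]:
    """Map both mates and compare the observed gap with the expected range."""
    left_hits, right_hits = map_read(left, reference), map_read(right, reference)
    if len(left_hits) != 1 or len(right_hits) != 1:
        return "unmapped or ambiguous", None
    gap = right_hits[0][0] - (left_hits[0][0] + len(left))
    if gap > max_gap:
        # Mates sit too far apart on the reference: sequence missing from the sample.
        return "possible deletion in sample", gap
    if gap < min_gap:
        # Mates sit too close together: extra sequence present in the sample.
        return "possible insertion in sample", gap
    return "concordant", gap


if __name__ == "__main__":
    rng = random.Random(7)
    reference = "".join(rng.choice("ACGT") for _ in range(1_000))
    left = reference[100:135]
    print(classify_pair(left, reference[515:550], reference))  # mates 380 bp apart
    print(classify_pair(left, reference[585:620], reference))  # mates 450 bp apart
```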

Advantages

DNA nanoball sequencing flow cell (top), in which most positions are occupied, giving a higher density of sequencing reads than other next generation sequencing platforms (bottom)

DNA nanoball sequencing offers several advantages over other sequencing platforms. One of the most important is the ability to load the array at very high density. Because the array is designed so that only one DNA nanoball can attach to each pit, and the pits form an ordered array, a higher concentration of DNA can be added. As a result, a high percentage of the pits are occupied by a DNA nanoball, maximizing the number of reads per flow cell (figure, top)[1], whereas on other sequencing arrays DNA molecules land on the flow cell in a random arrangement (figure, bottom)[10]. Another important advantage is that the sequencing reactions are non-progressive: after each probe is read, the probe and anchor are removed and a new anchor and probe set are added. If a probe fails to bind in one reaction, this therefore has no detrimental effect on the next probe ligation[2]. On most other next generation sequencing platforms the reactions are progressive, meaning that each new probe bound to the template is read correctly only if the previous probe attached correctly. This is a major source of reading error on those platforms[2]. It also means that they must apply large amounts of expensive probes to ensure that every chain is extended by one probe. Since DNA nanoball sequencing does not require the probe ligation reaction to run to completion, less of the expensive probes is needed, reducing the cost[2]. Other advantages include the use of the high fidelity Phi29 DNA polymerase[7] to ensure accurate amplification of the circular template; the compaction of several hundred copies of the circular template into a small area, which produces an intense signal; and the attachment of the fluorophore to the probe far from the ligation point, which improves ligation[1].
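The cost of progressive chemistry can be put in numbers with a deliberately simple model, which is an assumption for illustration rather than a model from the cited papers: each chemical step succeeds on a given template copy with probability p, and in a chained scheme a copy that misses one step stays out of register for every later step. The function name is hypothetical.

```python
def per_cycle_accuracy(p_success: float, cycles: int, chained: bool) -> list[float]:
    """Expected fraction of template copies giving a correct read at cycles 1..`cycles`.

    Toy model (assumption): each step succeeds on a copy with probability
    `p_success`. Chained (progressive): a copy that misses any step stays out
    of register, so cycle k is correct with p**k. Unchained: every cycle starts
    from a fresh anchor, so each cycle is correct with probability p.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability")
    if chained:
        return [p_success ** k for k in range(1, cycles + 1)]
    return [p_success] * cycles


if __name__ == "__main__":
    for chained in (True, False):
        last = per_cycle_accuracy(0.99, 35, chained)[-1]
        label = "chained" if chained else "unchained"
        print(f"{label}: {last:.3f} of copies read correctly at cycle 35")
```

With p = 0.99, the chained model leaves about 70% of copies reading correctly at cycle 35 (0.99^35 ≈ 0.70), while the unchained model stays at 99% for every cycle.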

Disadvantages

The main disadvantage of DNA nanoball sequencing is the short read length obtained from each nanoball[1]. A short read may map equally well to two or more regions of the reference genome. This is a major issue for several next generation sequencing platforms and is only partially resolved by using paired-end tags of differing lengths. The problem is most severe in highly repetitive regions of the reference genome. A second disadvantage is that multiple rounds of PCR are used, which can introduce PCR bias and may amplify contaminants during template construction[1].
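The ambiguity of short reads, and why paired ends resolve it only partially, can be seen on a toy reference in Python. The reference, read and mate are made up for illustration, the helper names are hypothetical, and matching is exact.

```python
def find_all(read: str, reference: str) -> list[int]:
    """Every start position (overlaps allowed) where `read` occurs exactly."""
    return [i for i in range(len(reference) - len(read) + 1)
            if reference.startswith(read, i)]


def placements_with_mate(read: str, mate: str, reference: str,
                         min_gap: int, max_gap: int) -> list[int]:
    """Positions of `read` that have `mate` downstream at an allowed gap.

    A read from a repeat maps to several places; requiring its mate to sit at
    a plausible distance can rule some of them out, but not always all.
    """
    mate_hits = find_all(mate, reference)
    return [r for r in find_all(read, reference)
            if any(min_gap <= m - (r + len(read)) <= max_gap for m in mate_hits)]


if __name__ == "__main__":
    # Hypothetical toy reference in which a 7 bp element occurs twice.
    reference = "CCCC" + "GATTACA" + "TTTT" + "GATTACA" + "GGGG"
    print(find_all("GATTACA", reference))
    print(placements_with_mate("GATTACA", "GGGG", reference, 0, 2))
    print(placements_with_mate("GATTACA", "GGGG", reference, 0, 20))
```

With a tight gap window the mate pins the read to one copy of the repeat; with a looser window both copies remain possible.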

Applications

DNA nanoball sequencing has been used in several recent publications. Lee et al. used the technology to find mutations present in a lung cancer and compared them with normal lung tissue from the same patient[11]; they identified over 50,000 single nucleotide variants. In another study, Roach et al. sequenced a family of four and identified SNPs that may be responsible for a Mendelian disorder[12]. They were also able to estimate the inter-generation mutation rate[12]. In addition, the Institute for Systems Biology has begun using this technology to sequence 615 complete human genomes as part of a neurodegenerative disease study, and the National Cancer Institute is using DNA nanoball sequencing to sequence 50 tumours and matched normal samples from pediatric cancers[13].

Significance

Massively parallel next generation sequencing platforms such as DNA nanoball sequencing are revolutionizing the diagnosis and treatment of many diseases. The cost of sequencing an entire human genome fell from about one million dollars in 2008 to $4,400 in 2010 with DNA nanoball technology[14]. Sequencing the entire genomes of patients with inherited diseases or cancer has identified the specific mutations that cause many of these diseases. Once these mutations are known, targeted therapeutics may be designed, populations can be screened to identify people at risk, and genetic counseling can be offered[14]. As the price of sequencing an entire human genome approaches $1,000, sequencing the genomes of entire populations may become feasible as part of routine preventive medicine[14].

Notes

  1. Human Genome Sequencing Using Unchained Base Reads in Self-Assembling DNA Nanoarrays. Drmanac, R. et al. Science, 2010, 327(5961):78–81.
  2. Genome Sequencing on Nanoballs. Porreca, GJ. Nature Biotechnology, 2010, 28:43–44.
  3. Human Genome Sequencing Using Unchained Base Reads in Self-Assembling DNA Nanoarrays, Supplementary Material. Drmanac, R. et al. Science, 2010, 327(5961):78–81.
  4. Complete Genomics press release, 2010.
  5. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analysis. Fullwood, MJ. et al. Genome Research, 2009, 19(4):521–532.
  6. Nucleoside triphosphate-dependent restriction enzymes. Dryden, DTF. et al. Nucleic Acids Research, 2001, 29(18):3728–3741.
  7. Highly Efficient DNA Synthesis by the Phage Phi29 DNA Polymerase. Blanco, L. et al. The Journal of Biological Chemistry, 1989, 264(15):8935–8940.
  8. Covalent Attachment of Synthetic DNA to Self-Assembled Monolayer Films. Chrisey, LA. et al. Nucleic Acids Research, 1996, 24(15):3031–3039.
  9. State of the Art De Novo Assembly of Human Genomes from Massively Parallel Sequencing Data. Li, Y. et al. Human Genomics, 2010, 4(4):271–277.
  10. Accurate Whole Human Genome Sequencing Using Reversible Terminator Chemistry. Bentley, DR. et al. Nature, 2008, 456(7218):53–59.
  11. The Mutation Spectrum Revealed by Paired Genome Sequences from a Lung Cancer Patient. Lee, W. et al. Nature, 2010, 465:473–477.
  12. Analysis of Genetic Inheritance in a Family Quartet by Whole-Genome Sequencing. Roach, JC. et al. Science, 2010, 328(5978):636–639.
  13. Complete Genomics press release, 2010.
  14. Effect of Genome-Wide Association Studies, Direct-to-Consumer Genetic Testing, and High-Speed Sequencing Technologies on Predictive Genetic Counselling for Cancer Risk. Speicher, MR. et al. The Lancet Oncology, 2011, 11(9):890–898.

--Suspencewl (talk) 01:21, 20 February 2011 (UTC)