User:Nielsrca

Past Contributions

Genome Engineering

CRISPR Cas9 Genome Engineering

Genome editing with the CRISPR-Cas9 system is carried out with a Type II CRISPR system. When used for genome editing, this system includes Cas9, CRISPR RNA (crRNA), and trans-activating crRNA (tracrRNA), along with an optional DNA repair template. The repair template is used in Homology Directed Repair (HDR); without one, the break is typically repaired by Non-Homologous End Joining (NHEJ).

Graphical overview of CRISPR Cas9 plasmid construction[1][2]

Major Components of the CRISPR Cas9 System

Component | Function
crRNA | Contains the guide RNA that Cas9 uses to locate the correct section of host DNA, along with a region that binds to tracrRNA (crRNA to tracrRNA binding generally occurs in a hairpin loop form), forming an active complex with Cas9
tracrRNA | Binds to crRNA and forms an active complex with Cas9
sgRNA | A single RNA molecule that combines a tracrRNA with at least one crRNA
Cas9 | Protein that in its active form is able to modify DNA using crRNA as its guide. Many variants of Cas9 exist with differing functionalities (e.g. single strand nicking, double strand break, DNA binding only) because Cas9's DNA site recognition does not depend on its two DNA cleaving domains (one for each strand)
Repair template | DNA that guides the cellular repair process, allowing for the insertion of a specific DNA sequence

When using the CRISPR Cas9 system for genome engineering, a plasmid is often created and used to transfect the cells that one wants to edit. The main components of this plasmid are displayed in the image and listed in the table above. The crRNA needs to be designed for each specific application, as this is the sequence that Cas9 will use to bind directly to the cell's DNA; it therefore needs to be specific and bind only where editing is desired. The repair template also needs to be designed for each application, as it must overlap with the DNA ends on either side of the break and encode the insertion sequence.

One or more crRNAs and the tracrRNA can be packaged together to form a single-guide RNA (sgRNA). The sgRNA sequence can be combined with the Cas9 gene in a single plasmid, which is then transfected into cells (see image for overview).

Overview of transfection and DNA cleavage by CRISPR Cas9 (crRNA and tracrRNA are often joined as one strand of RNA when designing a plasmid)[3]

CRISPR Cas9 Structure

CRISPR Cas9 is a widely used system for genome editing due to its high degree of fidelity and relatively simple construction. CRISPR Cas9 depends primarily on two factors for its specificity: the CRISPR target sequence and the Protospacer Adjacent Motif (PAM). The CRISPR target sequence is 20 bases long and is found as a part of each CRISPR locus in the crRNA array.[3] Typically a crRNA array will have multiple unique CRISPR target sequences. Cas9 proteins select the correct location on the host's genome by using the CRISPR target sequence to base pair with the host DNA. The CRISPR target sequence is not part of the Cas9 protein and as a result is customizable and can be independently synthesized.[4][5] The PAM sequence on the host genome, on the other hand, is recognized by the protein structure of Cas9 and generally cannot be easily modified to recognize a different sequence. However, this is not overly limiting, as the PAM is a short sequence and not very specific (e.g. the SpCas9 PAM sequence is 5'-NGG-3', which is found roughly every 8 to 12 base pairs in the human genome).[3]
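
The interplay of the two specificity factors can be illustrated with a short Python sketch. It scans one strand of a DNA string for 20-base target sequences that are immediately followed by the SpCas9 PAM 5'-NGG-3'. The function names are hypothetical, and the sketch deliberately ignores everything a real design tool would also weigh (off-target matches, sequence composition, and so on); it is a minimal illustration under those assumptions, not the method of any particular tool.

```python
import re


def find_spcas9_targets(seq, target_len=20):
    """Return (start, target, pam) for each candidate site on the given strand.

    A candidate site is `target_len` bases immediately followed by a
    5'-NGG-3' PAM. Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % target_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def reverse_complement(seq):
    """Reverse complement of a DNA string, for scanning the opposite strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

Calling `find_spcas9_targets(reverse_complement(seq))` scans the opposite strand; the reported positions then refer to the reverse-complemented string rather than to the original one.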

Once these components have been assembled into a plasmid and transfected into cells, the Cas9 protein, with the help of the crRNA, finds the correct sequence in the host cell's DNA and, depending on the Cas9 variant, creates a single or double strand break in the DNA. Properly spaced single strand breaks in the host DNA can trigger homology directed repair, which is less error prone than the non-homologous end joining that typically follows a double strand break. Providing a DNA repair template allows for the insertion of a specific DNA sequence at an exact location within the genome. The repair template should extend 40 to 90 base pairs beyond the Cas9 induced DNA break.[3] The goal is for the cell's HDR process to use the provided repair template and thereby incorporate the new sequence into the cell's genome. Once incorporated, this new sequence is part of the cell's genetic material and will be passed on to its daughter cells.

Many online tools are available to aid in designing effective sgRNA sequences for a new CRISPR Cas9 plasmid (e.g. http://tools.genome-engineering.org).

Protein-Protein Interaction Networks

Protein-protein interactions displayed in a signed network that describes what type of interaction is taking place.[6]

Signed Networks

Protein-protein interactions are not monolithic events; that is, a protein often modifies another protein in such a way that the resulting protein is either ‘activated’ or ‘repressed’. While there are other options (proteins can form complexes with one another, physically degrade or modify each other, transport a protein in or out of a compartment, etc.), the main protein-protein interactions that are commonly represented in network diagrams involve activation or repression.

Standard protein-protein interaction networks (directed or undirected) generally only indicate that two proteins are interacting and do not include more details about what type of interaction is occurring. Signed networks are one way of adding more useful information to network diagrams. Signed networks are often expressed by labeling each interaction as either positive or negative. A positive interaction is one where the interaction results in one of the proteins being activated. Conversely, a negative interaction results in one of the proteins being inactivated.[7]

Protein-protein interaction networks are often constructed from lab experiments such as yeast two hybrid screens and ‘affinity purification and subsequent mass spectrometry’ techniques.[8] However, these methods do not provide the information needed to determine what type of interaction is present, and so cannot be used on their own to attribute signs to the network diagrams.


RNA Interference Screens

RNA interference (RNAi) screens (repression of individual proteins between transcription and translation) are one method that can be used to assign signs to protein-protein interactions. Individual proteins are repressed and the resulting phenotypes are analyzed. A correlating phenotypic relationship (i.e. where the inhibition of either of two proteins results in the same phenotype) indicates a positive, or activating, relationship. Phenotypes that do not correlate (i.e. where the inhibition of either of two proteins results in two different phenotypes) indicate a negative, or inactivating, relationship. If protein A is dependent on protein B for activation, then the inhibition of either protein A or B will result in the cell losing the service that is provided by protein A, and the phenotypes will be the same for the inhibition of either A or B. If, however, protein A is inactivated by protein B, then the phenotypes will differ depending on which protein is inhibited: inhibiting protein B means it can no longer inactivate protein A, leaving A active, whereas inhibiting A removes A's activity altogether and leaves nothing for B to inactivate. Multiple RNAi screens need to be performed in order to reliably assign a sign to a given protein-protein interaction. Vinayagam et al., who devised this technique, state that a minimum of nine RNAi screens are required, with confidence increasing as more screens are carried out.[7]
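
The reasoning above can be sketched in a few lines of Python. The sketch assumes that each protein has one numeric phenotype score per RNAi screen and uses a plain Pearson correlation with an arbitrary threshold as a stand-in for the scoring scheme; this is a simplification for illustration and is not the published procedure of Vinayagam et al. The function names, the threshold, and the score encoding are all hypothetical.

```python
from statistics import mean

MIN_SCREENS = 9  # minimum number of screens mentioned in the text


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists (0.0 if undefined)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def predict_sign(phenotypes_a, phenotypes_b, threshold=0.5):
    """Return "+", "-", or None for an interacting pair of proteins.

    Each argument holds one phenotype score per screen for the knockdown of
    that protein. The threshold is an assumed, illustrative cut-off.
    """
    if len(phenotypes_a) != len(phenotypes_b):
        raise ValueError("both proteins need a score from the same screens")
    if len(phenotypes_a) < MIN_SCREENS:
        return None  # too few screens to assign a sign
    r = pearson(phenotypes_a, phenotypes_b)
    if r >= threshold:
        return "+"  # correlated phenotypes: activating relationship
    if r <= -threshold:
        return "-"  # anti-correlated phenotypes: inactivating relationship
    return None
```

Returning `None` rather than forcing a sign keeps weakly supported interactions unsigned, which mirrors the point that confidence grows with the number of screens.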

Function prediction methods

...

Structure-based methods

...

Computational Solvent Mapping

Computational solvent mapping of AMA1 protein using fragment-based computational solvent mapping (FTMAP) by computationally scanning the surface of AMA1 with 16 probes (small organic molecules) and defining the locations where the probes cluster (marked as colorful regions on the protein surface)[9]

One of the challenges involved in protein function prediction is discovery of the active site. This is complicated by the fact that certain active sites are not formed, and so essentially do not exist, until the protein undergoes conformational changes brought on by the binding of small molecules. Most protein structures have been determined by X-ray crystallography, which requires a purified protein crystal. As a result, existing structural models are generally of a purified protein and as such lack the conformational changes that are created when the protein interacts with small molecules.[10]

Computational Solvent Mapping utilizes probes (small organic molecules) that are computationally ‘moved’ over the surface of the protein searching for sites where they tend to cluster. Multiple different probes are generally applied with the goal being to obtain a large number of different protein-probe conformations. The generated clusters are then ranked based on the cluster’s average free energy. After computationally mapping multiple probes, the site of the protein where relatively large numbers of clusters form typically corresponds to an active site on the protein.[10]
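
The cluster-and-rank idea can be illustrated with a small Python sketch. It assumes the probe docking has already been done and that each probe cluster is summarized by a probe name, a cluster center, and an average free energy. The greedy distance-based grouping, the `top_k` and `radius` parameters, and the function name are assumptions made for illustration; this is not the FTMAP implementation.

```python
from math import dist


def consensus_sites(clusters, top_k=2, radius=4.0):
    """Group low-energy probe clusters into candidate consensus sites.

    clusters: list of (probe_name, (x, y, z), average_free_energy).
    Returns sites sorted by how many distinct probes they contain.
    """
    # Keep only the top_k lowest-energy clusters of each probe.
    by_probe = {}
    for probe, center, energy in clusters:
        by_probe.setdefault(probe, []).append((energy, center))
    kept = []
    for probe, items in by_probe.items():
        for _energy, center in sorted(items)[:top_k]:
            kept.append((probe, center))

    # Greedy grouping: a cluster joins the first site whose seed center lies
    # within `radius`; otherwise it seeds a new site. The result can depend
    # on input order, which is acceptable for a sketch.
    sites = []
    for probe, center in kept:
        for site in sites:
            if dist(site["center"], center) <= radius:
                site["probes"].add(probe)
                break
        else:
            sites.append({"center": center, "probes": {probe}})

    # Sites visited by many different probes are the best active-site candidates.
    return sorted(sites, key=lambda s: len(s["probes"]), reverse=True)
```

Counting distinct probes rather than raw clusters reflects the idea that a site is interesting when many different small molecules favor it.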

This technique is a computational adaptation of ‘wet lab’ work from 1996. It was discovered that ascertaining the structure of a protein while it is suspended in different solvents and then superimposing those structures on one another produces data where the organic solvent molecules (that the proteins were suspended in) typically cluster at the protein’s active site. This work was carried out in response to the realization that water molecules are visible in the electron density maps produced by X-ray crystallography. The water molecules interact with the protein and tend to cluster at the protein's polar regions. This led to the idea of immersing the purified protein crystal in other solvents (e.g. ethanol, isopropanol, etc.) to determine where these molecules cluster on the protein. The solvents can be chosen based on what they approximate, that is, what molecule the protein may interact with (e.g. ethanol can probe for interactions with the amino acid serine, isopropanol for threonine, etc.). It is vital that the protein crystal maintains its tertiary structure in each solvent. The process is repeated for multiple solvents, and the resulting data can be used to try to determine potential active sites on the protein.[11] Ten years later this technique was developed into an algorithm by Clodfelter et al.[10]

Protein Sequence Analysis: Ensembles

Schematic view of the two main ensemble modeling approaches.[12]

Proteins are often thought of as relatively stable structures that have a set tertiary structure and experience conformational changes as a result of being modified by other proteins or as part of enzymatic activity. However, proteins have varying degrees of stability, and some of the less stable variants are intrinsically disordered proteins. These proteins exist and function in a relatively 'disordered' state, lacking a stable tertiary structure. As a result, they are difficult to describe in a standard protein structure model that was designed for proteins with a fixed tertiary structure. Conformational ensembles have been devised as a way to provide a more accurate and 'dynamic' representation of the conformational state of intrinsically disordered proteins. They attempt to represent the various conformations of an intrinsically disordered protein within an ensemble file (the type found at the Ensemble Database).

Protein ensemble files are a representation of a protein that can be considered to have a flexible structure. Creating these files requires determining which of the various theoretically possible protein conformations actually exist. One approach is to apply computational algorithms to the protein data in order to try and determine the most likely set of conformations for an ensemble file.

There are multiple methods for preparing data for the Ensemble Database, and they fall into two general methodologies: pool-based and molecular dynamics (MD) approaches (diagrammed in the figure). The pool-based approach uses the protein’s amino acid sequence to create a massive pool of random conformations. This pool is then subjected to further computational processing that calculates a set of theoretical parameters for each conformation based on its structure. Conformational subsets from this pool whose average theoretical parameters closely match known experimental data for the protein are selected.[12]
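
The selection step of the pool-based approach can be sketched in Python. The sketch assumes that each conformation in the pool has already been reduced to a single back-calculated parameter (a hypothetical stand-in for quantities derived from NMR or SAXS data) and uses a simple random search over subsets; real protocols use more elaborate selection algorithms and many parameters at once. All names and defaults here are illustrative assumptions.

```python
import random


def select_ensemble(pool_params, experimental_value,
                    subset_size=5, trials=2000, seed=0):
    """Pick the subset whose average parameter best matches the experiment.

    pool_params: one theoretical parameter per conformation in the pool.
    Returns (sorted conformation indices, absolute error of their average).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    indices = range(len(pool_params))
    best_subset, best_error = None, float("inf")
    for _ in range(trials):
        subset = rng.sample(indices, subset_size)
        average = sum(pool_params[i] for i in subset) / subset_size
        error = abs(average - experimental_value)
        if error < best_error:
            best_subset, best_error = sorted(subset), error
    return best_subset, best_error
```

The important point is that the comparison is made on the subset average, so individual conformations in the selected ensemble may each differ considerably from the experimental value.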

The molecular dynamics approach takes multiple random conformations at a time and subjects all of them to experimental data. Here the experimental data serve as restraints placed on the conformations (e.g. known distances between atoms). Only conformations that manage to remain within the limits set by the experimental data are accepted. This approach often applies large amounts of experimental data to the conformations, which is a very computationally demanding task.[12]
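
The acceptance criterion can be illustrated with a short Python sketch that checks conformations against distance restraints. In an actual restrained MD simulation the experimental data act on the conformations while the simulation runs, so checking finished conformations afterwards, as done here, is a deliberate simplification. The data layout and function names are hypothetical.

```python
from math import dist


def satisfies_restraints(coords, restraints):
    """True if every restrained atom pair lies within its distance limits.

    coords: dict mapping atom name to (x, y, z).
    restraints: list of (atom_a, atom_b, lower_bound, upper_bound).
    """
    return all(
        lower <= dist(coords[a], coords[b]) <= upper
        for a, b, lower, upper in restraints
    )


def filter_conformations(conformations, restraints):
    """Keep only the conformations that respect all restraints."""
    return [c for c in conformations if satisfies_restraints(c, restraints)]
```

Each added restraint means one more distance check per conformation, which hints at why applying large amounts of experimental data becomes computationally demanding.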

Protein | Data type | Protocol | PED ID | Reference
Sic1/Cdc4 | NMR and SAXS | Pool-based | PED9AAA | [13]
p15 PAF | NMR and SAXS | Pool-based | PED6AAA | [14]
MKK7 | NMR | Pool-based | PED5AAB | [15]
Beta-synuclein | NMR | MD-based | PED1AAD | [16]
P27 KID | NMR | MD-based | PED2AAA | [17]

(adapted from image in "Computational approaches for inferring the functions of intrinsically disordered proteins" [12])

References

  1. ^ "CRISPR/Cas9 Plasmids". www.systembio.com. Retrieved 2015-12-17.
  2. ^ "CRISPR Cas9 Genome Editing | OriGene". www.origene.com. Retrieved 2015-12-17.
  3. ^ a b c d Ran, F. Ann; Hsu, Patrick D.; Wright, Jason; Agarwala, Vineeta; Scott, David A.; Zhang, Feng (2013-11-01). "Genome engineering using the CRISPR-Cas9 system". Nature Protocols. 8 (11): 2281–2308. doi:10.1038/nprot.2013.143. ISSN 1754-2189. PMC 3969860. PMID 24157548.
  4. ^ Horvath, Philippe; Barrangou, Rodolphe (2010-01-08). "CRISPR/Cas, the immune system of bacteria and archaea". Science (New York, N.Y.). 327 (5962): 167–170. doi:10.1126/science.1179555. ISSN 1095-9203. PMID 20056882.
  5. ^ Bialk, Pawel; Rivera-Torres, Natalia; Strouse, Bryan; Kmiec, Eric B. (2015-06-08). "Regulation of Gene Editing Activity Directed by Single-Stranded Oligonucleotides and CRISPR/Cas9 Systems". PLoS ONE. 10 (6): e0129308. doi:10.1371/journal.pone.0129308. PMC 4459703. PMID 26053390.
  6. ^ Fischer, Bernd; Sandmann, Thomas; Horn, Thomas; Billmann, Maximilian; Chaudhary, Varun; Huber, Wolfgang; Boutros, Michael (2015-04-02). "A map of directional genetic interactions in a metazoan cell". eLife. 4: e05464. doi:10.7554/eLife.05464. ISSN 2050-084X. PMC 4384530. PMID 25748138.
  7. ^ a b Vinayagam, Arunachalam; Zirin, Jonathan; Roesel, Charles; Hu, Yanhui; Yilmazel, Bahar; Samsonova, Anastasia A.; Neumüller, Ralph A.; Mohr, Stephanie E.; Perrimon, Norbert (2014-01-01). "Integrating protein-protein interaction networks with phenotypes reveals signs of interactions". Nature Methods. 11 (1): 94–99. doi:10.1038/nmeth.2733. ISSN 1548-7091. PMC 3877743. PMID 24240319.
  8. ^ Chen, Ginny I.; Gingras, Anne-Claude. "Affinity-purification mass spectrometry (AP-MS) of serine/threonine phosphatases". Methods. 42 (3): 298–305. doi:10.1016/j.ymeth.2007.02.018.
  9. ^ Wang, Geqing; MacRaild, Christopher; Mohanty, Biswaranjan; Mobli, Mehdi; Cowieson, Nathan; Anders, Robin; Simpson, Jamie; McGowan, Sheena; Norton, Raymond; Scanlon, Martin (2014). "Molecular Insights into the Interaction between Plasmodium falciparum Apical Membrane Antigen 1 and an Invasion-Inhibitory Peptide". PLoS ONE. doi:10.1371/journal.pone.0109674. PMID 25343578. Retrieved 2015-12-18.
  10. ^ a b Clodfelter, Karl H.; Waxman, David J.; Vajda, Sandor (2006-08-08). "Computational solvent mapping reveals the importance of local conformational changes for broad substrate specificity in mammalian cytochromes P450". Biochemistry. 45 (31): 9393–9407. doi:10.1021/bi060343v. ISSN 0006-2960. PMID 16878974.
  11. ^ Mattos, Carla; Ringe, Dagmar (1996). "Locating and characterizing binding sites on proteins". Nature Biotechnology. doi:10.1038/nbt0596-595. PMID 9630949. Retrieved 2015-12-18.
  12. ^ a b c d Varadi, Mihaly; Vranken, Wim; Guharoy, Mainak; Tompa, Peter (2015-01-01). "Computational approaches for inferring the functions of intrinsically disordered proteins". Frontiers in Molecular Biosciences: 45. doi:10.3389/fmolb.2015.00045. PMC 4525029. PMID 26301226.
  13. ^ Mittag, Tanja; Marsh, Joseph; Grishaev, Alexander; Orlicky, Stephen; Lin, Hong; Sicheri, Frank; Tyers, Mike; Forman-Kay, Julie D. (2010-03-14). "Structure/function implications in a dynamic complex of the intrinsically disordered Sic1 with the Cdc4 subunit of an SCF ubiquitin ligase". Structure (London, England: 1993). 18 (4): 494–506. doi:10.1016/j.str.2010.01.020. ISSN 1878-4186. PMC 2924144. PMID 20399186.
  14. ^ De Biasio, Alfredo; Ibáñez de Opakua, Alain; Cordeiro, Tiago N.; Villate, Maider; Merino, Nekane; Sibille, Nathalie; Lelli, Moreno; Diercks, Tammo; Bernadó, Pau (2014-02-18). "p15PAF is an intrinsically disordered protein with nonrandom structural preferences at sites of interaction with other proteins". Biophysical Journal. 106 (4): 865–874. doi:10.1016/j.bpj.2013.12.046. ISSN 1542-0086. PMC 3944474. PMID 24559989.
  15. ^ Kragelj, Jaka; Palencia, Andrés; Nanao, Max H.; Maurin, Damien; Bouvignies, Guillaume; Blackledge, Martin; Jensen, Malene Ringkjøbing (2015-03-17). "Structure and dynamics of the MKK7-JNK signaling complex". Proceedings of the National Academy of Sciences of the United States of America. 112 (11): 3409–3414. doi:10.1073/pnas.1419528112. ISSN 1091-6490. PMC 4371970. PMID 25737554.
  16. ^ Allison, Jane R.; Rivers, Robert C.; Christodoulou, John C.; Vendruscolo, Michele; Dobson, Christopher M. (2014-11-25). "A relationship between the transient structure in the monomeric state and the aggregation propensities of α-synuclein and β-synuclein". Biochemistry. 53 (46): 7170–7183. doi:10.1021/bi5009326. ISSN 1520-4995. PMC 4245978. PMID 25389903.
  17. ^ Sivakolundu, Sivashankar G.; Bashford, Donald; Kriwacki, Richard W. (2005-11-11). "Disordered p27Kip1 exhibits intrinsic structure resembling the Cdk2/cyclin A-bound conformation". Journal of Molecular Biology. 353 (5): 1118–1128. doi:10.1016/j.jmb.2005.08.074. ISSN 0022-2836. PMID 16214166.