Nucleic Acid Arrays
Development and Applications of High Density RNA Arrays
Overview. We have recently developed a powerful new approach for the fabrication of high-density RNA arrays. In this approach, a high-density DNA array is fabricated by standard photolithographic methods using photoprotected nucleoside phosphoramidites, the surface-bound DNA molecules are enzymatically copied into their RNA complements (also tethered to the surface), and the DNA templates are enzymatically destroyed, leaving behind the desired RNA array. These arrays are RNA analogs to the high-density DNA arrays that have proven over the past two decades to be tremendously powerful tools for biomolecular analysis, particularly in the areas of global transcriptomics and genome-wide genetic variation analysis. We seek to further develop and optimize this novel technology for RNA array fabrication, and to develop and demonstrate its power and utility in important application areas.
Development and optimization of RNA array fabrication. The utility of RNA arrays depends strongly upon the array quality. Two key interrelated quality parameters are the RNA strand sequence fidelity (what fraction of RNA molecules in the array have the correct sequence?) and the RNA lengths that can be obtained (how long are the RNA molecules that can be made without excessive loss of sequence fidelity?). Other parameters of interest are the surface density of the RNA strands, the nature of the substrates employed for array fabrication, and the ability to employ modified nucleosides of various types. In this first aim, we will develop and apply the tools needed to measure these parameters, and use them to guide optimization (maximize sequence fidelity and length) of the chemistry and enzymology employed for array fabrication. We will also develop the ability to control strand density (which is important as inter-strand interactions on the surface can interfere with RNA strand folding and secondary structure), vary substrate types (important to be able to access different array synthesis chemistries and technologies for monitoring array binding), and employ various modified nucleosides (important for RNA and DNA aptamer projects as well as for therapeutic applications of nucleic acids).
Applications. The importance of RNA array technology will depend upon the extent to which RNA arrays open new possibilities in biomolecular analysis. Here are two interesting applications:
Identification and characterization of fluorescent RNA mimics of GFP. We will fabricate RNA arrays comprised of hundreds of thousands of variants of the recently reported fluorescent RNA mimics of GFP, and seek to identify variants with stronger chromophore binding affinities and improved fluorescent properties.
Determining the binding specificity of RNA-binding proteins. We will fabricate RNA arrays containing hundreds of thousands of permutations of the known binding sequences for the model RNA-binding protein PUM2, and use them to determine the rules governing its target binding specificity.
Significance. In 1991 Steve Fodor and his colleagues at Affymax ushered in the new field of biomolecular array analysis with the publication of their Science paper entitled “Light-Directed, Spatially Addressable Parallel Chemical Synthesis”. In this work the authors showed how the photolithographic methods developed for integrated circuit fabrication could be adapted for the light-directed synthesis of addressed arrays of biomolecules on planar substrates. Although their initial proof-of-principle work was limited in scope by present-day standards (they fabricated a peptide array of 1024 elements and demonstrated the light-directed synthesis of a dinucleotide), it has been transformed over the ensuing years into a powerful analytical platform, with widely available DNA arrays containing millions of oligonucleotide features that are employed for myriad applications such as SBH (Sequencing by Hybridization), Genome-wide Gene Expression Analysis, ChIP-chip (Chromatin ImmunoPrecipitation and analysis by DNA “chip”), and CSI (cognate site identification).
Remarkably, while DNA arrays became mainstream and ubiquitous tools of the modern molecular biologist, the seemingly straightforward extension of the concept to RNA arrays never became practical. The fabrication of DNA arrays relies upon the phosphoramidite chemistry developed in Marv Caruthers’ lab at CU Boulder [6, 7]. This chemistry is remarkable because of its extraordinary efficiency, which manifests itself in stepwise yields for monomer addition of over 99%. This high efficiency is what allows DNA molecules as long as 150 nt to be synthesized, and is the fundamental reason for the widespread availability of both individual high-quality oligonucleotides and high-density DNA arrays. Although the same chemistry can be used for RNA synthesis, it does not give results of comparable quality. For RNA synthesis it is necessary to protect the 2’ hydroxyl to keep it from coupling during synthesis; in spite of much effort by many groups, it has not been possible to find a protecting group for this position that does not interfere with coupling at the adjacent 3’ hydroxyl and/or give rise to undesired side reactions during deprotection. Because of these issues, there has not existed to date any viable technology for the fabrication of RNA arrays. (We note that it is possible to obtain fairly short individual RNAs, since they can be purified from even quite inefficient syntheses.)
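The link between stepwise coupling yield and achievable strand length can be made concrete with a short calculation. The Python sketch below assumes that every coupling succeeds independently with the same probability, which is a simplification; the function name and the lower yields in the loop are illustrative values of ours, not measured figures for RNA synthesis.

```python
def full_length_fraction(stepwise_yield: float, n_couplings: int) -> float:
    """Fraction of strands that are full length after n_couplings steps.

    Assumes independent couplings with identical yield (a simplification).
    """
    if not 0.0 <= stepwise_yield <= 1.0:
        raise ValueError("stepwise_yield must lie between 0 and 1")
    if n_couplings < 0:
        raise ValueError("n_couplings must be non-negative")
    return stepwise_yield ** n_couplings


if __name__ == "__main__":
    # 0.99 is the stepwise yield quoted in the text; 0.98 and 0.95 are
    # hypothetical lower yields included only for comparison.
    for stepwise in (0.99, 0.98, 0.95):
        fraction = full_length_fraction(stepwise, 150)
        print(f"stepwise yield {stepwise:.2f}: "
              f"{fraction:.4%} full length after 150 couplings")
```

Under this model, a few percent lost per step compounds into a very small full-length fraction at 150 couplings, which is why modest interference from a 2’ protecting group is so costly for long RNAs.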
Figure 1. Enzymatic fabrication of high-density RNA arrays.
We have recently made a breakthrough in this area, with a simple yet powerful new strategy for the enzymatic synthesis of a high-density RNA array. The key idea is to use RNA polymerase to copy surface-attached DNA molecules on a high-density DNA array into their RNA complements (Figure 1). The surface is first partially deprotected (e.g. light is used to effect removal of 50% of the NPPOC photolabile protecting groups covering the surface), an array of the DNA complements to the eventual desired RNA sequences is synthesized by standard light-directed synthesis on the exposed sites, and the remaining surface sites are then deprotected, followed by synthesis of an RNA primer sequence. These primer sequences on the second group of sites then hybridize to their complements on the first group, whereupon they may be extended with T7 RNA polymerase to yield an RNA:DNA duplex. The DNA is removed with DNase I, leaving behind the desired single-stranded RNAs.
A powerful aspect of this enzymatic approach to RNA array fabrication is the ease with which chemically modified nucleosides may be incorporated. For example, natural (unmodified) RNA molecules are resistant to DNase I but susceptible to RNase A, whereas 2’-fluoro RNA is resistant to both DNase I and RNase A. This resistance to nuclease digestion is very important for therapeutic applications of RNAs, and also extends the chemical diversity which can be obtained in combinatorial chemistry applications. Figure 3 shows the effects of RNase or DNase treatment upon either a natural RNA array or a 2'-fluoro RNA array. The natural RNA array is totally resistant to DNase but destroyed by RNase within 30 min, whereas neither DNase I nor RNase A has any effect upon the 2’-fluoro-RNA array. We note that the use of a very long, flexible, hydrophilic spacer (we employed three tandem PEG 2000 moieties) between the substrate and the oligonucleotides was critical to success of the approach, which is not surprising as it is necessary for the DNA complement and RNA primer sequences to anneal while both are still attached to the surface.
To illustrate some of the many potential applications of such arrays, we briefly outline several below.
Deciphering the binding specificities of RNA-binding proteins. A powerful strategy for determining the binding specificities of DNA-binding proteins such as transcription factors has been pioneered by our Wisconsin colleague Aseem Ansari . His group synthesized high-density arrays of double-stranded DNAs in which the binding sequence was systematically varied, and measured the relative binding intensities of the transcription factors to the different elements. This experiment reveals the binding specificity of the transcription factor, which is essential knowledge for unraveling its effects in vivo on gene regulation. High-density RNA arrays will allow similar experiments to be performed to look at the specificity of RNA-binding proteins, by synthesizing a combinatorial array of RNA targets with systematically varying sequences, and measuring the relative binding intensities for the RNA-binding proteins of interest. This information is essential to understanding the roles they play in critical biological processes such as alternative splicing and translational regulation .
Fabrication of RNA aptamer arrays to discover those with the best binding affinity and specificity. It is widely recognized that there is a pressing need for high affinity and high specificity ligands directed against all of the proteins in the human and other proteomes (http://commonfund.nih.gov/proteincapture/highlights.aspx). Aptamers are molecules selected from DNA or RNA libraries that fold into shapes whereby they behave as high affinity and high specificity ligands . The ability to fabricate high-density arrays of RNA aptamer candidates will allow them to be screened in parallel against protein targets to ascertain their binding affinity and specificity, simply by incubating the array with the target protein and measuring binding by either fluorescence [16, 17] or using a label-free binding assay such as surface plasmon resonance [18, 19].
Identifying RNA mimics of Green Fluorescent Protein (GFP). It is known that the fluorescent chromophore in GFP and related proteins is not fluorescent when separated from the protein; binding of the chromophore to the protein reduces its conformational flexibility and thereby shuts down other (non-fluorescence) pathways for relaxation from the excited state. This knowledge was leveraged in recent work by Jaffrey et al. [20, 21], who showed that it is possible to identify RNA molecules that similarly interact with the chromophores to render them fluorescent. This is exciting because, as shown by the authors, it is possible to engineer RNA molecules to include such fluorescence domains, thus allowing them to be visualized and tracked in vivo in a similar manner to how GFP fusion proteins are now ubiquitously employed to track proteins of interest. High-density RNA arrays will permit a dramatic simplification of the process for determining the correct RNA sequences, as one can simply fabricate an array of candidate RNA molecules, add the pre-fluorophore (a small organic compound similar in structure to the fluorophore in GFP), and observe which elements of the array produce a fluorescence signal.
Engineering sequence-specific RNA-binding proteins. Filipovska et al. showed recently that it is possible to engineer sequence-specific RNA-binding proteins based upon the sequence-recognition properties of the PUF proteins. RNA arrays provide a beautiful and efficient means of monitoring and optimizing the binding affinities and specificities of such engineered proteins, by measuring binding of the candidate proteins to RNA arrays. This is similar to the measurement of binding specificities of naturally occurring binding proteins as described above, but employed instead as a tool to assist in the engineering of specific binders.
RNA-based therapeutics. There continues to be much interest and activity in the area of RNA-based therapeutics. As with all drugs, binding activity is a sine qua non for biological activity. RNA arrays offer an elegant means to characterize the binding behavior of large numbers of drug candidates, by simply monitoring binding of the target protein to the array.
Tiling arrays of RNA viral genomes. A powerful tool for studying the function of parts of an RNA genome will be a tiling array that steps through the genome systematically (e.g. for a 100 kb RNA genome, an array of roughly 100,000 hundredmers, each starting one base further along the sequence). Such an array can be used to look for binding interactions with viral or host proteins or other RNA components.
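As a minimal sketch of how such a tiling design could be generated, the Python function below slides a fixed-length window along a genome sequence one base at a time. The function name, default parameters, and toy sequence are illustrative assumptions; note that a linear genome of length N yields N - L + 1 tiles of length L, so a 100 kb genome gives 99,901 hundredmers, close to the round figure of 100,000.

```python
def tile_genome(genome: str, tile_length: int = 100, step: int = 1) -> list[str]:
    """Return overlapping tiles of tile_length, each offset by step bases.

    Treats the genome as linear; a genome shorter than tile_length
    yields an empty list.
    """
    if tile_length <= 0 or step <= 0:
        raise ValueError("tile_length and step must be positive")
    last_start = len(genome) - tile_length
    return [genome[start:start + tile_length]
            for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    # Toy example: an 8-nt sequence tiled with 4-mers.
    for tile in tile_genome("ACGUACGU", tile_length=4):
        print(tile)
```

Increasing `step` thins the tiling when the number of array features, rather than the genome length, is the limiting resource.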
Arrays of miRNA. There are now over 1500 miRNAs known in human alone (www.miRBase.org), and this number will likely continue to grow as more are discovered. It is straightforward to synthesize arrays with all of these molecules and thereby create a tool for evaluating their binding interactions with other biomolecules. We note that the "miRNA arrays" sold by Affymetrix, for example, are actually DNA arrays, not RNA arrays, employed for measuring the binding of the miRNAs in biological samples by nucleic acid hybridization, and thus have a very different purpose or capability than actual arrays of miRNAs will have. The short length of miRNAs makes it very straightforward to fabricate them.
To save space, we will just mention some additional applications without explaining them fully. Engineering ribozyme arrays, discovering new ribozymes, studying ribozyme function, engineering artificial siRNAs and miRNAs, fabricating mRNA tiling arrays, and searching for miRNA "sponges" (molecules that bind to and inactivate miRNAs)  are other exciting applications of RNA arrays that may be considered. Clearly, although this vast compendium of possible applications is far too great to be developed fully in any single grant proposal, the point we seek to make for the reviewers is that the RNA array technology we seek here to develop has vast and far-reaching potential to accelerate research in many important emerging areas of biology. In the Approach section below we describe how we envision the further development, optimization, and implementation of the technology for selected critical applications.
RNA array fabrication. Our process for the photolithographic synthesis of DNA and RNA arrays employs a maskless array synthesizer (MAS) that was developed at UW-Madison by Professors Franco Cerrina, Michael Sussman, and Fred Blattner [25, 26]. This instrument uses photolithography and photoprotected nucleotide building blocks to create surfaces covalently modified with hundreds of thousands of different oligonucleotide sequences per cm². The key concept of the MAS is that the chromium masks used by Affymetrix for light-directed array fabrication are replaced with a digital micro-mirror device (DMD). The DMD is a 1.4 cm x 1.1 cm electronic "chip" containing an array of 1024 x 768 independently controlled mirrors. The MAS is a benchtop machine that uses the DMD to produce approximately 786,000 (1024 x 768 = 786,432) different 20-100 nt long oligonucleotides on a single substrate. These sequences can be changed every 4 hours, as easily as one changes the text being fed to a laser printer [25, 26]. The instrument is a key enabling technology for the present work.
The process of photolithographic synthesis on the MAS begins with preparation of a hydroxyl-terminated surface. Most DNA array manufacturers employ glass substrates for this purpose, modified with a silane reagent [1, 25, 35, 36] to provide a surface hydroxyl group as the start point for DNA synthesis. Our group has pioneered the use of carbon substrates for DNA array fabrication, which have a number of advantages over glass, the most important of which is increased stability. The carbon substrate is overlaid with neat 9-decen-1-ol, and then illuminated with a low-pressure Hg arc lamp to initiate a radical-mediated coupling of the alkene to the surface [32, 37]. This coupling reaction produces carbon-carbon bonds linking the hydroxyl functionality to the surface, a critical improvement compared to the hydrolytically unstable silyl ethers conventionally employed for glass functionalization. After rinsing, nucleotides with a photolabile protecting group are chemically coupled to the surface hydroxyl groups. Light is directed onto the surface by the DMD according to a programmed spatial pattern in order to photolithographically cleave the protecting group (NPPOC, 3'-nitrophenylpropyloxycarbonyl). For the RNA arrays, the surface hydroxyls are then reacted successively three times with a commercially available DMT-protected PEG-2000 phosphoramidite (Glen Research), with standard acid deprotection steps in between to remove the DMT group. (Note: DMT is an abbreviation for dimethoxytrityl, the standard acid-labile protecting group employed for conventional solid-phase DNA synthesis.) This produces a hydrophilic spacer of approximately 45 x 3 = 135 ethylene oxide groups, with an approximate end-to-end length of around 300 angstroms. An NPPOC-protected nucleoside phosphoramidite is then coupled to the surface to give the first nucleotide (3’-terminal) of the sequences to be synthesized. The subsequent steps are depicted in Figure 1 and briefly described as follows.
Partial deprotection (ca. 50%) with UV light is followed by coupling of DMT-rU to the exposed hydroxyls; then, the other 50% of the NPPOC protecting groups are removed to allow photolithographic synthesis of various DNA sequences on different spots of the substrate. These DNA strands are capped at their 5’ termini with acetyl groups. Then, RNA primers (12 bases long, the same everywhere on the surface) are synthesized using acid-labile DMT protection chemistry onto the initial DMT-rU that was incorporated prior to the DNA strand synthesis. After base deprotection, these RNA primers are hybridized to the DNA strands, providing a start point from which T7 RNA polymerase can then copy the DNA strands into their RNA complements. The result is an array of RNA-DNA duplexes. Treatment with DNase I to destroy the DNA molecules yields the desired array of single-stranded RNA molecules.
Strand density measurement and optimization. We and others have shown the importance of strand density (number of strands per unit area) on DNA arrays [38, 39]. If DNA strands are packed too closely together on the surface, they become less available for hybridization, but if too far apart, then hybridization rates and signal intensities will decrease. Such considerations are also important for RNA arrays. For example, protein binding to RNA arrays is likely to require sufficient distance between the RNA strands that the binding protein molecules do not impede one another. In order to control and optimize this variable, it is necessary to a) be able to vary the strand density on the surface, and b) be able to measure the effect of that strand density upon the measurement of interest (e.g. hybridization of complementary strands, binding of RNA-binding proteins, or aptamer folding). We have previously published methods to control strand density by partial deprotection of the surface, followed by capping to permanently inactivate the exposed sites; strand synthesis after this procedure occurs on sites of reduced density on the surface, and thus the strands are synthesized at lower density. A necessary aspect of optimization is careful measurement of the resulting strand density. We have extensive experience with this sort of measurement, which we perform by hybridizing fluorophore-tagged oligonucleotides to the surface, followed by elution from the surface and measurement with a spectrofluorimeter. This measurement yields the density of hybridizable oligonucleotides on the surface. Measuring the density in this way of both the single-stranded DNA (prior to extending the RNA strand) and the single-stranded RNA (after DNase I digestion to remove the DNA template) will allow us to evaluate the efficiency of the RNA fabrication process (primer duplex formation and polymerase extension).
These measurements will enable optimization of numerous aspects of the RNA arrays including the substrate preparation, incorporation of the flexible PEG linker, and most importantly, the RNA strand surface density optimal for each of the three key applications discussed below.
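The arithmetic behind the elution-based density measurement can be sketched as follows. This Python example assumes that the amount of eluted fluorophore-tagged oligonucleotide has already been converted from spectrofluorimeter signal to picomoles with a calibration curve, and that each hybridizable surface strand captured exactly one tagged strand. The function names and the numbers in the example are hypothetical, not measured values.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def strand_density_per_cm2(eluted_pmol: float, area_cm2: float) -> float:
    """Hybridizable strands per cm^2 from the eluted amount (pmol) and probed area."""
    if area_cm2 <= 0:
        raise ValueError("area_cm2 must be positive")
    return eluted_pmol * 1e-12 * AVOGADRO / area_cm2


def mean_spacing_nm(density_per_cm2: float) -> float:
    """Average centre-to-centre strand spacing, assuming a square lattice."""
    return 1e7 / math.sqrt(density_per_cm2)  # 1 cm = 1e7 nm


def fabrication_efficiency(rna_density: float, dna_density: float) -> float:
    """Ratio of RNA density (after DNase I) to template DNA density."""
    return rna_density / dna_density


if __name__ == "__main__":
    # Hypothetical inputs: 1.0 pmol eluted from the DNA array and
    # 0.4 pmol from the RNA array, each over 1 cm^2.
    dna = strand_density_per_cm2(1.0, 1.0)
    rna = strand_density_per_cm2(0.4, 1.0)
    print(f"DNA density: {dna:.3e} strands/cm^2")
    print(f"mean DNA strand spacing: {mean_spacing_nm(dna):.1f} nm")
    print(f"fabrication efficiency: {fabrication_efficiency(rna, dna):.0%}")
```

The spacing estimate is useful for comparing strand separation with the dimensions of the protein or folded RNA under study.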
Sequence fidelity and length. Recent work on gene assembly in our laboratory revealed a high degree of sequence fidelity (1.63 errors per kb) for the DNA molecules fabricated on our MAS arrays. It is thus possible, or even likely, that the RNA arrays already have sufficient sequence accuracy to accomplish the various applications mentioned in this proposal. Accordingly, we do not anticipate spending much time further optimizing the fidelity. However, a good measurement of sequence accuracy is required to inform our decisions and to aid in optimization experiments, if they become necessary. The sequence fidelity of DNA arrays on surfaces is usually estimated based upon the fluorescence signal obtained in hybridization experiments. Although expedient, hybridization is not the best means of evaluating sequence fidelity, as it is relatively tolerant of sequence mismatches. Therefore, we will employ the much more accurate method of Sanger sequencing, as follows. In order to cleave the synthetic RNAs from the surface we will use the commercial reagent Thiol-Modifier C6 S-S from Glen Research to incorporate an internal disulfide bond in the synthesis of the RNA primer, which may then be cleaved from the surface using dithiothreitol. If a problem is encountered with this cleavage strategy, there are several other approaches that can be employed for release of the RNA molecules. These include using a free RNA primer for transcription, rather than a surface-bound primer; using RNase H for directed cleavage; or reverse-transcribing the RNAs on the surface into DNA. The released RNAs will be reverse transcribed into DNAs using specific DNA primers flanking the regions to be analyzed, and the DNA copies will be cloned into plasmids for Sanger sequencing. We employed a similar strategy to evaluate sequence fidelity for work described in our recent Angewandte Chemie paper on RNA-mediated gene assembly.
We will analyze the sequence fidelity of both the DNA grown on the MAS array and the RNA strands created from it. If fidelity or length analyses reveal inadequate RNA quality, then we will work to improve the RNA synthesis using different extension conditions (e.g. salt and manganese concentration) and by evaluating different commercially available RNA polymerases (e.g. T7, T3, and Durascribe (a T7 mutant available from Epicentre Technologies that has been developed for use with modified ribonucleotides)).
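To relate a per-kilobase error rate to the fraction of strands with the correct sequence, the Python sketch below assumes that errors occur independently and uniformly along the strand. It uses the 1.63 errors per kb figure quoted above for the DNA templates; any errors added by the RNA polymerase would lower the result further and are not modeled here. The function name and the chosen lengths are illustrative.

```python
def error_free_fraction(errors_per_kb: float, length_nt: int) -> float:
    """Expected fraction of strands of length_nt containing no errors.

    Assumes independent, uniformly distributed errors (a simplification).
    """
    if errors_per_kb < 0 or length_nt < 0:
        raise ValueError("inputs must be non-negative")
    per_base_error = errors_per_kb / 1000.0
    return (1.0 - per_base_error) ** length_nt


if __name__ == "__main__":
    for length in (20, 40, 100):
        fraction = error_free_fraction(1.63, length)
        print(f"{length:>3} nt: {fraction:.1%} of template strands error-free")
```

Under these assumptions roughly 94% of 40 nt templates and 85% of 100 nt templates would be error-free, which illustrates how the fidelity and length parameters are coupled.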
Key Applications. The utility of RNA array technology will depend on the information that it is able to provide. Some may feel that the era of high density array technology has passed, given the advent of next generation sequencing and its revolutionary impact on experimental design (e.g. the replacement of ChIP-chip by ChIP-Seq). The importance of RNA array technology thus lies in its ability to provide important information that is not more readily obtained in other ways. We describe below two important applications of RNA array technology that explore, respectively, the identification and characterization of fluorescent RNA mimics of GFP, and determination of the binding specificity of RNA-binding proteins.
Identification and Characterization of fluorescent RNA mimics of GFP. Fluorescent reagents continue to have a profound impact on cell and molecular biology, with well over 2000 new articles a month documented in PubMed. Scientists’ ability to track, localize, image and quantify molecules of interest in complex environments such as the cell cytoplasm or nucleus is steadily improving as the diversity of fluorescent reagents grows, concomitant with the increased availability of low-cost laser sources providing a wide variety of wavelengths, efficient optical filters and light-collection optics, and high-sensitivity detectors. One reagent in particular (Green Fluorescent Protein, GFP) has had such an impact that its discoverers and developers, Osamu Shimomura, Martin Chalfie and Roger Y. Tsien, were awarded the 2008 Nobel Prize in Chemistry. GFP, along with its variously colored fluorescent cousins, is remarkable because it is a naturally fluorescent protein molecule that can be expressed in a living system (e.g. a cell), either by itself or as a fusion protein, with little or no cytotoxicity and brilliant fluorescence. The family of GFP-like proteins can be employed in concert to visualize many different molecules simultaneously.
A fascinating new set of RNA reagents with properties quite similar to GFP was recently described by the Jaffrey lab [20, 21]. These reagents, referred to as "RNA mimics of GFP," are RNA molecules rather than proteins. The fluorophore in GFP is formed from three adjacent residues, Ser65, Tyr66, and Gly67, which undergo an autocatalytic intramolecular cyclization to yield 4-hydroxybenzylidene imidazolinone (HBI). This molecule, normally nonfluorescent, becomes fluorescent when it interacts with folded GFP, which holds it in a manner that disrupts its molecular motions, and thereby closes down excited state relaxation pathways other than fluorescence. The "RNA mimics" are RNA aptamers that were selected using the SELEX process [15, 49] for binding to this or other similar molecules, and which, by binding the molecule, also convert it from a non-fluorescent species to a fluorescent species. The aptamer sequence can be inserted into a gene of interest to yield a "fusion" RNA comprising the natural RNA and the aptamer tag; addition of the normally non-fluorescent and cell-permeable chromophore to the cells causes the aptamer tag to fluoresce upon binding, allowing direct fluorescence visualization of the associated RNA species within the cell! As with GFP, these RNA mimics can be expressed within cells, and variants have been developed to fluoresce in different colors.
Table 1. Photophysical and binding properties of RNA-fluorophore complexes
(adapted from Table S1 of Ref. 20).
Although a fascinating and important new concept, these "RNA mimics of GFP" are not yet perfect. In particular, they are not yet adequate for monitoring low abundance RNAs in vivo. The best RNA mimic reported in the Jaffrey paper is referred to there as 24-2, or “Spinach.” This aptamer complexes with the fluorophore DFHBI (3,5-difluoro-4-hydroxybenzylidene imidazolinone, commercially available from Lucerna Technologies), whose photophysical and binding properties are given in Table 1. Comparing the free and complexed forms, the molar extinction coefficients differ by a factor of two, and the fluorescence quantum yields by a factor of 1000. Thus there will be a background fluorescence level of 1 part in 2000 from the uncomplexed fluorophore, relative to the complexed fluorophore. Accordingly, if the abundance of the RNA to be monitored is more than ~2000-fold lower than the concentration of free fluorophore employed, the RNA will be difficult to discern over background. The reported KD for “Spinach” is 537 nM, and accordingly relatively high concentrations of 10 µM free fluorophore are utilized for visualization experiments. Thus RNAs present at concentrations below ~5 nM will be near or below the fluorescence background level. In a hypothetical spherical cell 10 microns in diameter, 5 nM corresponds to roughly 1500 copies per cell, a relatively high abundance RNA species. Hence the difficulty in monitoring less abundant species. As the photophysical properties of the DFHBI-Spinach complex are already excellent (reasonably high extinction coefficient and fluorescence quantum yield), the most obvious area for improvement is the binding constant: if a thousand-fold increase in binding affinity could be obtained, this would allow a corresponding reduction in the concentration of fluorophore, and hence background level, needed for the assay.
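The background estimate in this paragraph can be reproduced with a few lines of arithmetic. The Python sketch below takes the factor-of-two extinction ratio, the factor-of-1000 quantum yield ratio, the 10 µM fluorophore concentration, and the 10 micron cell diameter from the text; the function names are ours, and the cell is treated as an ideal sphere with the RNA distributed uniformly through it.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def background_equivalent_conc(free_conc_molar: float,
                               extinction_ratio: float,
                               quantum_yield_ratio: float) -> float:
    """Concentration of complex whose signal equals the free-fluorophore background."""
    return free_conc_molar / (extinction_ratio * quantum_yield_ratio)


def sphere_volume_liters(diameter_um: float) -> float:
    """Volume of a sphere of the given diameter (micrometres), in litres."""
    radius_m = diameter_um * 1e-6 / 2.0
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    return volume_m3 * 1000.0  # 1 m^3 = 1000 L


def copies_per_cell(conc_molar: float, diameter_um: float) -> float:
    """Number of molecules at conc_molar inside a spherical cell."""
    return conc_molar * sphere_volume_liters(diameter_um) * AVOGADRO


if __name__ == "__main__":
    threshold = background_equivalent_conc(10e-6, 2.0, 1000.0)
    print(f"background-equivalent concentration: {threshold * 1e9:.1f} nM")
    print(f"copies per 10 micron cell: {copies_per_cell(threshold, 10.0):.0f}")
```

The calculation gives a 5 nM threshold and a little under 1600 copies per cell, consistent with the round figure of 1500 used above.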
Figure 4. Secondary structure predictions for aptamers 24-2 (Spinach) and 24-2-min by mFold, adapted from Figure S4C of Ref. 20.
Figure 4A shows the secondary structure predicted by mFold (http://mfold.rna.albany.edu) for “Spinach”. Jaffrey and colleagues explored alterations to this sequence in their paper and used this information to come up with a minimal size variant called “24-2-min”, which exhibits 95% of the fluorescence of the full-length sequence but is 18 nt shorter than 24-2 (Figure 4B). Interestingly, all of the changes that gave rise to increased fluorescence were located in one region of the structure, as indicated by the green positions in Figure 4A. We propose to extend the study of Jaffrey et al. by exploring the effects of changes in the region identified in that study as being the one area of the structure in which changes led to increased fluorescence. The approximately 786,000 features that can be synthesized in parallel on the MAS will allow us to make all combinatorial variants at 9 base positions (4^9 = 262,144). We will make RNA arrays containing all possible variants of 9 bases on one side of the hairpin, or on the opposite side of the hairpin, and either varying the opposing strand to maintain base complementarity, or not (4 designs in total). Thus in four experiments we will sample over a million sequence variants in the critical region of the structure for their effect upon fluorescence yield. Each of these will also be evaluated for the effect upon fluorophore association constant. To do this the RNA arrays will be exposed to the fluorophore at various concentrations and imaged by a fluorescence scanner. This approach will permit measurement of both the equilibrium binding strength [16, 17] and a comparative measure of the fluorescence produced by each RNA (relative to the other RNAs on the surface). Arrays will be imaged using our GENETAC UC4X4 4-color laser scanner, which we have used for high-resolution fluorescence imaging of biomolecule arrays for many years.
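The combinatorial design described here can be sketched in a few lines of Python. The example enumerates every sequence over a set of varied positions and, for the designs that preserve base pairing, derives the compensating opposing strand. It assumes strict Watson-Crick pairing (G-U wobble pairs are ignored), and the function names are illustrative rather than part of any existing array design software.

```python
from itertools import product
from typing import Iterator

RNA_BASES = "ACGU"
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def library_size(n_positions: int) -> int:
    """Number of distinct sequences over n_positions fully varied bases."""
    return len(RNA_BASES) ** n_positions


def enumerate_variants(n_positions: int) -> Iterator[str]:
    """Yield every RNA sequence of length n_positions."""
    for bases in product(RNA_BASES, repeat=n_positions):
        yield "".join(bases)


def compensating_strand(variant: str) -> str:
    """Reverse complement of variant, i.e. the opposing strand that keeps the stem paired."""
    return "".join(COMPLEMENT[base] for base in reversed(variant))


if __name__ == "__main__":
    per_design = library_size(9)
    print(f"variants per design: {per_design}")    # 4^9
    print(f"four designs total: {4 * per_design}")
    # Show the first few variants with their compensating strands.
    for index, variant in enumerate(enumerate_variants(9)):
        if index == 3:
            break
        print(variant, compensating_strand(variant))
```

Each design of 262,144 variants fits within the roughly 786,000 features available on one array, and four such designs together sample 1,048,576 sequences.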
We will perform similar experiments to explore the possibility of finding tighter-binding variants of 24-2-min; again we will follow the pioneering experiments of Jaffrey et al., who showed that the sequences within the short hairpins are critical to binding (red bases in Figure 4A). Happily, the short lengths of these hairpins are within the range that can be explored with the achievable MAS combinatorial complexity. We will make all possible variants of 9 positions for each hairpin, and measure fluorescence intensity and binding coefficients by fluorescence imaging as described above. In this way we will find out, at a minimum, whether the 24-2-min sequence is the optimum sequence within that combinatorial space; and if we are fortunate, we may find better sequences that confer advantages in sensitivity for the in vivo imaging of specific RNA sequences. More generally, this important and interesting system will serve as a testbed for the development of RNA arrays as a platform for the discovery, characterization, and optimization of RNA ligands for small molecules.
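One simple way to extract a dissociation constant for each array feature from fluorescence measured at several fluorophore concentrations is sketched below in Python. It assumes a single-site equilibrium model, F = Fmax * c / (KD + c), with the free fluorophore in large excess over surface RNA, and fits it through the Hanes-Woolf linearization (c/F plotted against c). This is an illustrative choice on our part, not a prescribed analysis method, and the synthetic data in the example are generated from the model itself.

```python
def fit_single_site(concs: list[float], signals: list[float]) -> tuple[float, float]:
    """Estimate (KD, Fmax) for F = Fmax * c / (KD + c) by Hanes-Woolf regression.

    Linear form: c / F = c / Fmax + KD / Fmax, so slope = 1 / Fmax and
    intercept = KD / Fmax. KD is returned in the units of concs.
    """
    if len(concs) != len(signals) or len(concs) < 2:
        raise ValueError("need at least two matched (concentration, signal) pairs")
    ys = [c / f for c, f in zip(concs, signals)]
    n = len(concs)
    mean_x = sum(concs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    f_max = 1.0 / slope
    k_d = intercept * f_max
    return k_d, f_max


if __name__ == "__main__":
    # Noise-free synthetic data using the reported Spinach KD (537 nM) and
    # an arbitrary Fmax of 1000 intensity units.
    true_kd_nm, true_fmax = 537.0, 1000.0
    concs_nm = [100.0, 300.0, 1000.0, 3000.0, 10000.0]
    signals = [true_fmax * c / (true_kd_nm + c) for c in concs_nm]
    kd, fmax = fit_single_site(concs_nm, signals)
    print(f"fitted KD = {kd:.1f} nM, fitted Fmax = {fmax:.1f}")
```

With real scanner data, a weighted nonlinear fit would handle measurement noise better; the linearized form is used here only to keep the sketch dependency-free.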
Determining the binding specificity of RNA-binding proteins. Understanding gene expression has been a goal for researchers since it was discovered that DNA is the genetic material responsible for the production and heritability of discrete biological traits. It has become evident that a major stage in the regulation of gene expression is the control of protein translation through protein-RNA interactions. Post-transcriptional regulation involves many intertwined layers of mRNA processing such as polyadenylation, splicing, cellular translocation and ultimately degradation. The timing and location of each of these processes is controlled by interactions with specific proteins and/or protein complexes called RNA binding proteins (RBPs). The elucidation of these interactions, in particular the binding specificities of RNA binding proteins, is critical to understanding the mechanisms by which gene expression is regulated.
High-density RNA arrays provide a flexible and comprehensive tool to identify and study protein binding sites. We will evaluate the effectiveness of using an RNA array to determine all possible binding sequences of a commercially available prototype RNA-binding protein, PUM2. PUM2 is an extensively studied RNA binding protein in the PUF family of proteins. The significance of the PUF protein family has been established by their evolutionarily conserved binding domain (Pumilio-homology domain) and their known involvement in stem cell maintenance and self-renewal and in neuronal function. PUF proteins generally bind the 3’-UTR of their target mRNA transcripts, at positions where the mRNA is single stranded. Furthermore, their binding does not appear to depend upon the recognition of RNA secondary structure, which will be conducive to the use of the RNA array, which presents relatively short 40 nt sequences capable of forming only limited secondary structure.
2. Drmanac, R.; Drmanac, S.; Chui, G.; Diaz, R.; Hou, A.; Jin, H.; Jin, P.; Kwon, S.; Lacy, S.; Moeur, B.; Shafto, J.; Swanson, D.; Ukrainczyk, T.; Xu, C.; and Little, D., "Sequencing by hybridization (SBH): advantages, achievements, and opportunities." Advances in biochemical engineering/biotechnology, 2002, 77, 75-101.
3. Subramanian, A.; Tamayo, P.; Mootha, V.K.; Mukherjee, S.; Ebert, B.L.; Gillette, M.A.; Paulovich, A.; Pomeroy, S.L.; Golub, T.R.; Lander, E.S.; and Mesirov, J.P., "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles." Proceedings of the National Academy of Sciences of the United States of America, 2005, 102(43), 15545-15550.
4. Ren, B.; Robert, F.; Wyrick, J.J.; Aparicio, O.; Jennings, E.G.; Simon, I.; Zeitlinger, J.; Schreiber, J.; Hannett, N.; Kanin, E.; Volkert, T.L.; Wilson, C.J.; Bell, S.P.; and Young, R.A., "Genome-Wide Location and Function of DNA Binding Proteins." Science, 2000, 290(5500), 2306-2309.
5. Warren, C.L.; Kratochvil, N.C.S.; Hauschild, K.E.; Foister, S.; Brezinski, M.L.; Dervan, P.B.; Phillips, G.N.; and Ansari, A.Z., "Defining the sequence-recognition profile of DNA-binding molecules." Proceedings of the National Academy of Sciences of the United States of America, 2006, 103(4), 867-872.
9. LeProust, E.M.; Peck, B.J.; Spirin, K.; McCuen, H.B.; Moore, B.; Namsaraev, E.; and Caruthers, M.H., "Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process." Nucleic Acids Research, 2010, 38(8), 2522-2540.
12. Wu, C.H.; Chen, S.; Shortreed, M.R.; Kreitinger, G.M.; Yuan, Y.; Frey, B.L.; Zhang, Y.; Mirza, S.; Cirillo, L.A.; Olivier, M.; and Smith, L.M., "Sequence-specific capture of protein-DNA complexes for mass spectrometric protein identification." PLoS One, 2011, 6(10), e26217. PMC3197616.
17. Collett, J.R.; Cho, E.J.; Lee, J.F.; Levy, M.; Hood, A.J.; Wan, C.; and Ellington, A.D., "Functional RNA microarrays for high-throughput screening of antiprotein aptamers." Anal Biochem, 2005, 338(1), 113-123.
18. Li, Y.; Lee, H.J.; and Corn, R.M., "Fabrication and characterization of RNA aptamer microarrays for the study of protein-aptamer interactions with SPR imaging." Nucleic Acids Research, 2006, 34(22), 6416-6424.
19. Lockett, M.R.; Weibel, S.C.; Phillips, M.F.; Shortreed, M.R.; Sun, B.; Corn, R.M.; Hamers, R.J.; Cerrina, F.; and Smith, L.M., "Carbon-on-Metal Films for Surface Plasmon Resonance Detection of DNA Arrays." Journal of the American Chemical Society, 2008, 130(27), 8611-8613. PMC2527731.
25. Singh-Gasson, S.; Green, R.D.; Yue, Y.; Nelson, C.; Blattner, F.; Sussman, M.R.; and Cerrina, F., "Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array." Nat Biotechnol, 1999, 17(10), 974-8.
26. Nuwaysir, E.F.; Huang, W.; Albert, T.J.; Singh, J.; Nuwaysir, K.; Pitas, A.; Richmond, T.; Gorski, T.; Berg, J.P.; Ballin, J.; McCormick, M.; Norton, J.; Pollock, T.; Sumwalt, T.; Butcher, L.; Porter, D.; Molla, M.; Hall, C.; Blattner, F.; Sussman, M.R.; Wallace, R.L.; Cerrina, F.; and Green, R.D., "Gene expression analysis using oligonucleotide arrays produced by maskless photolithography." Genome Res, 2002, 12(11), 1749-1755.
27. Lockett, M.R.; Carlisle, J.C.; Le, D.V.; and Smith, L.M., "Acyl Chloride-Modified Amorphous Carbon Substrates for the Attachment of Alcohol-, Thiol-, and Amine-Containing Molecules." Langmuir, 2009, 25(9), 5120-5126. PMC2824164.
32. Phillips, M.F.; Lockett, M.R.; Rodesch, M.J.; Shortreed, M.R.; Cerrina, F.; and Smith, L.M., "In situ oligonucleotide synthesis on carbon materials: stable substrates for microarray fabrication." Nucleic Acids Research, 2008, 36(1), e7. PMC2248760.
35. Pease, A.C.; Solas, D.; Sullivan, E.J.; Cronin, M.T.; Holmes, C.P.; and Fodor, S.P.A., "Light-Generated Oligonucleotide Arrays for Rapid DNA-Sequence Analysis." Proceedings of the National Academy of Sciences of the United States of America, 1994, 91(11), 5022-5026.
36. McGall, G.H.; Barone, A.D.; Diggelmann, M.; Fodor, S.P.A.; Gentalen, E.; and Ngo, N., "The efficiency of light-directed synthesis of DNA arrays on glass substrates." Journal of the American Chemical Society, 1997, 119(22), 5081-5090.
37. Sun, B.; Colavita, P.E.; Kim, H.; Lockett, M.; Marcus, M.S.; Smith, L.M.; and Hamers, R.J., "Covalent photochemical functionalization of amorphous carbon thin films for integrated real-time biosensing." Langmuir, 2006, 22(23), 9598-9605.
38. Guo, Z.; Guilfoyle, R.A.; Thiel, A.J.; Wang, R.; and Smith, L.M., "Direct fluorescence analysis of genetic polymorphisms by hybridization with oligonucleotide arrays on glass supports." Nucleic Acids Research, 1994, 22(24), 5456-5465.
43. Hafner, M.; Landthaler, M.; Burger, L.; Khorshid, M.; Hausser, J.; Berninger, P.; Rothballer, A.; Ascano, M., Jr.; Jungkamp, A.-C.; Munschauer, M.; Ulrich, A.; Wardle, G.S.; Dewell, S.; Zavolan, M.; and Tuschl, T., "Transcriptome-wide Identification of RNA-Binding Protein and MicroRNA Target Sites by PAR-CLIP." Cell, 2010, 141(1), 129-141.
44. Hafner, M.; Landthaler, M.; Burger, L.; Khorshid, M.; Hausser, J.; Berninger, P.; Rothballer, A.; Ascano, M.; Jungkamp, A.-C.; Munschauer, M.; Ulrich, A.; Wardle, G.S.; Dewell, S.; Zavolan, M.; and Tuschl, T., "PAR-CliP - A Method to Identify Transcriptome-wide the Binding Sites of RNA Binding Proteins." J Vis Exp, 2010, (41), e2034.
45. Castello, A.; Fischer, B.; Eichelbaum, K.; Horos, R.; Beckmann, B.M.; Strein, C.; Davey, N.E.; Humphreys, D.T.; Preiss, T.; Steinmetz, L.M.; Krijgsveld, J.; and Hentze, M.W., "Insights into RNA Biology from an Atlas of Mammalian mRNA-Binding Proteins." Cell, 2012, 149(6), 1393-1406.
47. Aurup, H.; Williams, D.M.; and Eckstein, F., "2'-Fluoro-2'-Deoxynucleoside and 2'-Amino-2'-Deoxynucleoside 5'-Triphosphates as Substrates for T7 RNA Polymerase." Biochemistry, 1992, 31(40), 9636-9641.
48. Campbell, Zachary T.; Bhimsaria, D.; Valley, Cary T.; Rodriguez-Martinez, Jose A.; Menichelli, E.; Williamson, James R.; Ansari, A.Z.; and Wickens, M., "Cooperativity in RNA-Protein Interactions: Global Analysis of RNA Binding Specificity." Cell Reports, 2012, 1(5), 570-581.
52. Jiang, Q.H.; Wang, Y.D.; Hao, Y.Y.; Juan, L.R.; Teng, M.X.; Zhang, X.J.; Li, M.M.; Wang, G.H.; and Liu, Y.L., "miR2Disease: a manually curated database for microRNA deregulation in human disease." Nucleic Acids Research, 2009, 37, D98-D104.
57. Lal, A.; Navarro, F.; Maher, C.A.; Maliszewski, L.E.; Yan, N.; O'Day, E.; Chowdhury, D.; Dykxhoorn, D.M.; Tsai, P.; Hofmann, O.; Becker, K.G.; Gorospe, M.; Hide, W.; and Lieberman, J., "miR-24 Inhibits cell proliferation by targeting E2F2, MYC, and other cell-cycle genes via binding to 'seedless' 3'UTR microRNA recognition elements." Mol Cell, 2009, 35(5), 610-25. PMC2757794.
59. Shin, C.; Nam, J.W.; Farh, K.K.H.; Chiang, H.R.; Shkumatava, A.; and Bartel, D.P., "Expanding the MicroRNA Targeting Code: Functional Sites with Centered Pairing." Molecular Cell, 2010, 38(6), 789-802.
63. Wang, B.; Li, S.; Qi, H.H.; Chowdhury, D.; Shi, Y.; and Novina, C.D., "Distinct passenger strand and mRNA cleavage activities of human Argonaute proteins." Nature structural & molecular biology, 2009, 16(12), 1259-66.
64. MacRae, I.J.; Ma, E.; Zhou, M.; Robinson, C.V.; and Doudna, J.A., "In vitro reconstitution of the human RISC-loading complex." Proceedings of the National Academy of Sciences of the United States of America, 2008, 105(2), 512-517.
65. Rivas, F.V.; Tolia, N.H.; Song, J.J.; Aragon, J.P.; Liu, J.D.; Hannon, G.J.; and Joshua-Tor, L., "Purified Argonaute2 and an siRNA form recombinant human RISC." Nature structural & molecular biology, 2005, 12(4), 340-349.
67. Vergoulis, T.; Vlachos, I.S.; Alexiou, P.; Georgakilas, G.; Maragkakis, M.; Reczko, M.; Gerangelos, S.; Koziris, N.; Dalamagas, T.; and Hatzigeorgiou, A.G., "TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support." Nucleic Acids Research, 2012, 40(D1), D222-D229.
68. Maragkakis, M.; Reczko, M.; Simossis, V.A.; Alexiou, P.; Papadopoulos, G.L.; Dalamagas, T.; Giannopoulos, G.; Goumas, G.; Koukis, E.; Kourtis, K.; Vergoulis, T.; Koziris, N.; Sellis, T.; Tsanakas, P.; and Hatzigeorgiou, A.G., "DIANA-microT web server: elucidating microRNA functions through target prediction." Nucleic Acids Research, 2009, 37, W273-W276.
71. Wang, Z.; Tollervey, J.; Briese, M.; Turner, D.; and Ule, J., "CLIP: Construction of cDNA libraries for high-throughput sequencing from RNAs cross-linked to proteins in vivo." Methods, 2009, 48(3), 287-293.
72. Konig, J.; Zarnack, K.; Rot, G.; Curk, T.; Kayikci, M.; Zupan, B.; Turner, D.J.; Luscombe, N.M.; and Ule, J., "iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution." Nature Structural & Molecular Biology, 2010, 17(7), 909-U166.
73. Galgano, A.; Forrer, M.; Jaskiewicz, L.; Kanitz, A.; Zavolan, M.; and Gerber, A.P., "Comparative Analysis of mRNA Targets for Human PUF Family Proteins Suggests Extensive Interaction with the miRNA Regulatory System." PLoS One, 2008, 3(9).
74. Lee, M.H.; Hook, B.; Pan, G.; Kershner, A.M.; Merritt, C.; Seydoux, G.; Thomson, J.A.; Wickens, M.; and Kimble, J., "Conserved regulation of MAP kinase expression by PUF RNA-binding proteins." PLoS Genet, 2007, 3(12), 2540-2550.
76. Qiu, C.; Kershner, A.; Wang, Y.M.; Holley, C.P.; Wilinski, D.; Keles, S.; Kimble, J.; Wickens, M.; and Hall, T.M.T., "Divergence of Pumilio/fem-3 mRNA Binding Factor (PUF) Protein Specificity through Variations in an RNA-binding Pocket." J Biol Chem, 2012, 287(9), 6949-6957.