
BIOINFORMATICS

Vol. 19 no. 7 2003, pages 796–802 DOI: 10.1093/bioinformatics/btg086

Selection of oligonucleotide probes for protein coding sequences


Xiaowei Wang and Brian Seed
Department of Molecular Biology, 50 Blossom Street, Massachusetts General Hospital, Boston, MA 02114, USA
Received on July 31, 2002; revised on November 13, 2002; accepted on December 14, 2002

ABSTRACT
Motivation: Large arrays of oligonucleotide probes have become popular tools for analyzing RNA expression. However, to date most oligo collections contain poorly validated sequences or are biased toward untranslated regions (UTRs). Here we present a strategy for picking oligos for microarrays that focuses on a design universe consisting exclusively of protein coding regions. We describe the constraints in oligo design that are imposed by this strategy, as well as a software tool that allows the strategy to be applied broadly.
Results: In this work we sequentially apply a variety of simple filters to candidate sequences for oligo probes. The primary filter is a rejection of probes that contain contiguous identity with any other sequence in the sample universe that exceeds a pre-established threshold length. We find that rejection of oligos that contain 15 bases of perfect match with other sequences in the design universe is a feasible strategy for oligo selection for probe arrays designed to interrogate mammalian RNA populations. Filters to remove sequences with low complexity and predicted poor probe accessibility narrow the candidate probe space only slightly. Rejection based on global sequence alignment is performed as a secondary, rather than primary, test, leading to an algorithm that is computationally efficient. Splice isoforms pose unique challenges and we find that isoform prevalence will for the most part have to be determined by analysis of the patterns of hybridization of partially redundant oligonucleotides.
Availability: The oligo design program OligoPicker and its source code are freely available at our website.
Contact: seed@molbio.mgh.harvard.edu
Supplementary information: http://pga.mgh.harvard.edu/oligopicker/index.html

INTRODUCTION
Microarray technology has been widely used to monitor gene expression in recent years. Hybridization to such arrays provides a fast and moderately reliable approach to analyze a large number of genes simultaneously (reviewed by Gerhold et al., 1999). The two most widely used types of non-commercial microarrays are spotted cDNA and spotted oligonucleotide arrays. One of the challenges posed by oligo array technology is the design of probes that provide the maximum amount of biologically relevant information. To date little attention has been paid to choosing oligos on this basis. Most probe collections focus on the 3′ UTR, in part because of a presumption that oligo dT will be used to prime the RNA populations, and in part because sequence divergence is typically greater in such regions. However, 3′ UTR variability is substantial, and the tools for predicting the prevalence of alternative 3′ ends are still in development. The 5′ ends of transcription units that lack TATA elements are also heterogeneous, and quantitative estimates of the distributions of ends are rare in the literature. Ultimately, most users of microarray data are interested in the abundance of the encoded proteins, for which the prevalence of the cognate RNAs remains the most accessible surrogate.

In this paper, we describe the results of studies on the impact of various design criteria on the fraction of candidate probes qualified for microarray interrogation of RNA coding regions. OligoPicker, our design tool, evaluates specificity primarily by the occurrence of contiguous perfect matches. Sequence specificity is also cross-checked by global BLAST score. The hybridization accessibility of both the probes and the target analyte is assessed by calculation of regions that may self-anneal. We have also developed an algorithm to cluster redundant mouse and human genes and have used the resulting unique sequences for probe design.

Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on March 20, 2013

To whom correspondence should be addressed.

Bioinformatics 19(7) © Oxford University Press 2003; all rights reserved.

ALGORITHM
Oligo picking criteria
Location in the sequence. Two kinds of priming reactions are commonly used to create cDNA from mRNA or total RNA for microarray experiments: random priming and oligo dT priming. Random priming is expected to result in the creation of probes that have significant representation of non-coding RNAs, such as ribosomal or small nuclear RNAs. Oligo dT priming is expected to result in probes that are enriched for mRNAs, but that will not necessarily encompass the entire coding region. Because we are interested in understanding the contributions of different coding region isoforms, we expect to be performing random primed labeling. Since the primer can anneal at any place in the template, if each primer is capable of being extended to the 5′ end of the RNA, there will result a linear gradient in the representation of the sequences, ranging from highest at the 5′ end to lowest at the 3′ end. In the limit of very large transcript length, premature termination of reverse transcription will have the effect of flattening the gradient toward the 5′ end; but in general the desired oligos for maximum sensitivity in a random primed labeling will lie as close to the 5′ end of the RNA as possible.

Tm uniformity. To ensure quantitative comparison of gene expression, microarray hybridization conditions should be similar for all genes in the study. Although tetra-alkyl ammonium salts in appropriate concentrations are known to eliminate the dependence of melting temperature on base composition (Jacobs et al., 1988) and have been applied to the use of degenerate oligonucleotide hybridization probes (Wood et al., 1985), they have not yet been widely applied to microarray hybridizations. Hence one design requirement for the oligo probes was that their melting temperatures (Tm) should fall in a narrow range.
Since the GC content varies in different organisms, the oligo design tool first evaluates all sequences in the data set and determines the median Tm for oligos by the following formula:

$$T_m = 64.9 + 41 \times \frac{\mathrm{gcCount}}{\mathrm{oligoLength}} - \frac{600}{\mathrm{oligoLength}}$$

where gcCount is the number of all Gs and Cs in an oligo and the molar sodium concentration is taken to be 0.1 M (Schildkraut, 1965). OligoPicker starts picking oligos from one end of the sequence. A probe candidate is discarded if its Tm is not within 5 °C of the median Tm.

Probe accessibility. The contribution of probe secondary structure to oligo hybridization efficiency has not been studied in great detail to date, but may impact performance because hybridizations are performed at below 50 °C in many microarray protocols (Hughes et al., 2001; Kane et al., 2000). Although the prediction of nucleic acid secondary or tertiary structure remains quite challenging, a priori, the likelihood of secondary structure is greatest in regions of significant self-complementarity, for example in the stems of stem-loop structures (reviewed in Mount, 2001). Probe candidates are therefore tested for homology to the complementary strand of their cognate sequences using BLAST. This approach does not take local concentration of the complementary sequence into account, for example the expected higher free energy (due to ring closure entropy) of structures that entail the formation of large loops.

Reduced cross-hybridization. Although the hybridization free energy cannot be calculated directly at present (Li and Stormo, 2001), and empirical data suggest that mismatch context can significantly affect DNA duplex stability (Kierzek et al., 1999), a single mismatch can be expected to destabilize the hybridization complex (Willems et al., 1989). Because we hypothesize that contiguous base pairing is the single most important determinant of cross-hybridization, we have made the rejection of contiguous sequence identity the primary filter in our selection scheme. To quickly search for a stretch of perfectly matched bases, we store all possible 10mers within the data set in a hash table data structure. The hash key is a 10mer sequence and the hash value is a string representation of the relative sequence indexes and positions where this particular 10mer is found (Fig. 1A). 10mers were adopted after considering both program performance and computer memory consumption. Repetitive n-mers are identified by using two overlapping 10mers in the hash (Fig. 1B). Therefore the design tool is able to find repetitive sequence stretches from 10mers to 20mers very quickly. In an earlier study it was reported that very high sequence similarity may lead to cross-hybridization even when the sequences have been prescreened for contiguous perfect match (Kane et al., 2000). To reduce the contribution to cross-hybridization attributable to global similarity, oligos are also screened on the basis of their NCBI BLAST scores (http://www.ncbi.nlm.nih.gov/BLAST) against the design universe. Because of high sequence homology, some sequences will inevitably fail to be represented by unique oligo probes.
The sequence homology is most likely from alternative splicing or from other distinct genes in gene super-families. In these cases, oligo probes are picked from the regions that cross-hybridize, as determined by both contiguous base match and BLAST score, to the smallest number of other sequences in the sample universe.

Evasion of non-coding RNA and low complexity regions. One practical concern is that RNA other than mRNA may interfere with array hybridization when total RNA is used as the starting material. These RNAs may also be reverse-transcribed and cross-hybridize to some oligo probes. To address this issue, sequence regions similar to rRNA or snRNA are avoided during the probe selection process by using both contiguous base match screening and BLAST. Low-complexity regions are also likely to contribute to cross-hybridization (Wootton and Federhen, 1996). These regions are identified by the DUST program (Hancock and Armstrong, 1994) and skipped when selecting probes.

A
Gene Index = 1000
gtcattgatgaagcgcattgtgtggtcagtggggtcatgattttcgtcaaaaaagtagatttgg

10-mer        (gene index, position)
gtcattgatg    (1000,1), (343,22), (4442,201), (4599,890), (18949,1900)
tcattgatga    (1000,2), (10225,455), (14567,890), (20021,12)
cattgatgaa    (1000,3), (23,444), (2265,211), (7895,2110)
attgatgaag    (1000,4), (6679,3451)
ttgatgaagc    (1000,5), (7865,67)
tgatgaagcg    (1000,6), (4599,895), (9899,22)

B
15-mer: gtcattgatgaagcg

10-mer        (gene index, position)
gtcattgatg    (1000,1), (343,22), (4442,201), (4599,890), (18949,1900)
tgatgaagcg    (1000,6), (4599,895), (9899,22)

Fig. 1. An example to illustrate how a repetitive 15mer is identified. All possible 10mers within the data set were stored in a hash data structure. A hash key is a 10mer sequence and the hash value is a string representation of the relative sequence indexes and positions where this particular 10mer is found. Repetitive 15mers were identified by using two overlapping 10mers in the hash. (A) The relative gene index of this sample sequence is 1000. All 10mers were stored in the existing hash by gliding through the sequence one base at a time. Each pair of parentheses (gene index, position) represents where this 10mer is found. (B) The 15mer gtcattgatgaagcg from position 1 in gene index 1000 is used here to illustrate how repetitive 15mers are identified by using the 10mer hash data structure. This 15mer also occurred at position 890 in sequence index 4599 (highlighted in bold).
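The lookup illustrated in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the OligoPicker source: the function names `build_index` and `repeated_nmer_hits` are hypothetical, and the hash values are kept as lists of (sequence index, position) tuples rather than the string representation the paper describes. Because the first and last 10mers of any n-mer with 10 ≤ n ≤ 20 together cover every base, an n-mer recurs at a site exactly when both 10mers are found there at the right spacing.

```python
from collections import defaultdict

K = 10  # word size stored in the hash, as in the paper


def build_index(seqs):
    """Map every 10mer to a list of (sequence index, 1-based position) hits."""
    index = defaultdict(list)
    for seq_id, seq in seqs.items():
        for i in range(len(seq) - K + 1):
            index[seq[i:i + K]].append((seq_id, i + 1))
    return index


def repeated_nmer_hits(index, nmer, own):
    """Return other sites where an n-mer (10 <= n <= 20) occurs.

    The n-mer occurs at (s, p) exactly when its first 10mer occurs at (s, p)
    and its last 10mer occurs at (s, p + n - K).
    """
    offset = len(nmer) - K
    tail_hits = set(index.get(nmer[offset:], []))
    return [(s, p) for (s, p) in index.get(nmer[:K], [])
            if (s, p) != own and (s, p + offset) in tail_hits]


# Toy data set (hypothetical sequences): the 15mer from gene 1000 recurs in
# gene 4599, while gene 7 shares only the first 10mer.
seqs = {
    1000: "gtcattgatgaagcgcattgtgtgg",
    4599: "ccgtcattgatgaagcgttt",
    7: "gtcattgatgccccc",
}
index = build_index(seqs)
hits = repeated_nmer_hits(index, "gtcattgatgaagcg", own=(1000, 1))
print(hits)
```

Storing only 10mers keeps the table small while still answering queries for every threshold between 10 and 20 bases, which matches the trade-off between speed and memory described in the text.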

Sequence file preparation
Sequence redundancy is one of the biggest challenges for probe design. The most highly documented source of candidate genes, the NCBI Reference Sequence Project, RefSeq (Pruitt et al., 2000; Pruitt and Maglott, 2001), provides functional annotations and avoids sequence redundancy, but is still relatively incomplete. The Institute for Genomic Research (TIGR) has produced an orthologous gene list for human, mouse, and rat (http://www.tigr.org) that we have also used to help identify gene candidates. However, the principal source of our sequence information has been the NCBI protein database GenPept. The corresponding DNA coding sequences were retrieved based on the CDS feature instructions (http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html) and redundant sequences were clustered (data sets available online, see Supplementary Information). In our clustering algorithm and its implementation (a program called DeRedund), we gave higher priority to RefSeq sequences (excluding RefSeq genes predicted from the genomic contig assembly process), orthologous sequences and longer coding sequences. Each DNA coding sequence was aligned against all other coding sequences in the data set using BLAST. Sequences with at least 96% identity to the query sequence and with an alignment equal to or greater than the query length − 5 bases were considered redundant. Varying the identity threshold from 90 to 98% had little effect on the final number of clusters (<5%). Ninety-six percent was chosen after considering sequencing errors and polymorphism density. Cluster representatives were fed to the oligo design program for probe design.
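The redundancy rule and the choice of a cluster representative can be written down compactly. The sketch below is an assumption-laden illustration rather than the DeRedund code: it reads the length criterion as "alignment length at least query length minus 5 bases", the record keys (`refseq`, `ortholog`, `cds`) are hypothetical, and the strict ordering of the three priorities is an assumption, since the text only lists them.

```python
def is_redundant(percent_identity, alignment_length, query_length,
                 min_identity=96.0, length_slack=5):
    """Apply the clustering rule to one BLAST hit against the query."""
    return (percent_identity >= min_identity
            and alignment_length >= query_length - length_slack)


def representative(cluster):
    """Pick one record per cluster: RefSeq first, then orthologs, then longest CDS.

    Each record is a dict with hypothetical keys 'refseq' (bool),
    'ortholog' (bool) and 'cds' (coding sequence string).
    """
    return max(cluster, key=lambda rec: (rec["refseq"], rec["ortholog"], len(rec["cds"])))


cluster = [
    {"id": "a", "refseq": False, "ortholog": True, "cds": "ATG" * 50},
    {"id": "b", "refseq": True, "ortholog": False, "cds": "ATG" * 10},
]
print(is_redundant(97.2, 898, 900), representative(cluster)["id"])
```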

RESULTS
There were close to 60 000 mouse sequences in the NCBI protein database as of 1 March 2002. About one quarter of them are immunoglobulin or TCR sequences, most of which were removed as the initial step to reduce sequence redundancy. Redundancies were then removed to generate 20 030 gene clusters. The resulting sequences were organized in a FASTA file and used as the input data for OligoPicker to design 70mer probes. All oligo probes have predicted melting temperatures falling in a 10 °C Tm range (79 °C ± 5 °C).
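The Tm window can be illustrated with the formula given under "Tm uniformity" (0.1 M sodium). The following is a simplified sketch: the function names are hypothetical, and the median is taken over the supplied candidate list only, whereas the paper computes it over all sequences in the data set.

```python
from statistics import median


def melting_temp(oligo):
    """Tm = 64.9 + 41 * gcCount / oligoLength - 600 / oligoLength (0.1 M Na+)."""
    seq = oligo.upper()
    gc_count = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * gc_count / len(seq) - 600.0 / len(seq)


def within_tm_window(candidates, half_width=5.0):
    """Keep candidates whose Tm lies within half_width degrees of the median Tm."""
    tms = [melting_temp(oligo) for oligo in candidates]
    mid = median(tms)
    return [oligo for oligo, tm in zip(candidates, tms) if abs(tm - mid) <= half_width]


# Three artificial 70mers with 50%, about 57% and 0% GC content.
a = "G" * 35 + "A" * 35
b = "G" * 40 + "A" * 30
c = "A" * 70
kept = within_tm_window([a, b, c])
print(round(melting_temp(a), 2), len(kept))
```

For a 70mer the length term is fixed at 600/70, so in this model the Tm spread across candidates comes entirely from GC content.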

[Fig. 2 plot. y-axis: additional failed sequences (0 to 12 000); x-axis: perfect-match length for rejection (≥20-mer, 19-mer, 18-mer, 17-mer, 16-mer, 15-mer, 14-mer); series: mouse sequence, random sequence.]

Impact of threshold for rejection of contiguous matches
We empirically explored the fraction of oligonucleotides containing contiguous identity of length n or greater with at least one other sequence in the design universe (Fig. 2). For this investigation we used the mouse GenPept data set, and assessed picking failure, defined by the inability to choose at least one oligo for a given gene, as a function of the perfect match length n. Figure 2 shows the fraction of genes failing the test as n is incrementally shortened from an initial value of 20. The latter value was chosen because there was very little change over the range 70 to 20 (data not shown), indicating that the sequences that fail the 20mer perfect match threshold correspond to very closely related sequences in the database. In the remaining universe of sequences that pass the 20mer match criterion, an increasing fraction fail as the contiguous match length is decreased. In Figure 2 we also display the expected number of picking failures when each of the mouse sequences is replaced by random strings of nucleotides of the same length and base composition. The difference in behavior between random sequences and sequences from the mouse data set can be taken as a measure of the higher order non-random character of the murine coding regions due to biological selection. Rejection of 15mer match was chosen as the default for the oligo selection tool since 14mer match is not likely to significantly contribute to cross-hybridization under most array hybridization conditions. For the non-coding RNA filter, a more stringent criterion, rejection of 13mer match, was chosen because of the abundance of these RNAs in the sample.

Impact of oligo probe length
To evaluate the effect of oligo length on selection failure rate, the oligo length was varied over the range 30–100 nucleotides. Figure 3 shows that the number of probes that can be designed decreases as the probe length increases.
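The two parameter sweeps described above (perfect-match threshold n and oligo length) both reduce to counting picking failures. A brute-force sketch on toy data follows; the function names are hypothetical, only matches to other sequences are counted, and none of the other filters (Tm, low complexity, BLAST) is applied, so it illustrates the definition of picking failure rather than reproducing the published numbers.

```python
from collections import defaultdict


def shared_kmers(seqs, n):
    """Return the set of n-mers that occur in more than one sequence."""
    owners = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for i in range(len(seq) - n + 1):
            owners[seq[i:i + n]].add(idx)
    return {kmer for kmer, ids in owners.items() if len(ids) > 1}


def has_unique_oligo(seq, oligo_len, n, shared):
    """True if some window of oligo_len bases contains no shared n-mer."""
    # bad[i] is True when the n-mer starting at i also occurs in another sequence
    bad = [seq[i:i + n] in shared for i in range(len(seq) - n + 1)]
    for start in range(len(seq) - oligo_len + 1):
        if not any(bad[start:start + oligo_len - n + 1]):
            return True
    return False


def count_failures(seqs, oligo_len, n):
    """Number of sequences for which no oligo can be picked."""
    shared = shared_kmers(seqs, n)
    return sum(not has_unique_oligo(s, oligo_len, n, shared) for s in seqs)


# Two toy sequences share an 8-base prefix; the third is unrelated.
toy = ["AAAACCCCGGGG", "AAAACCCCTTTT", "GATTACAGATTA"]
for oligo_len, n in [(8, 6), (8, 4), (12, 6)]:
    print(oligo_len, n, count_failures(toy, oligo_len, n))
```

On this toy set, shortening the rejection length from 6 to 4, or lengthening the oligo from 8 to 12, each raises the number of failures, which is the qualitative trend reported in Figures 2 and 3.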
70mer oligo probes have been widely adopted in the microarray community, so this length was chosen for our probe sets. However, users have the option to design probes from 20 to 100 bases long (see Supplementary Information).

Fig. 2. Impact of threshold for rejection of contiguous matches. Each probe is a 70mer oligonucleotide representing the cognate DNA sequence. All probes are within a 10 °C Tm range (79 ± 5 °C) of each other. Probe candidates containing contiguous identity of length n or greater with at least one other sequence in the design universe were rejected. Sequence failure, defined by the inability to choose at least one oligo probe for a given gene, was assessed as a function of the perfect match length for rejection. The fraction of genes failing the test is shown here as the length of perfect matches is incrementally shortened from an initial value of 20. Also shown is the expected number of sequence failures when each of the mouse sequences is replaced by random strings of nucleotides of the same length and base composition. [x-axis: perfect-match length for rejection.]

Fig. 3. Impact of oligo probe length on probe selection. The oligo length was varied over the range 30–100 nucleotides for each probe set. All probes are within a 10 °C Tm range (79 ± 5 °C) of each other and the selected probes do not have any repetitive 15mer. [Axes: unique probes (6000 to 20 000) versus probe length (30 to 100).]

BLAST and self-annealing filters
After rejecting candidate oligonucleotides on the basis of contiguous match, BLAST and self-annealing filters were applied. Oligos were designated as eligible if they showed a BLAST score lower than a threshold value when compared to all sequences in the sample universe (Fig. 4). The BLAST parameters are -F F -S 1 -e 1000. To evaluate the effect of BLAST score on oligo picking, a score range 28–50 was examined. Only a small number of probes could be designed when the score was lowered to 28. A BLAST score of 30 roughly corresponds to 15 contiguously matched bases. As discussed in the contiguous match filter section, 14mer match is acceptable and so is BLAST score <30. Therefore, score 30 was selected for the BLAST filter.

The self-annealing filter was examined in a similar way. Probe candidate sequences were aligned against the complementary strands of the target analyte using BLAST. The BLAST parameters are -S 2 -F F -W 8 -e 1000. The longer the perfect-match regions on the cDNA, the more likely they are to base pair and become inaccessible to probes during hybridization. Probes were selected if no more than a threshold number (8–12) of perfect matches were found with the complementary strand of the mRNA. As shown in Figure 5, very similar numbers of probes could be designed when the possible base-paired region was longer than eight. Therefore, a maximum of nine paired bases was chosen for the cDNA self-annealing filter. Similarly for oligo self-annealing, the oligo was compared to its self-complementary sequence to search for perfect matches. A more stringent criterion was used (up to eight paired bases were allowed) to accommodate the expectation that self-pairing of the oligo would be more detrimental to hybridization than self-pairing to distant sequence in the mRNA.

Fig. 4. Impact of BLAST score filter on probe design. Each probe is a 70mer oligonucleotide representing the cognate DNA sequence. All probes are within a 10 °C Tm range (79 ± 5 °C) of each other and the selected probes do not have any repetitive 15mer. Oligos were designated as unique if they showed a BLAST score lower than a threshold value when compared to all sequences in the sample universe. To evaluate the effect of BLAST score on oligo picking, a score range 28–50 was examined. [Axes: unique probes (0 to 14 000) versus threshold BLAST score (25 to 55).]

Fig. 5. Impact of probe self-annealing filter on probe design. Each probe is a 70mer oligonucleotide representing the cognate DNA sequence. All probes are within a 10 °C Tm range (79 ± 5 °C) of each other and the selected probes do not have any repetitive 15mer. Probes were selected if no more than a threshold number (8–12) of perfect matches were found with the complement strand of the cognate mRNA. [Axes: unique probes (8000 to 14 000) versus maximum paired bases (8 to 12).]
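The self-annealing criteria can be illustrated without BLAST. The sketch below swaps in a plain dynamic-programming search for the longest exact run shared by two strings; it is not the BLAST-based procedure the paper uses, and the function names are hypothetical. Default thresholds follow the text (eight paired bases within the oligo, nine against the target), and sequences are assumed to be in a single letter case.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a, b):
    """Length of the longest exact run shared by a and b (O(len(a) * len(b)))."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def passes_self_annealing(oligo, target, max_oligo_pairs=8, max_target_pairs=9):
    """Reject oligos that can pair with themselves or with another part of the target."""
    within_oligo = longest_common_substring(oligo, reverse_complement(oligo))
    with_target = longest_common_substring(oligo, reverse_complement(target))
    return within_oligo <= max_oligo_pairs and with_target <= max_target_pairs


# A hairpin with a 10-base stem fails; a homopolymer stretch cannot pair with itself.
stem = "ACGTTGCAAG"
hairpin = stem + "TTTT" + reverse_complement(stem)
print(passes_self_annealing(hairpin, hairpin),
      passes_self_annealing("A" * 12, "A" * 12 + "CCCC"))
```

The quadratic search is adequate for a single oligo against a single transcript; a word-based method such as the BLAST search described above is the practical choice at data-set scale.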

Impact of combining all screening filters
Probes were selected under five different conditions: (a) no repetitive 15mer in the oligos; (b) condition a + low-complexity filter; (c) condition b + RNA filter; (d) condition c + self-annealing filter; (e) condition d + BLAST filter (Fig. 6). As expected, the number of sequences that failed to generate unique probes increased as the screening stringency increased. The rejection of repetitive 15mers was the most effective filter with the parameters we set, resulting in 87% of all the picking failures (Fig. 6). The low-complexity filter, RNA filter, self-annealing (accessibility) filter, and BLAST filter only slightly affected the probe design. For the sequences that did not have unique probes under these conditions, the algorithm is designed to accept repetitive 15mers (reject repetitive 16mers) and under these conditions 2771 additional unique oligo probes could be designed.

Fig. 6. Effects of combining all screening filters. All probes are within a 10 °C Tm range (79 ± 5 °C) of each other. In addition, probes were selected under five different conditions: (a) no repetitive 15mer in the oligos; (b) condition a + low-complexity filter; (c) condition b + RNA filter; (d) condition c + self-annealing filter; (e) condition d + BLAST filter. The number of sequences that failed to generate unique probes increased as the screening stringency increased. Additional failed sequences are shown here as the result of increased screening stringency. [Axes: additional failed sequences (0 to 8000) versus screening conditions (a to e).]

Characterization of the non-unique probes
A 70mer oligo probe set was prepared by applying 15-base contiguous match, low-complexity, RNA, self-annealing, and BLAST score filters. Each sequence was represented by one oligo probe. One unique oligo probe was designed for each sequence if the oligo passed these filters. Sequences that had no unique probes were represented by probes that cross-hybridized to other sequences. In total, 14 554 sequences were represented by unique probes. Among the non-unique probes, the vast majority, 3196 probes, cross-hybridized to only one sequence. This probe fraction was further analyzed by keyword search in the sequence annotations for splice isoforms and very similar genes (gene families). Olfactory receptors were the largest gene family (173 members) among these 3196 sequences. The cross-hybridizing probes were categorized into two classes, 100% cross-hybridization and partial cross-hybridization. Non-unique probes that represented splice isoforms, identified by sequence annotations, were mostly found to totally cross-hybridize to another sequence. On the other hand, high gene similarity usually resulted in partial cross-hybridization (Fig. 7).

Fig. 7. Splice isoform vs. gene family in non-unique probes. There were 3196 probes that had only one cross-hybridizing sequence. This probe fraction was further analyzed by keyword search in the sequence annotations for splice isoforms and very similar genes (gene families). Olfactory receptors were the largest gene family among these sequences. The cross-hybridizing probes were categorized into two classes, 100% cross-hybridization and partial cross-hybridization. [Axes: probe fraction (0 to 0.9) for the two classes, 100% cross-hybridization and partial cross-hybridization; series: splice isoform, all sequences, olfactory receptor.]

Further analysis showed that 1549 out of the 3196 non-unique probes cross-hybridized to sequences that were represented by unique probes. Random sampling indicated that the cross-hybridizing sequences were usually the longer form of the isoforms. Thus in general the shortest of a collection of splice isoforms is the hardest to design a unique oligo for. Our findings indicate that the challenges posed by splice isoforms do not in general admit a solution based on border or splice junction oligos, given the design criteria and oligo length we have adopted. Instead we find it necessary to identify splice isoforms by analysis of ensembles of oligonucleotides and interpretation of the pattern of hybridization. The concentration of the smallest splice isoform in an RNA sample will in general have to be calculated from the difference in signal intensity between a redundantly represented oligo that hybridizes to two or more species, and a unique oligo. In some cases, for example where many different isoforms are present, the accuracy of this determination is not likely to be high and other methods, such as quantitative PCR, are likely to provide better estimates.
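The difference calculation for two isoforms can be made concrete with a small example. This is only a sketch under strong assumptions that the paper does not state: both probes respond linearly to target concentration with equal hybridization efficiency, and background has already been subtracted. The function name and the intensity values are hypothetical.

```python
def short_isoform_signal(shared_probe_signal, unique_probe_signal):
    """Signal attributed to the isoform that has no unique probe.

    shared_probe_signal: intensity of the oligo hybridizing to both isoforms.
    unique_probe_signal: intensity of the oligo specific to the longer isoform.
    Negative differences (noise) are clipped to zero.
    """
    return max(shared_probe_signal - unique_probe_signal, 0.0)


print(short_isoform_signal(1500.0, 900.0))
```

With more than two isoforms the same idea becomes a system of linear equations in the probe signals, and measurement noise accumulates in each difference, which is consistent with the caution above about accuracy.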

DISCUSSION
Probe specificity
Several papers have analyzed the requirements for designing gene specific oligo probes (Evertsz et al., 2001; Hughes et al., 2001; Kane et al., 2000; Miller et al., 2002). There are also a few programs available either commercially or as open source for probe design (Hughes et al., 2001; Pruitt et al., 2000; Lockhart et al., 1996; Rouillard et al., 2002; http://www.operon.com/arrays/omad.php). These programs usually use the BLAST program as the primary tool to locate gene specific regions where the probes are designed. In our algorithm, we first addressed the importance of contiguous matches in designing gene specific probes. A hashing technique, similar to that used in FASTA and BLAST, was adopted to provide rapid sequence similarity assessment. However, instead of being used as seeds for alignment extensions as in BLAST, the 10mers were used as basic units to construct all possible 15mers in the data set (Fig. 1). In this way, repetitive 15mers could be quickly identified. The probes were also cross-checked later by BLAST screening. The occurrence of 15mer perfect matches strongly correlated with high BLAST scores. As shown in Figure 6, unique probes with below threshold BLAST score (<30) can be designed for most sequences that have passed the 15mer perfect match screening in the sample design universe. Because of the hashing technique used here, our program can screen 20 030 sequences in a few hours, much faster than other oligo picking programs that were implemented primarily with BLAST.

Probe sensitivity
It is important that all the probes and the hybridizing regions of oligos be accessible under hybridization conditions. Several programs can predict secondary structures for small regions of nucleic acid based on free energy calculations (reviewed in Zuker, 2000). These programs have been used to design secondary-structure free probes (Rouillard et al., 2002). However, an equally if not more relevant parameter is the secondary structure of the target analyte. If the sequences chosen for the oligo coincide with a region of self-complementarity in the mRNA, it is likely that hybridization efficiency will be reduced. To avoid such sequences we search the complementary strand of the mRNA for similarity to sequences in the oligo, and reject those that exceed an empirically established threshold.

In summary, we have designed an oligo picking algorithm and used it to design unique oligo probes for 14 554 mouse genes and 17 558 human genes after clustering redundant sequences. Probes that may cross-hybridize to more than one gene were also picked to represent the rest of the sequences in the NCBI GenPept database.

ACKNOWLEDGEMENTS
We thank our colleagues in the Molecular Biology Core Facility for their contributions, Jonathan Urbach for an early version of the code for removing redundancies, and Mason Freeman for suggestions on oligo design. This research was supported by the PGA grant U01 HL66678 from the National Heart, Lung and Blood Institute.

REFERENCES
Evertsz,E.M., Au-Young,J., Ruvolo,M.V., Lim,A.C. and Reynolds,M.A. (2001) Hybridization cross-reactivity within homologous gene families on glass cDNA microarrays. Biotechniques, 31, 1182, 1184, 1186.
Gerhold,D., Rushmore,T. and Caskey,C.T. (1999) DNA chips: promising toys have become powerful tools. Trends Biochem. Sci., 24, 168–173.
Hancock,J.M. and Armstrong,J.S. (1994) SIMPLE34: an improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences. Comput. Appl. Biosci., 10, 67–70.
Hughes,T.R., Mao,M., Jones,A.R., Burchard,J., Marton,M.J., Shannon,K.W., Lefkowitz,S.M., Ziman,M., Schelter,J.M. and Meyer,M.R. (2001) Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol., 19, 342–347.
Jacobs,K.A., Rudersdorf,R., Neill,S.D., Dougherty,J.P., Brown,E.L. and Fritsch,E.F. (1988) The thermal stability of oligonucleotide duplexes is sequence independent in tetraalkylammonium salt solutions: application to identifying recombinant DNA clones. Nucleic Acids Res., 16, 4637–4650.
Kane,M.D., Jatkoe,T.A., Stumpf,C.R., Lu,J., Thomas,J.D. and Madore,S.J. (2000) Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res., 28, 4552–4557.
Kierzek,R., Burkard,M.E. and Turner,D.H. (1999) Thermodynamics of single mismatches in RNA duplexes. Biochemistry, 38, 14214–14223.
Li,F. and Stormo,G.D. (2001) Selection of optimal DNA oligos for gene expression arrays. Bioinformatics, 17, 1067–1076.
Lockhart,D.J., Dong,H., Byrne,M.C., Follettie,M.T., Gallo,M.V., Chee,M.S., Mittmann,M., Wang,C., Kobayashi,M. and Horton,H. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14, 1675–1680.
Miller,N.A., Gong,Q., Bryan,R., Ruvolo,M., Turner,L.A. and LaBrie,S.T. (2002) Cross-hybridization of closely related genes on high-density macroarrays. Biotechniques, 32, 620–625.
Mount,D.W. (2001) Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, New York City.
Pruitt,K.D., Katz,K.S., Sicotte,H. and Maglott,D.R. (2000) Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet., 16, 44–47.
Pruitt,K.D. and Maglott,D.R. (2001) RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res., 29, 137–140.
Rouillard,J.M., Herbert,C.J. and Zuker,M. (2002) OligoArray: genome-scale oligonucleotide design for microarrays. Bioinformatics, 18, 486–487.
Schildkraut,C. (1965) Dependence of the melting temperature of DNA on salt concentration. Biopolymers, 3, 195–208.
Willems,v.D., Schroeder,H.W.J., Perlmutter,R.M. and Milner,E.C. (1989) Heterogeneity in the human Ig VH locus. J. Immunol., 142, 2547–2554.
Wood,W.I., Gitschier,J., Lasky,L.A. and Lawn,R.M. (1985) Base composition-independent hybridization in tetramethylammonium chloride: a method for oligonucleotide screening of highly complex gene libraries. Proc. Natl Acad. Sci. USA, 82, 1585–1588.
Wootton,J.C. and Federhen,S. (1996) Analysis of compositionally biased regions in sequence databases. Methods Enzymol., 266, 554–571.
Zuker,M. (2000) Calculating nucleic acid secondary structure. Curr. Opin. Struct. Biol., 10, 303–310.
