
progress

A genomic view of alternative splicing


Barmak Modrek & Christopher Lee
Recent genome-wide analyses of alternative splicing indicate that 40–60% of human genes have alternative splice forms, suggesting that alternative splicing is one of the most significant components of the functional complexity of the human genome. Here we review these recent results from bioinformatics studies, assess their reliability and consider the impact of alternative splicing on biological functions. Although the 'big picture' of alternative splicing that is emerging from genomics is exciting, there are many challenges. High-throughput experimental verification of alternative splice forms, functional characterization, and regulation of alternative splicing are key directions for research. We recommend a community-based effort to discover and characterize alternative splice forms comprehensively throughout the human genome.

© 2002 Nature Publishing Group http://genetics.nature.com

Introduction
The sequencing of the human genome has raised important questions about the nature of genomic complexity. It was widely anticipated that the human genome would contain a much larger number of genes (estimates based on expressed-sequence clustering ran as high as 150,000 genes) than Drosophila (14,000 genes) or Caenorhabditis elegans (19,000 genes)1–3. The report of only 32,000 human genes thus came as a surprise4,5. This basic disparity indicated that the number of human expressed-sequence (mRNA) forms was much higher than the number of genes, suggesting a major role for alternative splicing in the production of complexity. Many groups have recently presented genomic analyses of alternative splicing that strongly support this hypothesis, raising intriguing questions about the identification, functional roles and regulation of alternative splice forms across the whole genome.

The study of alternative splicing has long been a valuable subfield of molecular biology, but has received little attention compared with major fields such as the discovery of new genes or transcriptional regulation. Only several hundred alternatively spliced genes have been identified so far by molecular biologists (see Table 1 for database resources). After the discovery of exons and introns in the Adenovirus hexon gene in 1977 (ref. 6), Walter Gilbert proposed that different combinations of exons could be spliced together (alternative splicing) to produce different mRNA isoforms of a gene7. By the early 1980s, alternative splicing was well documented in several genes8,9, and researchers estimated that 5% of genes in higher eukaryotes might have alternative splicing10. A range of processes from sex determination to apoptosis use alternative splicing11,12. Its regulatory mechanisms have recently been discovered in several genes11,13.
Genome-scale analyses of alternative splicing
High-throughput sequencing of the human genome and especially of expressed sequence tag (EST) sequences has enabled a completely different approach based on bioinformatics. Because ESTs are derived from fully processed mRNA (after 5′ capping, splicing and polyadenylation), they provide a broad sample of mRNA diversity. This diversity can be analyzed computationally.

In the last two years, bioinformatics studies have identified an order of magnitude more alternatively spliced genes than were found in the past 20 years and are beginning to provide a global view of alternative splicing in humans. We will first describe these studies and then assess the evidence.

Bioinformatics approaches. Most bioinformatics studies4,14–18 (Table 2) rely on identifying ESTs that come from the same gene and looking for differences between them that are consistent with alternative splicing, such as a large insertion or deletion in one EST (Fig. 1a). Each candidate splice can be further assessed by aligning the ESTs exactly to their gene sequence in the draft genome (Fig. 1b). This reveals candidate exons (matches to the genomic sequence) separated by candidate splices (large gaps in the EST-genomic alignment; Fig. 1b). As intronic sequences at splice junctions are highly conserved (99.24% of introns have a GT-AG at their 5′ and 3′ ends, respectively), they can be used to verify candidate splices19.

In the earliest large-scale discovery of new alternative splicing, Mironov et al.14 aligned ESTs to genomic sequence for 392 known genes and found alternative splicing in 133 of these genes. Croft et al.15 took a different approach that did not rely on aligning ESTs to the complete genomic sequence: they created a database of individual intron sequences annotated in GenBank and searched for EST sequences that matched intronic sequence. They found matches to introns from 582 genes, suggesting an alternative splice. Brett et al.16 looked for insertions or deletions in ESTs relative to a set of known mRNAs, indicative of alternative splices, but without EST alignment to the genomic sequence. This work identified 3,011 alternatively spliced genes16. The International Human Genome Sequencing Consortium reported 145 alternatively spliced genes from a comprehensive analysis of chromosome 22 based on aligning ESTs to the genomic sequence4.
Modrek et al.18 aligned available human EST and mRNA sequences (2.1 million) to the whole draft genome, applying strict matching, splice site and alternative splice detection criteria, to identify 6,201 alternative splices in 2,272 genes.

Alternative splicing frequency. These studies have consistently reported a high rate of alternative splicing in the human genome, with 35–59% of human genes showing evidence of at

Departments of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, California 90095-1570, USA. Correspondence should be addressed to C.L. (e-mail: leec@mbi.ucla.edu).
nature genetics volume 30 january 2002 13

Table 1 Description and URLs for some alternative splicing databases

Resource | Description | URL

Literature-based alternative splicing databases
ASDB (ref. 36) | alternative splicing database using GenBank and SWISS-PROT annotation | http://cbcg.nersc.gov/asdb
AsMamDB (ref. 37) | database of alternative splices in human, mouse and rat | http://166.111.30.65/ASMAMDB.html
Alternative Splicing Database (ref. 30) | database of alternative splices from literature | http://cgsigma.cshl.org/new_alt_exon_db2/
Yeast Intron Database (ref. 38) | database of introns in yeast | http://www.cse.ucsc.edu/research/compbio/yeast_introns.html

New alternative splicing discovery databases
The Intronerator (ref. 39) | alternative splicing in C. elegans based on analysis of EST data | http://www.cse.ucsc.edu/kent/intronerator
ISIS (ref. 15) | Intron Sequence Information System; has a section covering detected human alternative splices | http://isis.bit.uq.edu.au/
TAP (ref. 17) | Transcript Assembly Program; result of alternative splicing | http://stl.wustl.edu/zkan/TAP/
HASDB (ref. 18) | database of alternative splices detected in human EST data | http://www.bioinformatics.ucla.edu/HASDB

least one alternative splice form4,14,16–18. Moreover, given that only a few ESTs have been sequenced for most genes, it seems possible that even more alternative splicing exists that is not yet detectable in the available ESTs. These studies indicate that alternative splicing is far more abundant, ubiquitous and functionally important than previously thought. And there are more types of mRNA isoforms. For example, bioinformatics studies have reported that about 25% of genes have alternative polyadenylation forms, that is, mRNAs that are cleaved and polyadenylated at different sites4,20.

Functional impact. How do these newly discovered alternative mRNA forms affect protein function? Despite an early report that most alternative splices occur within the 5′ untranslated region14, recent studies indicate that 70–88% of alternative splices change the protein product4,17,18. The majority of these changes appear to be functionally interesting, such as replacement of the amino or carboxy terminus, or in-frame addition and removal of a functional unit (Fig. 2b)18. Only 19% of the alternative protein forms were shortened due to frameshift18. Fig. 2c shows an alternative isoform of a new Fc receptor β-like protein, whose C-terminal

transmembrane domain (TM) and cytoplasmic tail (important for signal transduction in this class of receptors) is neatly replaced with a new TM domain and tail by alternative polyadenylation18.

What is the functional pattern of alternative splicing across the genome? A random sample of 50 alternatively spliced genes showed that over three-quarters were involved in signaling and regulation (such as receptors, signal transduction, transcription factors, and so on). Moreover, the systemic categories most highly represented in this sample were genes specific to the immune and nervous systems18. This should be interpreted cautiously, as the overall breakdown of gene functions in the whole genome is still unclear. However, alternative splicing may be most important in complex systems where information must be processed differently at different times (such as immune tolerance, or development) or a very high level of diversity is required (such as axonal guidance). Notable examples of combinatorial alternative splicing of multiple cassettes of exons, generating up to 40,000 isoforms of a single gene, have recently been discovered in the nervous system, including Dscam (axonal guidance receptor in Drosophila) and neurexin (neuropeptide receptor)21.
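The arithmetic behind combinatorial splicing can be made concrete with a short sketch. It assumes that one variant is chosen independently from each cluster of mutually exclusive cassette exons, so the number of possible isoforms is the product of the cluster sizes. The cluster sizes below are the figures commonly reported for Drosophila Dscam (12, 48, 33 and 2 variants); they are not taken from this article, and the function name `combinatorial_isoforms` is purely illustrative.

```python
from math import prod

# Number of mutually exclusive variants in each cassette cluster.
# Assumption: commonly reported values for Drosophila Dscam, not from this article.
dscam_cassettes = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def combinatorial_isoforms(cassettes):
    """Count isoforms when exactly one variant is picked per cluster, independently."""
    return prod(cassettes.values())


print(combinatorial_isoforms(dscam_cassettes))  # 38016
```

Under these assumptions the product is 38,016, consistent with the figure of up to 40,000 isoforms quoted above.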

Table 2 Summary of some recent large-scale alternative splicing papers

Paper | Summary
Mironov et al. (1999) (ref. 14) | Used TIGR Human Gene Index. ESTs were aligned to genomic and the genomic was used to create superstructures of EST clusters. Identified 133 alternatively spliced genes. Estimated at least 35% of human genes are alternatively spliced.
Brett et al. (2000) (ref. 16) | Aligned EST to mRNA to identify insertions and deletions. Identified 3,011 alternatively spliced genes. Verified 16 of the 20 genes with predicted alternative splicing. Estimated at least 38% of human genes are alternatively spliced.
Croft et al. (2000) (ref. 15) | Created a database of introns from GenBank. Identified EST sequences that matched regions previously only designated as intronic. Identified 582 alternatively spliced genes. Estimated at least 22% of human genes are alternatively spliced.
Human Genome Consortium (2001) (ref. 4) | Reconstructed the alignment of 642 different transcripts (mRNA and EST) covering 245 genes on chromosome 22 genomic sequence. 145 genes had alternative splicing. Estimated at least 59% of human genes are alternatively spliced.
Kan et al. (2001) (ref. 17) | Aligned EST and genomic sequence to create transcript. Analysis of conservation of alternative splicing between human and mouse. Identified 374 alternatively spliced genes. Estimated 55% of human genes are alternatively spliced.
Modrek et al. (2001) (ref. 18) | Used UniGene. ESTs were aligned to genomic and alternative splices were detected. Identified 6,201 alternative splice relationships in 2,272 clusters. Provides functional analysis of alternative splicing. Estimated at least 42% of human genes are alternatively spliced.

Fig. 1 Computational identification of alternative splicing. a, Insertion and deletion in ESTs relative to mRNA are identified as potential alternative splices. b, Splices are identified and intronic splice junction donor and acceptor sites are checked. Alternative splices are detected when two splices are mutually exclusive (intron inclusions are not identified as alternative splices).
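The splice-site check illustrated in Fig. 1b can be sketched in a few lines of Python. This is a minimal illustration, not the method of any study cited here: it assumes the EST has already been aligned to the forward strand of the genomic sequence as a list of matched blocks with 0-based, half-open coordinates, and the names `candidate_introns`, `has_gt_ag` and the `min_gap` threshold are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end) of an aligned block, 0-based, half-open


def candidate_introns(blocks: List[Block], min_gap: int = 20) -> List[Block]:
    """Return genomic gaps between consecutive aligned blocks.

    Gaps shorter than min_gap (an arbitrary, assumed threshold) are
    treated as alignment noise rather than candidate splices.
    """
    introns = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start - prev_end >= min_gap:
            introns.append((prev_end, next_start))
    return introns


def has_gt_ag(genomic: str, intron: Block) -> bool:
    """Check that the gap starts with GT (donor) and ends with AG (acceptor)."""
    start, end = intron
    seq = genomic[start:end].upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Toy genomic sequence (made up): exon, GT...AG intron, exon.
intron_seq = "GTAAGT" + "C" * 20 + "AG"
genomic = "ATGGCC" + intron_seq + "GGATCC"
blocks = [(0, 6), (34, 40)]  # the two exon matches of a hypothetical EST

for gap in candidate_introns(blocks):
    print(gap, has_gt_ag(genomic, gap))  # (6, 34) True
```

Because the GT and AG dinucleotides lie in the intron, which the EST never contains, the check uses only genomic sequence and is independent of the EST evidence that proposed the splice.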

Bioinformatics evidence for alternative splicing
It is essential that biologists understand the forms of evidence and problems that underlie this new 'big picture' view of alternative splicing. Bioinformatics is an automated analysis of high-throughput experimental data and follows a very different process than traditional molecular biology. It can be simultaneously more rigorous (much more detailed, mathematical measures of evidence are required for a computer to do this analysis at all) and much less rigorous (bioinformaticists typically cannot order a new set of experimental tests for all the isoforms they detect, as is common in

molecular biology labs studying a specific isoform). Two kinds of problems must be distinguished: (i) a false negative, the failure to detect a real splice form, and (ii) a false positive, a reported result that is not a true, functional splice form. Analyzing the causes of these problems during cDNA library construction, EST sequencing and sequence comparison suggests many interesting questions for the next stage of this research (Table 3). Detection of alternative splicing through bioinformatics depends on finding deviant EST forms within the mass of data produced by undirected EST sequencing, raising a fundamental concern: when an analysis is used to look for some form of deviation in a very large data set, other causes of deviation, even if infrequent, could add up to a substantial fraction of the result. How can we be sure this is real alternative splicing? The bioinformatics studies have tried carefully to screen out many possible sources of false positives. Simple forms of EST deviation, such as random variation in where a given EST sequence begins or ends within a gene, and potential vector
Fig. 2 Types of alternative splicing and possible effects on protein. a, Alternative splicing can lead to either the inclusion or exclusion of an exon, use of a different 5′ site, or use of a different 3′ site. b, Alternative splicing can lead to use of a different site for translation initiation (alternative initiation), a different translation termination site due to a frameshift (truncation or extension), or the addition or removal of a stop codon in the alternative coding sequence (alternative termination). Alternative splicing can also change the internal region because of an in-frame insertion or deletion. c, Alternative splicing of Hs.11090, a putative Fc receptor β chain homolog: genomic structure and two alternative spliced (and polyadenylated) mRNA forms. The differential RNA processing results in substitution of one transmembrane domain instead of another. However, one form has a different cytoplasmic tail (involved in signaling in this family), whereas the other does not.
contamination at the ends of ESTs, are excluded. The most important screen is provided by mapping (aligning) ESTs to the draft human genome sequence. Chimeric ESTs can be easily excluded by requiring that each EST align completely to a single genomic locus. The genomic location found by homology search and alignment can often be checked against radiation hybrid mapping data. As the genomic regions that match the ESTs should be exons and the alignment gaps between them should be introns, the putative splice sites at their boundaries can be carefully checked. Because the splice-site motifs (GT-AG, polypyrimidine tract, and so on) are primarily in the intron, this provides a validation that is independent of the EST evidence. Reverse transcriptase artifacts or other problems causing imperfect cDNA construction may be screened out in this way.

Improper inclusion of genomic sequence in ESTs (due to either mRNA purification problems or incomplete splicing) can also be excluded by requiring pairs of mutually exclusive splices in different ESTs. Observing a given splice in one EST but not in a second EST may be insufficient, because the latter could be an unspliced EST rather than a biologically significant intron inclusion. This problem can be eliminated by focusing on mutually exclusive splices: two different splices, seen in different ESTs, that overlap in the genomic sequence. One can make this even stricter by requiring that the two splices share one splice site but differ at the other. This approach detects the classic forms of alternative splicing, such as alternative exon usage and alternative 5′ or 3′ splicing (Fig. 2a). Detection of valid intron inclusions will probably require further statistical analysis.

The presence in the human genome of many pseudogenes and paralogous genes resembling other genes greatly complicates the problem. Correct alternative splice detection depends on clustering the EST data into separate groups representing individual genes.
EST clustering (such as UniGene) is well known to have both excessive splitting of genes (there are 80,000 UniGene clusters, versus the estimate of 32,000 human genes) and excessive lumping, in which paralogous gene sequences are mixed together4,22. This mixing can suggest spurious alternative splices that are actually just differences between similar but distinct genes23. Methods that map the ESTs onto genomic sequence with a high level of identity (95–98%) probably exclude much of this paralog mixing, but not all. Ultimately, mapping ESTs to their unique gene location in the genomic sequence is the only way to sort out paralogs. Requiring that the consensus sequence for an EST cluster match completely, over its full length, to its genomic contig can help exclude artifacts where the genomic sequence has been misassembled. Instead of getting false positives (incorrect alternative splices), this may cause false negatives due to refusing to map the EST cluster at all. A high rate of false negatives is the greatest disadvantage of methods that require mapping ESTs to the draft genome sequence.

Despite these sources of uncertainty, the agreement among many studies on a high frequency of alternatively spliced genes (35–60%) suggests that this result is valid. These studies support


Table 3 Types of problems inherent in high-throughput alternative splice detection

Type of errors caused: (+) false positive, (−) false negative

Experimental factors | Type of errors caused
EST coverage limitations, bias | (−). Most genes have very few ESTs, from even fewer tissues. The main barrier to alternative splice detection.
RT / PCR artifacts | (+) in methods that don't screen for fully valid splice sites (which requires genomic mapping, intronic sequence).
Genomic coverage, assembly errors | (−) in methods that map ESTs on the genome. Short contigs may cause >25% false negatives.
Chimeric ESTs | (+) in methods that simply compare ESTs.
Genomic contamination | (+) in methods that don't screen for pairs of mutually exclusive splices.
EST orientation error, uncertainty | (+/−) in methods that don't correct misreported orientation, or don't distinguish overlapping genes on opposite strands.
Sequencing error | (+/−). Single-pass EST sequencing error can be very high locally (e.g. >10% at the ends). Need chromatograms.
EST fragmentation | Where ESTs end cannot be treated as significant.

Bioinformatics factors | Type of errors caused
Mapping ESTs to the genome | (−) in methods that map genomic location for each EST.
Paralogous genes | (+) in all current methods, but mostly in those that don't map genomic location or don't check all possible locations.
Rigorous measures of evidence | (+/−). How can the strength of experimental evidence for a specific splice form be measured rigorously?
Arbitrary cutoff thresholds | (+/−) in methods that use cutoffs (such as 99% identity).
Alignment size limitations | (−) in methods that can't align >10^2, >10^3 sequences.
Pathological assemblies | (+/−). What should assembly programs do when the assembled reads disagree in regions (such as alt. splicing)? Programs vary.
Nonstandard splice sites? | (+) in methods that don't fully check splice sites; (−) in methods that do restrict to standard splice sites.
Alignment degeneracy | (+/−). Alignment of ESTs to genomic is frequently degenerate around splice sites.

Biological interpretation factors | Comments
Spliceosome errors? | Is splicing perfect? That is, does it only make correct forms?
What is truly functional? | Just because a splice form is real (i.e. present in the cell) doesn't mean it's biologically functional. Conversely, even an mRNA isoform that makes a truncated, inactive protein might be a biologically valid form of functional regulation.
Defining the coding region | Predicting ORF in newly discovered genes; splicing may change ORF.
Predicting impact in protein | Motif, signal, domain prediction, and functional effects.
Predicting impact in UTR | Knowledge about effects of mRNA stability, localization, and other possible UTR sequences is incomplete.
Assessing and correcting for bias | Our genome-wide view of function is under construction. Until then, we have unknown selection bias.

We have listed items in each category loosely in descending order of importance.

Fig. 3 Experimental analysis of alternative splicing. a, Alternative splicing can be verified by RT–PCR using primers that flank the alternatively spliced region. The relative abundance of different isoforms in various tissue samples can be assessed from the gel. b, High-throughput identification of alternative splicing can be carried out using microarrays. The microarray probes would consist of exon–exon junction sequence, as different alternative splice forms will have different exon–exon junctions. By analyzing the tissue distribution of various splice forms, clues regarding the regulation of alternative splicing can be obtained.

each other persuasively, because they differ not only in the sets of genes sampled (ranging from well-characterized mRNAs, to specific chromosomes, to a whole-genome study), but also in their specific criteria for reporting an alternative splice. It is important, however, to emphasize that there has only been one study so far verifying alternative splices detected by bioinformatics. Twenty genes with putative alternative splices were amplified from a multiple tissue cDNA panel by RT–PCR, with primers flanking the alternative splice (Fig. 3a). Sixteen were confirmed to be alternatively spliced, although thirteen of them were already recognized in the literature16.
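The mutually exclusive splice criterion discussed in this section can be illustrated with a short Python sketch. It is a simplified reading of the rule, not the implementation used by any cited study: each splice is assumed to be an intron interval in genomic coordinates (0-based, half-open), already validated and pooled across the ESTs of one gene, and the names `mutually_exclusive` and `alternative_splice_pairs` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Splice = Tuple[int, int]  # (intron start, intron end) in genomic coordinates


def overlaps(a: Splice, b: Splice) -> bool:
    """True if the two intron intervals overlap in the genomic sequence."""
    return a[0] < b[1] and b[0] < a[1]


def mutually_exclusive(a: Splice, b: Splice, require_shared_site: bool = False) -> bool:
    """Two distinct splices that overlap cannot both occur in one transcript.

    With require_shared_site=True, apply the stricter rule that the two
    splices share one splice site and differ at the other.
    """
    if a == b or not overlaps(a, b):
        return False
    if require_shared_site:
        return (a[0] == b[0]) != (a[1] == b[1])
    return True


def alternative_splice_pairs(splices: List[Splice],
                             require_shared_site: bool = False) -> List[Tuple[Splice, Splice]]:
    """List every pair of observed splices that are mutually exclusive."""
    unique = sorted(set(splices))
    return [(a, b) for a, b in combinations(unique, 2)
            if mutually_exclusive(a, b, require_shared_site)]


# Toy exon-skipping case (made-up coordinates): one EST keeps the middle exon
# (two short introns), another EST skips it (one long intron).
observed = [(100, 200), (300, 400), (100, 400)]
print(alternative_splice_pairs(observed, require_shared_site=True))
# [((100, 200), (100, 400)), ((100, 400), (300, 400))]
```

Note that an unspliced EST contributes no splice at all to `observed`, so a simple intron inclusion never produces a pair; this mirrors the text's point that intron inclusions are not reported by this criterion.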

Future Challenges
High-throughput validation. Large-scale experimental verification of alternative splicing will be needed to assess the accuracy of the bioinformatics-based analyses. One promising technology is inkjet printing of long probes (up to 60 nt) to make rapidly customizable microarrays. Shoemaker et al.24 used this technology to monitor the coordinate expression of 8,183 exons annotated on chromosome 22q. This technology could easily be adapted to detect alternative splicing, by designing probes that span specific exon–exon junctions. As alternative splicing of a given gene creates different exon–exon junctions, it can be detected by measuring hybridization of mRNA samples from different tissues to these probes (Fig. 3b). Whereas the hybridization ratios of most exon–exon junction probes for a given gene will be constant, alternative splicing will cause some

junctions to be up- or downregulated in different tissues. Rapid printing of such splicing chips will enable cataloging of splice forms for all genes, in different tissues, developmental states and conditions. Combined with the human genome sequence, these data can in turn be used to identify cis elements that regulate these forms. Recently, the Affymetrix microarray design has also been used to identify potential alternative splices within the rat genome. The Affymetrix array uses 20 probe pairs (25 nt) representing different exons of a gene. Whereas the intensities of most probes for a gene varied together in different tissues, probes for certain exons were anomalously depressed in some tissues, indicating potential alternative splices25. Other methodologies that use microarray technology to assess alternative splicing have also been developed (X.-D. Fu and M. Ares Jr, personal communication).

Rigorous measures of evidence. It should be emphasized that microarray approaches will not settle the question of identifying alternative splices independent of bioinformatics analysis. If anything, these data are likely to increase the need for bioinformatics, to measure rigorously the strength of the evidence for alternative splices in all the raw experimental data (ESTs, microarrays, and so on). For example, the original inkjet microarray paper treated differences in probe hybridization

Fig. 4 Cooperative roles for bioinformatics and experimentation in an alternative splicing annotation project (ASAP). Individual researchers interested in specific genes can find computationally derived alternative splices in the database, allowing them to characterize and report results back into the database. High-throughput experiments can be designed using the database, and after computational analysis of the experimental data, the database can be updated with new results.
among exons in a gene as indicators that low-expressed probes were not real exons but simply gene prediction errors. By contrast, the Affymetrix study treated such differences as evidence of alternative splicing. The assessment of both competing interpretations is a bioinformatics analysis problem. This will require moving beyond simple rules for filtering out potentially misleading data to probabilistic measurement of the relative strength of the evidence for the competing interpretations.

Cataloguing alternative splice forms. Although the new bioinformatics results are based on data from the whole genome, it is important to understand they are highly incomplete. They detect many new splice forms but miss many known isoforms. This is a result of both the incompleteness and fragmentation of the EST and genomic sequence data, as well as many causes of false negatives in the bioinformatics methods (Table 3). In Modrek et al.18, at least 50% of the EST data (and their potential alternative splices) were excluded by these problems. These studies are just the beginning of an accelerating process of mRNA isoform discovery. The EST sequence data are growing rapidly, the draft genome sequence is being completed and new streams of high-throughput data (such as splice-detection microarrays) are beginning. Thus, a worthwhile goal is simply to build a catalog of alternative splice forms, just as the human genome sequence is being used to build a catalog of the genes. The development of new high-throughput technologies for detecting the protein products of alternative splicing will be needed to streamline this process.

What is truly functional? Although bioinformatics and high-throughput experiments can have a key role in building a catalog, in our view this can only succeed as a community annotation process involving all molecular biology researchers. For example, how can one prove that a particular splice form is actually carrying out an important biological function?
Even with strong evidence that a form is real (that it was actually made by the spliceosome in a living cell), it does not seem safe to assume that it has a biological function. If the spliceosome had a 0.1% rate of mis-splicing, it could produce over 4,000 meaningless alternatively spliced ESTs among the approximately 4 million ESTs. Bioinformatics can partly address this by discerning that a large subset of alternative splice forms (47%) are observed in multiple ESTs (often from different libraries) and thus are unlikely to be low-frequency error products18. At the same time, it is also not safe to dismiss a given form as functionless simply because it has no obvious function. For example, even an alternative splice form that causes early translational termination (and an inactive protein product) can act as an important form of regulation of biological activity13. Only detailed functional studies can resolve these questions. Bioinformatics can infer likely functional impacts, however, by detecting the addition or removal of known domains, and can predict how experimenters could verify the presence of these forms and their likely disease or tissue specificity. Biologists interested in some of these putative forms could then use a variety of techniques (PCR, northern and western blots) to test these predictions. This process will be best served by a central repository for both the bioinformatics predictions and subsequent experimental verification and functional studies, which would act as a community annotation database (Fig. 4). We hope this process can evolve rapidly into an active partnership between prediction and experiment.

Alternative splicing regulation. One intriguing new area is the study of alternative splicing regulation. Regulation of splicing could be involved in 15% of genetic diseases26 and may contribute to cancer by mis-splicing of exon 18 in BRCA1, which is caused by a polymorphism in an exonic enhancer27. If alternative

splicing is as widespread as bioinformatics studies indicate, how different splice forms are turned on and off may become a major research area, like transcriptional regulation. So far, molecular biology has identified some cis regulatory elements (such as exonic splicing enhancers) and trans factors (SR proteins, PTB, and so on)11,13. Bioinformatics could make important contributions, for example, in the identification of cis regulatory elements28–31. Recently, Brudno et al.31 analyzed intronic sequence upstream and downstream of 25 alternatively spliced brain-specific exons. They detected the motif UGCAUG at a much higher frequency downstream of alternatively spliced exons (relative to constitutive exons), for both brain-specific and muscle-specific alternative splicing31. This motif had previously been implicated in the alternative splicing of several genes including c-src, fibronectin, calcitonin/CGRP, and nonmuscle myosin II heavy chain-B (refs 32–35), so this result is very suggestive. It bodes well for genome-wide studies that combine the flood of new alternative splicing data with complete genome sequences for multiple organisms.
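The kind of motif comparison described above can be sketched as follows. This is only an illustration of counting a hexamer in two sets of downstream intronic sequences and comparing occurrences per kilobase; it is not the procedure of Brudno et al., the toy sequences are made up, and the function names `count_motif` and `motif_rate_per_kb` are hypothetical. A real analysis would also need a statistical test and a background model.

```python
from typing import Iterable


def count_motif(seq: str, motif: str = "UGCAUG") -> int:
    """Count occurrences of motif in seq, allowing overlaps.

    DNA input is accepted: T is converted to U before searching.
    """
    seq = seq.upper().replace("T", "U")
    count, start = 0, 0
    while True:
        idx = seq.find(motif, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def motif_rate_per_kb(seqs: Iterable[str], motif: str = "UGCAUG") -> float:
    """Motif occurrences per 1,000 nt, pooled over a set of sequences."""
    seqs = list(seqs)
    total_len = sum(len(s) for s in seqs)
    if total_len == 0:
        return 0.0
    return 1000.0 * sum(count_motif(s, motif) for s in seqs) / total_len


# Made-up downstream intron fragments, for illustration only.
alt_downstream = ["AAUGCAUGCC", "UGCAUGUGCAUG"]
constitutive_downstream = ["AAAAAAAAAA", "CCUGCAUGCC"]

print(motif_rate_per_kb(alt_downstream) > motif_rate_per_kb(constitutive_downstream))  # True
```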
Acknowledgments

We are grateful to D. Black, S. Galbraith and K. Ke for their critical comments and suggestions. C.L. was supported by a grant from the Department of Energy. B.M. was supported by a National Science Foundation Integrative Graduate Education and Research Training award.

Received 16 August; accepted 20 November 2001.

1. Pennisi, E. Human genome project: and the gene number is...? Science 288, 1146–1147 (2000).
2. Adams, M.D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
3. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).
4. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
5. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
6. Sambrook, J. Adenovirus amazes at Cold Spring Harbor. Nature 268, 101–104 (1977).
7. Gilbert, W. Why genes in pieces? Nature 271, 501 (1978).
8. Early, P. et al. Two mRNAs can be produced from a single immunoglobulin μ gene by alternative RNA processing pathways. Cell 20, 313–319 (1980).
9. Rosenfeld, M.G. et al. Calcitonin mRNA polymorphism: peptide switching associated with alternative RNA splicing events. Proc. Natl Acad. Sci. USA 79, 1717–1721 (1982).
10. Sharp, P.A. Split genes and RNA splicing. Cell 77, 805–815 (1994).
11. Lopez, A.J. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 32, 279–305 (1998).
12. Boise, L.H. et al. bcl-x, a bcl-2-related gene that functions as a dominant regulator of apoptotic cell death. Cell 74, 597–608 (1993).
13. Smith, C.W.J. & Valcarcel, J. Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem. Sci. 25, 381–388 (2000).
14. Mironov, A.A., Fickett, J.W. & Gelfand, M.S. Frequent alternative splicing of human genes. Genome Res. 9, 1288–1293 (1999).
15. Croft, L. et al. ISIS, the intron information system, reveals the high frequency of alternative splicing in the human genome. Nature Genet. 24, 340–341 (2000).
16. Brett, D. et al. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 474, 83–86 (2000).
17. Kan, Z., Rouchka, E.C., Gish, W.R. & States, D.J. Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 11, 889–900 (2001).
18. Modrek, B., Resch, A., Grasso, C. & Lee, C. Genome-wide analysis of alternative splicing using human expressed sequence data. Nucleic Acids Res. 29, 2850–2859 (2001).
19. Burset, M., Seledtsov, I.A. & Solovyev, V.V. Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res. 28, 4364–4375 (2000).
20. Beaudoing, E., Freier, S., Wyatt, J.R., Claverie, J. & Gautheret, D. Patterns of variant polyadenylation signal usage in human genes. Genome Res. 10, 1001–1010 (2000).
21. Graveley, B.R. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17, 100–107 (2001).
22. Wheeler, D.L. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 28, 10–14 (2000).
23. Burke, J., Wang, H., Hide, W. & Davison, D.B. Alternative gene form discovery and candidate gene selection from gene indexing projects. Genome Res. 8, 276–290 (1998).
24. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
25. Hu, G.K. et al. Predicting splice variant from DNA chip expression data. Genome Res. 11, 1237–1245 (2001).
26. Krawczak, M., Reiss, J. & Cooper, D.N. The mutational spectrum of single basepair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum. Genet. 90, 41–54 (1992).

nature genetics volume 30 january 2002

27. Liu, H.X., Cartegni, L., Zhang, M.Q. & Krainer, A.R. A mechanism for exon skipping caused by nonsense or missense mutations in BRCA1 and other genes. Nature Genet. 27, 55–58 (2001).
28. Stamm, S., Zhang, M.Q., Marr, T.G. & Helfman, D.M. A sequence compilation and comparison of exons that are alternatively spliced in neurons. Nucleic Acids Res. 22, 1515–1526 (1994).
29. Kent, W.J. & Zahler, A.M. Conservation, regulation, synteny, and introns in a large-scale C. briggsae–C. elegans genomic alignment. Genome Res. 10, 1115–1125 (2000).
30. Stamm, S. et al. An alternative-exon database and its statistical analysis. DNA Cell Biol. 19, 739–756 (2000).
31. Brudno, M. et al. Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res. 29, 2338–2348 (2001).
32. Modafferi, E.F. & Black, D.L. A complex intronic splicing enhancer from the c-src pre-mRNA activates inclusion of a heterologous exon. Mol. Cell. Biol. 17, 6537–6545 (1997).
33. Huh, G.S. & Hynes, R.O. Regulation of alternative pre-mRNA splicing by a novel repeated hexanucleotide element. Genes Dev. 8, 1561–1574 (1994).
34. Hedjran, F., Yeakley, J.M., Huh, G.S., Hynes, R.O. & Rosenfeld, M.G. Control of alternative pre-mRNA splicing by distributed pentameric repeats. Proc. Natl Acad. Sci. USA 94, 12343–12347 (1997).
35. Kawamoto, S. Neuron-specific alternative splicing of nonmuscle myosin II heavy chain-B pre-mRNA requires a cis-acting intron sequence. J. Biol. Chem. 271, 17613–17616 (1996).
36. Dralyuk, I., Brudno, M., Gelfand, M.S., Zorn, M. & Dubchak, I. ASDB: database of alternatively spliced genes. Nucleic Acids Res. 28, 296–297 (2000).
37. Ji, H. et al. AsMamDB: an alternative splice database of mammals. Nucleic Acids Res. 29, 260–263 (2001).
38. Spingola, M., Grate, L., Haussler, D. & Ares, M.J. Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA 5, 221–234 (1999).
39. Kent, W.J. & Zahler, A.M. The intronerator: exploring introns and alternative splicing in Caenorhabditis elegans. Nucleic Acids Res. 28, 91–93 (2000).
