
TUTORIAL #1

- GENE -

Analytical Methods
in Molecular Biology
Department of Biology
Instructor: JD Spafford
BIOL 208 - MOBIOSIMLAB
Version 1.3 - Spring 2009

In this assignment, you will identify the structures of genes, mRNA transcripts and proteins using the National
Center for Biotechnology Information (NCBI) website.

A. EXPLORING ENTREZ GENE


Go to the National Center for Biotechnology Information website:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
Press GO when finished.

At NCBI Entrez Gene, choose the Limits tab:

Under Include Only: choose RefSeq
Under Limited by RefSeq Status: choose Validated
Under Limited by Taxonomy: choose Homo sapiens
Enter your gene query in the Search gene for box: type in syntaxin, then press Go.
Scroll down the results, look for STX1A and click on it.

1- GENE TUTORIAL

Tutorial 1 - Identifying the structure of genes, mRNA and proteins from a genetic database

Scroll down until you come to the NCBI Reference Sequences (RefSeq) section:
NM_004603.2 is the accession number for the official reference mRNA of human syntaxin1A.
NP_004594.1 is the accession number for the official reference protein of human syntaxin1A.
NC_000007.12 is the accession number for the official reference genomic sequence of human syntaxin1A.
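The two-letter prefix of a RefSeq accession already tells you what kind of molecule it names, and the number after the dot is the version. The sketch below splits an accession into those parts; it covers only a handful of common prefixes (not the full RefSeq list), and `describe_accession` is a hypothetical helper written for this tutorial.

```python
import re

# A few common RefSeq prefixes and the molecule type each denotes (not an exhaustive list).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule (e.g. a chromosome)",
    "NG": "genomic region",
    "NM": "mRNA (curated)",
    "NR": "non-coding RNA (curated)",
    "NP": "protein (curated)",
    "XM": "mRNA (model/predicted)",
    "XP": "protein (model/predicted)",
}

# Two capital letters, an underscore, digits, and an optional ".version" suffix.
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def describe_accession(accession: str) -> dict:
    """Split a RefSeq-style accession into prefix, number and version, and name its type."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "number": number,
        "version": int(version) if version else None,
        "molecule": REFSEQ_PREFIXES.get(prefix, "unknown prefix"),
    }


for acc in ["NM_004603.2", "NP_004594.1", "NC_000007.12"]:
    info = describe_accession(acc)
    print(acc, "->", info["molecule"], "| version", info["version"])
```

The version number matters: NM_004603.2 and a later NM_004603.3 would be different revisions of the same record.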


You are led to the Entrez Gene webpage, which provides a description of Gene ID: 6804. The official gene name is
syntaxin 1A; the Gene ID 6804 and the Official Symbol STX1A both refer to the same gene. You can type
entrez gene 6804 or entrez gene STX1A into a Google search (www.google.com) and arrive at the same Entrez
start page for syntaxin1A.

Below the genomic assemblies are over a dozen entries under Related Sequences. These link to genomic
and mRNA sequences, or their corresponding translated protein sequences, submitted by various scientists. Any
scientist can submit sequences obtained in their experiments to the public GenBank database. Sequences under
Related Sequences may be identical to the reference sequences (RefSeq), but many are incomplete.
Entries often contain mistakes (not likely on purpose), may come from different human tissue types or
developmental stages, or may be derived from sequencing templates obtained by different molecular approaches (PCR
products, ESTs, cDNA library sequences, etc.). Reference sequences (RefSeq) are complete sequences that have
been validated by NCBI (National Center for Biotechnology Information). The public sequence database is open
access, and there is no policing of sequence entries outside the reference sequences (RefSeq).

B. EXPLORING mRNA SEQUENCES IN ENTREZ NUCLEOTIDE


Press NM_004603.2 to open the RefSeq for human syntaxin1A mRNA

The header at the top identifies that you are viewing an Entrez Nucleotide entry. This webpage describes all
the details relating to the human syntaxin1A mRNA, which is identified by accession NM_004603.2. Later,
we will look at Entrez Protein, which provides information about the translated protein, which has its
own accession number.
The header lists some of the general identifying features of the record: syntaxin1A is a linear mRNA of 2117
bp in length from the source organism Homo sapiens (human). It is a complete mRNA record that was last updated on 15-Mar-2009.
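The same Entrez Nucleotide record can also be retrieved without the browser through NCBI's E-utilities. This is a minimal sketch, assuming the `efetch` endpoint with its standard `db`, `id`, `rettype` and `retmode` parameters; `efetch_url` is a hypothetical helper that only builds the URL and sends no request, so check NCBI's E-utilities documentation and usage limits before fetching records in bulk.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint; nothing is downloaded by this snippet.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "gb") -> str:
    """URL for a plain-text flat file: rettype "gb" for nucleotides, "gp" for db="protein"."""
    query = urlencode({"db": db, "id": accession, "rettype": rettype, "retmode": "text"})
    return f"{EFETCH}?{query}"


print(efetch_url("NM_004603.2"))                              # the mRNA record viewed here
print(efetch_url("NP_004594.1", db="protein", rettype="gp"))  # its translated protein
```

Opening the first URL in a browser should show the same GenBank flat file (header, FEATURES and ORIGIN) that the Entrez Nucleotide page displays.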

[Figure: Entrez Nucleotide header; syntaxin 1A mRNA is a 2117 bp mRNA with accession NM_004603.2]

Scrolling farther down, the features of the mRNA are listed. The mRNA spans bp 1 to 2117. The CDS,
or coding sequence, of the mRNA lies between base pairs 28 and 894. The translated protein of
the human syntaxin1A gene is identified by protein id NP_004594.1; it is the 288 amino acid protein given as /translation=MKDRT ... GGIFA.
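The CDS coordinates and the protein length are linked by simple arithmetic: the inclusive range 28 to 894 holds 867 bases, i.e. 289 codons, and the last codon is a stop codon that adds no amino acid. A short check in Python:

```python
# CDS coordinates from the NM_004603.2 feature table (1-based, both ends inclusive).
cds_start, cds_end = 28, 894

cds_length = cds_end - cds_start + 1     # inclusive range, hence the +1
assert cds_length % 3 == 0, "a complete CDS is a whole number of codons"

codons = cds_length // 3                 # 3 nucleotides per codon
amino_acids = codons - 1                 # the final codon (TAG) is a stop and adds no residue

print(cds_length, codons, amino_acids)   # 867 289 288
```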


Below the reference sequences (RefSeq) for mRNA, protein and genomic assembly are alternate genomic assemblies, derived from well-known, large-scale genome sequencing projects: CRA_TCAGchr7v2, Celera and
HuRef.


[Figure: features of NM_004603.2, labelled: mRNA between positions 1 bp and 2117 bp; DNA coding sequence between positions 28 bp and 894 bp; protein ID for the translated amino acid sequence; amino acid translation of the coding sequence]

Notice that many of the features are hyperlinked and will take you to webpages specific to that
feature. As an example, let us narrow our observations within the STX1A gene by examining just the coding sequence, or CDS, which lies between 28 bp and 894 bp of the 2117 bp mRNA. Click on the CDS hyperlink.

[Figure: CDS view of the STX1A mRNA; only the 867 bp CDS of the syntaxin 1A gene is shown; protein NP_004594.1 is coded by the CDS from 1 bp to 867 bp]


Notice that the CDS link from the mRNA record only shows the features related to bases 1 to 867 of the coding
sequence:

ORIGIN
        1 atgaaggacc gaacccagga gctccgcacg gccaaggaca gcgatgatga tgatgatgtc
       61 gctgtcaccg tggaccgaga ccgcttcatg gatgagttct ttgagcaggt ggaggagatt
      121 cgaggcttca ttgacaagat cgcagagaac gtggaggagg tgaagcggaa gcacagtgcc
      181 atcctggcat cccccaaccc cgatgagaag acgaaggagg agctggaaga actcatgtcc
      241 gacataaaga agacagcaaa caaagttcgt tccaagttaa agagcatcga gcagtccatc
      301 gagcaagagg aaggcctgaa ccgctcctcc gctgacctga ggatccggaa gacacagcac
      361 tccacgctgt ccagaaagtt tgtggaggtc atgtcggagt acaacgccac gcagtccgac
      421 taccgcgagc gctgcaaagg ccgcatccag aggcagctgg agatcaccgg caggaccacg
      481 accagtgagg agctggagga catgctggag agtgggaacc ccgccatctt tgcctctggg
      541 atcatcatgg actccagcat ctcgaagcag gctctgagcg agattgagac gcggcacagt
      601 gagatcatca agctggagaa cagcatccgt gagctacacg acatgttcat ggacatggcc
      661 atgctcgtgg agagccaggg agagatgatt gacaggatcg agtacaatgt ggaacacgcg
      721 gtagactatg tggagagggc cgtgtctgac accaagaagg ccgtcaagta ccagagcaag
      781 gcgcgccgga agaaaatcat gatcatcatc tgctgtgtga tcctgggcat cgtcatcgcc
      841 tccactgttg ggggcatctt cgcctag
//
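To see how the CDS encodes the protein MKDRT ... GGIFA, the sketch below translates codons with the standard genetic code, written as a 64-letter string in TCAG order (a common compact encoding, not an NCBI tool). It is applied to the first and last rows of the CDS; `translate` is an illustrative helper that stops at the first stop codon by default.

```python
# Standard genetic code: first base varies slowest, third base fastest, in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence codon by codon, optionally stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


first_row = "atgaaggacc gaacccagga gctccgcacg gccaaggaca gcgatgatga tgatgatgtc"
last_row = "tccactgttg ggggcatctt cgcctag"
print(translate(first_row.replace(" ", "")))   # MKDRTQELRTAKDSDDDDDV
print(translate(last_row.replace(" ", "")))    # STVGGIFA (then the TAG stop codon)
```

The last row can be translated on its own only because position 841 is the first base of a codon (840 is a multiple of 3).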

Let us now go back to the complete entry for STX1A mRNA, accession number NM_004603.2. Scroll to the top of
the screen where the Entrez Nucleotide header is located, look to the right side and adjust the Change Region
Shown panel. Select Whole sequence and press Update View. You are now back at the full-length STX1A mRNA
sequence.



Scrolling down towards the end of the Entrez Nucleotide entry for the CDS of syntaxin1A, you reach the nucleotide sequence. Notice that the first three bp of the CDS are the start codon, atg, and the last three bp (865 bp to 867 bp)
are the stop codon, tag.

1-27 is the 5' UTR (5' untranslated region)
28-30 codes for the start codon atg
28-894 is the coding region for the 288 amino acid protein
892-894 codes for the stop codon tag
895-end is the 3' UTR (3' untranslated region)
2091-end is the poly(A) tail
ORIGIN
        1 gccggcgccg ctgccactcc cgggagcatg aaggaccgaa cccaggagct ccgcacggcc
       61 aaggacagcg atgatgatga tgatgtcgct gtcaccgtgg accgagaccg cttcatggat
      121 gagttctttg agcaggtgga ggagattcga ggcttcattg acaagatcgc agagaacgtg
      181 gaggaggtga agcggaagca cagtgccatc ctggcatccc ccaaccccga tgagaagacg
      241 aaggaggagc tggaagaact catgtccgac ataaagaaga cagcaaacaa agttcgttcc
      301 aagttaaaga gcatcgagca gtccatcgag caagaggaag gcctgaaccg ctcctccgct
      361 gacctgagga tccggaagac acagcactcc acgctgtcca gaaagtttgt ggaggtcatg
      421 tcggagtaca acgccacgca gtccgactac cgcgagcgct gcaaaggccg catccagagg
      481 cagctggaga tcaccggcag gaccacgacc agtgaggagc tggaggacat gctggagagt
      541 gggaaccccg ccatctttgc ctctgggatc atcatggact ccagcatctc gaagcaggct
      601 ctgagcgaga ttgagacgcg gcacagtgag atcatcaagc tggagaacag catccgtgag
      661 ctacacgaca tgttcatgga catggccatg ctcgtggaga gccagggaga gatgattgac
      721 aggatcgagt acaatgtgga acacgcggta gactatgtgg agagggccgt gtctgacacc
      781 aagaaggccg tcaagtacca gagcaaggcg cgccggaaga aaatcatgat catcatctgc
      841 tgtgtgatcc tgggcatcgt catcgcctcc actgttgggg gcatcttcgc ctagaagcca
      901 cccaaactgc cactccactc caggtgggcc actccaagga ggccctggct gctgccacct
      961 ggctgggctg ccctcccaac ccccgcctct ggctcagagc accctccctc ccggccccca
     1021 tgctcccttc tctgccatgg gccctccgtc cccgccccgt gtcgtgtgca tgatctctgt
     1081 gagtgtgcgt ctgtacggga agaggcagag ggaggcagcc agcggggcgt gatgcagtgt
     1141 gcacagcgag gagcagaccc aggcagggcc gccagggtga cacaggccac ccttccttgc
     1201 cttcagtaac tcggtgggcc caggttctgc tcttccctgg ggaccctaac ctcccctcca
     1261 gctgacctgc cctgtcctct ccagctgtcc ccacaagcag agccctgagg ggtggggacc
     1321 agctggccac atggtgctgc ttttcaggtt aggggagagg tggccctgag ggacagccca
     1381 gctctgagtc tcagtcgctg atcactgcca gggaggctca ggctgccatg gctccaggct
     1441 ccctcccctg cctaggggca aagtccatcg ggtcctgggc ctcagcttcc cttcccacat
     1501 tcctccggcc ccaggagcaa ccccttgggc taggtctgac cccaggtgtc cctctggaag
     1561 gggctggctg gtgccctatt tccagccacc ccagcagcta gggaggcaaa gcaggctgca
     1621 gtcagtccct caagccagcg ttgcatgttt gggatggtgg ctcctgttgt cttgcgctct
     1681 gggaagtcag atgtcatttc aggcctgcag tctcatcctg cccttgccat cctcccatcg
     1741 atgtgccacg tgggtgtcac gtgtcccaga tgcagtattc ggcagccagc cggggagggc
     1801 tacctcctcc tcctcaccac cttggggctt ctcatgggaa atgtgccccc gccccaggac
     1861 cctctccctt gtggacaggc agggagatgc atgcgagtgc atgcagcagg ggatggggcc
     1921 gtgtccgtgt gccccaccct ccctcggctt tactcctgcc cagtgactgt gaccactgtc
     1981 cgtgttgcct tcttgaacag cgattccccc caaccccttc accaaaggtc ttggtacaac
     2041 cagctgccca ttttgtgaaa tttttatgta gaataaacat ttgtatctgt aaaaaaaaaa
     2101 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
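GenBank feature coordinates are 1-based and include both ends, whereas Python strings are 0-based and end-exclusive. The sketch below (helper names are hypothetical) converts the feature ranges listed above into slices and maps CDS positions onto mRNA positions, confirming that the stop codon at CDS positions 865 to 867 sits at mRNA positions 892 to 894. It assumes the 3' UTR begins at 895, the base right after the stop codon.

```python
# Feature coordinates of NM_004603.2 (GenBank style: 1-based, both ends inclusive).
MRNA_LENGTH = 2117
CDS_START = 28
FEATURES = {
    "5' UTR": (1, 27),
    "start codon": (28, 30),
    "CDS": (28, 894),
    "stop codon": (892, 894),
    "3' UTR": (895, MRNA_LENGTH),   # begins right after the last base of the stop codon
}


def to_slice(start: int, end: int) -> slice:
    """Convert a 1-based inclusive range into a 0-based, end-exclusive Python slice."""
    return slice(start - 1, end)


def cds_to_mrna(cds_pos: int) -> int:
    """Map a 1-based position within the CDS to the matching 1-based mRNA position."""
    return CDS_START + cds_pos - 1


for name, (start, end) in FEATURES.items():
    print(f"{name:<12} {start:>5}-{end:<5} {end - start + 1:>5} bp")

# First 30 nt of the mRNA, copied from row 1 (first three blocks) of ORIGIN.
mrna_head = "gccggcgccg" "ctgccactcc" "cgggagcatg"
print(mrna_head[to_slice(*FEATURES["start codon"])])   # atg
print(cds_to_mrna(865), cds_to_mrna(867))              # 892 894
```

The 5' UTR (27 bp), CDS (867 bp) and 3' UTR (1,223 bp) add up to the full 2,117 bp.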

Scroll back up the features section to CDS and find /protein_id=NP_004594.1. Click on the hyperlink NP_004594.1
to go to the protein translation of the human STX1A coding sequence.


Now, back at the Entrez Nucleotide entry for the full-length mRNA, scroll down past the introductory header and
the mRNA features to the end, where the complete nucleotide sequence is provided under ORIGIN. Because
the coding region is at position 28 bp to 894 bp, it is easy to categorize all the major features of the mRNA
sequence (5' UTR, start codon, coding region, stop codon, 3' UTR and poly(A) tail).

C. EXPLORING ENTREZ PROTEIN

[Figure: Entrez Protein entry NP_004594.1; the translated stx1A protein product is 288 amino acids in length and 32 kDa in molecular weight; the amino acid sequence of human stx1A is shown]

Under CDS, click on /db_xref=GeneID:6804.

This opens the Entrez Gene webpage for GeneID: 6804, which we visited previously.


Hyperlinking to NP_004594.1 takes us to the Entrez Protein webpage for the human STX1A translated product. All the
features provided on this webpage relate to the protein sequence. The ORIGIN entry is now an amino acid sequence instead
of a nucleotide sequence.
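The 32 kDa molecular weight shown for NP_004594.1 is consistent with a common rule of thumb of roughly 110 Da per amino acid residue. The sketch below applies that rule; it is only an approximation that ignores the actual amino acid composition, and `estimate_mass_kda` is a hypothetical helper.

```python
# Rule of thumb (an approximation): an average amino acid residue weighs about 110 Da.
AVG_RESIDUE_MASS_DA = 110


def estimate_mass_kda(n_residues: int, avg_residue_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Rough protein mass in kDa from its length alone."""
    return n_residues * avg_residue_da / 1000


mass = estimate_mass_kda(288)    # NP_004594.1 is 288 amino acids long
print(f"~{mass:.1f} kDa")        # ~31.7 kDa, close to the 32 kDa on the Entrez Protein page
```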

D. EXPLORING GENOMIC DNA SEQUENCES IN ENTREZ NUCLEOTIDE

Arrow lengths in the genomic map represent the relative lengths, in bp, of the genes in the genomic sequence. The
arrow directions represent the direction in which each gene is transcribed. mRNA is single-stranded and
is transcribed from either the top (+) or the bottom (-) strand of double-stranded genomic DNA. Which strand a transcript is read from depends on the recognition of DNA binding elements in the genomic
DNA by transcription factors. How strongly a gene is transcribed in a particular cell depends
on the expression of transcription factors that bind to transcriptional elements in the genomic DNA. Expression of transcription factors can vary dramatically between cell types, so specific genes are differentially
expressed in specific cell types (such as neurons, kidney or heart cells).
mRNA is complementary and antiparallel to the genomic DNA strand used as its template. According to the genomic map, STX1A, DNAJC30, ABHD11 and CLDN3 are annotated on the - strand of the double-stranded genomic DNA,
whereas WBSCR22 is annotated on the + strand.
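Because the two DNA strands are complementary and antiparallel, the sequence of a gene annotated on the - strand, such as STX1A, is the reverse complement of the + strand at the same coordinates. A minimal sketch; the 9-bp input is a made-up example, not STX1A sequence.

```python
# Map each base to its Watson-Crick partner (both cases, so GenBank lower case works).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3' like every other sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical + strand stretch (5'->3'); a - strand gene here reads as its reverse complement.
plus_strand = "ctaggccat"
minus_strand = reverse_complement(plus_strand)
print(minus_strand)   # atggcctag: start codon ... stop codon, readable only on the - strand
```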
To get a closer look at the position surrounding STX1A on chromosome 7, click on See STX1A in MapViewer

[Figure: MapViewer controls to switch between chromosomes, change the region displayed by bp, and zoom in or out of the chromosome]


In the Entrez Gene webpage for human stx1A (Gene ID: 6804), scroll down to Genomic context. Illustrated is a genomic
map of a DNA fragment of chromosome 7, location 7q11.23, where syntaxin1A (STX1A) is mapped relative to
its neighboring genes WBSCR22, ABHD11, CLDN3 and DNAJC30. Chromosome 7 is one of the 23 pairs of chromosomes in humans. Chromosome 7 spans more than 159 million base pairs and contains around 1,150
genes. The genomic map illustrated below lies close to the middle of chromosome 7, at location 72,733,184 bp to
72,822,512 bp. The mapped region represents 89,328 bp, which is only 0.056% of the length of
chromosome 7.
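The size of the mapped region and its share of chromosome 7 follow directly from the coordinates quoted above. Because the text gives the chromosome length only as "more than 159 million" bp, using 159,000,000 makes the percentage an upper bound.

```python
# Numbers quoted in the text for the Genomic context map of STX1A.
region_start = 72_733_184        # bp on chromosome 7
region_end = 72_822_512          # bp on chromosome 7
chrom7_length = 159_000_000      # "more than 159 million" bp, so a lower bound

span = region_end - region_start                 # the text's 89,328 bp (an inclusive count adds 1)
percent_of_chrom = 100 * span / chrom7_length    # an upper bound, since the true length is larger

print(f"{span:,} bp = {percent_of_chrom:.3f}% of chromosome 7")   # 89,328 bp = 0.056% ...
```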

In the zoom panel, click zoom out to show 1/100 of the chromosome. Here you can see the region between 71,970,000 bp and 73,560,000 bp and the arrangement of genes upstream and downstream of stx1A on chromosome 7. Scroll to
the right to the last map (map 6), Genes_seq or Genes on Sequence.

Go back to the original close-up display of stx1A in MapViewer. Scroll to the right to the last map (map 6), Genes_seq
or Genes on Sequence, where the STX1A and WBSCR22 genes are illustrated.
stx1A lies between positions 72,771,897 and 72,752,673 and is 20,423 bp in length. The light blue
filled boxes represent the ten coding exons of stx1A. The last, outlined blue box is the non-coding 3' untranslated region of stx1A. The intervening lines are all introns. The mRNA for human stx1A is 2,117 bp, or ~10% of the
20,449 bp genomic region. In other words, ~90% of the stx1A gene consists of introns or untranscribed or untranslated regions, which may carry alternatively spliced exons, promoters and transcriptional binding elements. A roughly
tenfold excess of intron over coding sequence is common for human genes.


MapViewer illustrates a close-up of 72,748,900 bp to 72,774,500 bp on six chromosomal maps, including
mouse (Map 1, left) and the reference human map (Map 6, right). We can change the view in MapViewer by choosing different chromosomes to view, zooming in and out of a chromosome, or entering a custom
region of a chromosome to display.

[Figure: MapViewer close-up of STX1A, labelled: direction of gene; first exon with CDS at 72,772K (filled box); largest intron (thin line, 10,586 bp); last exon with CDS (filled box); Consensus CDS database hyperlink]

Click on STX1A to get back to the GeneID: 6804 webpage. Scroll down to Genomic regions, transcripts, and
products. NC_000007.12 is the accession number for the reference genomic sequence of human stx1A. The red
blocks are exons (coding regions) and are tiny in comparison with the red line (non-coding regions).
The blue block represents the 3' untranslated region.


[Figure: 3' UTR of the gene at 72,751.5K (clear box)]

Click on the title: NC_000007.12 and choose NUCLEOTIDE LINKS: GENBANK (not FASTA or GRAPHICS):

The webpage that opens is the Entrez Nucleotide entry for genomic region Accession: NC_000007, which encompasses 20,449 bp of the genomic sequence for stx1A.


The mRNA of stx1A consists of 10 joined exon fragments, separated by non-coding introns, across the 20,449
bp genomic region:
join(1..57,10537..10614,10971..11070,14435..14509,15236..15309,15403..15511,15831..15904,16677..16814,18819..18929,19175..20449)
The CDS, or coding sequence, also consists of joined fragments of the 10 exons, but it excludes the parts of the first
and last exons that contain the 5' and 3' untranslated regions:
join(28..57,10537..10614,10971..11070,14435..14509,15236..15309,15403..15511,15831..15904,16677..16814,18819..18929,19175..19252)
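The join() strings can be checked with a few lines of Python. This sketch (helper names are illustrative) sums the exon and CDS pieces and derives the introns as the gaps between consecutive exons. The CDS pieces add up to 867 bp as expected, and the derived introns match the intron locations in the exon table. The ten mRNA exons add up to 2,091 bp, 26 bp short of the 2,117 bp RefSeq mRNA, which is consistent with the poly(A) tail: it is added after transcription and is not encoded in the genomic sequence.

```python
from __future__ import annotations

import re

# Location strings from the NC_000007 feature table (1-based, inclusive coordinates).
MRNA_JOIN = ("join(1..57,10537..10614,10971..11070,14435..14509,15236..15309,"
             "15403..15511,15831..15904,16677..16814,18819..18929,19175..20449)")
CDS_JOIN = ("join(28..57,10537..10614,10971..11070,14435..14509,15236..15309,"
            "15403..15511,15831..15904,16677..16814,18819..18929,19175..19252)")
GENOMIC_LENGTH = 20_449


def parse_join(location: str) -> list[tuple[int, int]]:
    """Pull every start..end pair out of a simple join(...) location string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def length(start: int, end: int) -> int:
    return end - start + 1  # both ends are included


def gaps(parts: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Intervals between consecutive parts; for exons these are the introns."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(parts, parts[1:])]


exons = parse_join(MRNA_JOIN)
cds_parts = parse_join(CDS_JOIN)
introns = gaps(exons)

exon_total = sum(length(s, e) for s, e in exons)
cds_total = sum(length(s, e) for s, e in cds_parts)

print(f"{len(exons)} exons, {exon_total} bp; CDS {cds_total} bp = {cds_total // 3} codons")
print(f"exonic share of the genomic record: {exon_total / GENOMIC_LENGTH:.1%}")
for number, (start, end) in enumerate(introns, start=1):
    print(f"intron {number}: {start}..{end} ({length(start, end):,} bp)")
```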

[Figure: NC_000007 feature table, labelled: predicted 20,449 bp genomic sequence (derived from mRNA and computation); 2,117 bp mRNA formed by joining 10 exon sequences from the genomic sequence; 867 bp coding sequence formed by joining 10 exon sequences from the genomic sequence; links to the Entrez Gene webpage, to the Entrez Nucleotide webpage for the mRNA sequence, and to the Entrez Protein webpage for the amino acid sequence]


Under features for Accession: NC_000007, the mRNA and coding regions are described with reference to their
location in the 20,449 bp of the genomic sequence for stx1A. The genomic sequence was found by automated
prediction methods and comparing the sequence of the mRNA with regions in chromosome 7.

Exon | Exon sequence location | Donor splice site location | 5' donor splice site (consensus GTRAGT; R = A or G) | Non-coding intron location | Acceptor splice site location | 3' acceptor splice site (consensus YYNCAG; Y = C or T)
--- | --- | --- | --- | --- | --- | ---
1 | 1..27 (5' UTR); 28..30 (ATG start codon); 28..57 | 58..63 | gtgagt | 58..10536 | 10531..10536 | gcacag
2 | 10537..10614 | 10615..10620 | gtggga | 10615..10970 | 10965..10970 | ccgcag
3 | 10971..11070 | 11071..11076 | gtgagt | 11071..14434 | 14429..14434 | ccccag
4 | 14435..14509 | 14510..14515 | gtgagt | 14510..15235 | 15230..15235 | ccccag
5 | 15236..15309 | 15310..15315 | gtgcgg | 15310..15402 | 15397..15402 | ccccag
6 | 15403..15511 | 15512..15517 | gtgagc | 15512..15830 | 15825..15830 | ttgcag
7 | 15831..15904 | 15905..15910 | gtgagt | 15905..16676 | 16671..16676 | ccccag
8 | 16677..16814 | 16815..16820 | gtgagt | 16815..18818 | 18813..18818 | ccgcag
9 | 18819..18929 | 18930..18935 | gtcagt | 18930..19174 | 19169..19174 | atgcag
10 | 19175..19252; 19250..19252 (TAG stop codon); 19253..20449 (3' UTR) | - | - | - | - | -

Below is the complete 20,449 bp genomic sequence for stx1A, with color highlighting indicating different features, such as exons, splice sites, the 5' UTR, the 3' UTR, the start codon and the stop codon.


Under ORIGIN is the genomic sequence for human stx1A. The exon table collates the locations of the 10 exons,
the 9 introns and the flanking sequences at the intron splice sites. Notice that the first exon contains the
5' UTR (5' untranslated region), followed by the coding sequence starting with the start codon. The tenth exon
includes the end of the coding sequence and the stop codon, followed by the 3' untranslated region. Introns are spliced
out of the transcribed heterogeneous nuclear RNA (hnRNA), usually by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs).
Almost all intron splicing involves a canonical GT-AG splice site consisting of a 5' donor splice
site, a 3' acceptor splice site and a branch site within the intron. The 5' donor and 3' acceptor sites are six bp in
length and flank the two ends of the intron. The 5' donor splice site consensus is AG/GTRAGT (where R = A or G) and
the 3' acceptor splice site consensus is YYNCAG/G (where Y = C or T).
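The consensus sequences use IUPAC codes, which translate naturally into regular expressions. The sketch below (helper names are illustrative) tests the nine donor and acceptor sites from the exon table against GTRAGT and YYNCAG. Five donors and seven acceptors match the full consensus, while all nine introns still begin with GT and end with AG, which shows that a consensus describes the most common bases rather than a strict requirement.

```python
import re

# IUPAC nucleotide codes used in the two consensus sequences:
# R = A or G, Y = C or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def consensus_pattern(consensus: str) -> "re.Pattern[str]":
    """Compile an IUPAC consensus such as GTRAGT into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


DONOR = consensus_pattern("GTRAGT")      # 5' donor site: first 6 bp of the intron
ACCEPTOR = consensus_pattern("YYNCAG")   # 3' acceptor site: last 6 bp of the intron

# Donor and acceptor sequences of introns 1-9 of STX1A, copied from the exon table.
donors = ["gtgagt", "gtggga", "gtgagt", "gtgagt", "gtgcgg",
          "gtgagc", "gtgagt", "gtgagt", "gtcagt"]
acceptors = ["gcacag", "ccgcag", "ccccag", "ccccag", "ccccag",
             "ttgcag", "ccccag", "ccgcag", "atgcag"]

for number, (donor, acceptor) in enumerate(zip(donors, acceptors), start=1):
    d_ok = DONOR.fullmatch(donor.upper()) is not None
    a_ok = ACCEPTOR.fullmatch(acceptor.upper()) is not None
    print(f"intron {number}: {donor} donor={d_ok}  {acceptor} acceptor={a_ok}")

# Every intron still obeys the GT-AG rule even where the full consensus fails.
gt_ag = all(d.upper().startswith("GT") and a.upper().endswith("AG")
            for d, a in zip(donors, acceptors))
print("GT-AG rule holds for all introns:", gt_ag)
```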

ORIGIN
        1 gccggcgccg ctgccactcc cgggagcatg aaggaccgaa cccaggagct ccgcacggtg
       61 agtccggccc cagcgcgcgg ccgcccgccg cccgccgcca gcgccaacgc gccggacact
      121 tcccgcccgg ctcccgcggc caggatggtg gccgctcctg cgcccccagc cagggctgaa
      181 gggagcgggg cgagccaggg acctcgaccc ccctccgggt tccgcccctc ggacccactc
      241 ggggtcaggc tcgcgtgagc tcggccgcgc tgacacgtgg acggtcggct ggagtccggg
      301 gtcctggata aacttttcgg ggccctcttg atttgggggg ttcggagggt atgatggggt
      361 tttggtggcg ggacctcttc cccctcccgc agagcaggcc ctttctctgg ccccctcact
      421 gcccctcatc tctgacctcc cgtgccccct cggtgtagac agacgagggg accgcggtgg
      481 gaacctcctt ggtgtagaca gacgagggga ccgcggtggg ccggtgccgg tcccctgcgc
      541 ggggccccgc tgcggggcgc gctgcggaaa cctgatcgtc gggtggtggc ggcctcctcg
      601 ggccaccctg gtcacctgtg gctcccatca cctgtggctc cccctctgct tggtcgcagt
      661 agggatccac tgcccgcccc aggtatgggg gggggggcgt cacacccgcc cccggtccac
      721 ctatccacct tcgggctttc gggcagcact ggctggcgtt tccagaccct tcagggttcc
      781 ttgcgacttg gatctatttt tctacctgta tctgggaaac ccctggggtg tggctcctgg
      841 ggcggggttt ggggggagta ggaggatgca caaacgttct gagtcagagc ccaccttctc
      901 ccagcccaga gacccctgca gcctgcgtgg gtccccttcc cctggcctct gcccttaggg
      961 atgaggagcc caggaggatg gtgaccggga gtggaggaca gcgacggaaa tgaagcctct
     1021 ctgggtgtgt cggggagggg gggtggatca aggctcaaga ctagattgag tctcatctgg
     1081 tcatctggtc ccctggctgg ccacctggct ggcagccctg cctgctgatc cctgggagca
     1141 ccccccaccc cgctccctgg gctaggaact agaaccctaa ggtcccctcc tatggaaaat
     1201 gtcaagtgtc ctttgggcat ttgaccccac ggcttgtctt ctccacccag aggctccctg
     1261 agttggagcc tgtctgcccc attatgtcat ggttgcctcc tgggcatcct ggagccggga
     1321 cctgggctcc cctgccttca tgtcccccac ttgctctggc accacggcct ccactttgcc
     1381 tccagacctg ttattcctgg ccgagcgccc agcccctggt gtaaacccca tgaggtttca
     1441 aaggaagaag atggtccctc aagtcgcata tgggctctaa acacttcccc cgcggcccct
     1501 gtggttgccc tcaagtttcc tgagctccgt tgagtcagaa tgagtgtttt cctggtgacg
     1561 gatggctgtg tggtgtaatg tgctgatgtt taggcgcctg gtaggttacc tgagggtgtg
     1621 tgtgggaacc aagtggtcag gggaaccatg ggcccctaca taggataata gcctctggca
     1681 tgtcagcctt catcccccta gctgaagggg aaactgaggc acaaaagaaa cacggcttgc
     1741 ctgctagagg gaggggtggt agaaaagggc tttgcctcat ccctgaagat agtcagactc
     1801 aggcctggct gacactgggg gacagccagt ggagctgggg ttgtgaagcg tttgaaccca
     1861 gatctttttg ctctgatgtc tgtctctgtt actacctcct cgccccctga ttagatggtg
     1921 aagccaggct ttgctttgaa tggtctgagg cttccgctcc agacaccccc accctcccca
     1981 cacaaggcat tcagagccca gggaggctgt acccattgct gggtggccct gcccagggag
     2041 ggctttctag cagacaggtg gacggctggc aggcaggagc ccacccgggg tggggtgaga
     2101 cagtcagtct tggtggggcc actgttccca caggctgctt tccctctgga cctagggttc
     2161 ccatctgtga agtggggatg ggaaagccta gaattcagtt tctcgggacg agaagcaggg
     2221 ctacaggtct cctgactcca gctccacatt tgtaggaggc ctgaagcccc aggagtcaga
     2281 attttcagcc cagaatcccc agaaaggaag aggccagcac tggctgagct ggactgggct
     2341 ggactgggct ggactggggc gggctgggct gggctgggct gaacagagat ggtgctgggg
     2401 acagcggctg tgcaccgctc ccagccctcc ctccctccct cttcctcctc ctcctccctc
     2461 ttctttctcc ttctccccca cccaacacca ccacatttgg aaggaggcga agagagctta
     2521 tttataaaag tcacacttgg ggaaggagct ggcgcgtttc ctcactctgc aggcccggcc
     2581 cagccctcca gccttctcca gccttctcca gctcccaact ctccctctgc agatggggag
     2641 tgaagggggt cactggggac tagccagggg gcctcaatgc tgtcgcagtc cctcagagat
     2701 cagaaaggcc acttgcaggt tacaggtaac aaagagagac cgaagacagg gccagggtca
     2761 cacagccagg gaaagacttt taggttagag catccccagg agccccactc aggctgtagc
     2821 tgcctcctcc cctcaactca gagactccct gattttcttc ctggggactt tgcagagctg
     2881 acctttctcc ctgggcccag aacatgtatt tgtctatgtg ccaggtcatg tgggacagac
     2941 acatgggacc tgttgcagga ggctgaagac gtagcaagga gggagagttg gagcagcaca
     3001 atcctggaag gcttcctgga agaagagacc ctgggatagc acaggagcag gagctggatt
     3061 tgaataggag gagagaaggt gatccaaagg aggggacagc ttagtgaagg cagagagatt
     3121 ggaatgctca accaggagag aatagcaggt ggtcctgaca gctggtgtct ggggccccct
     3181 gaagtgcctt gaggtctaga ctagggccca actcactgga gcagcccagg caggcctgga
     3241 tgggaaggcc cagggccctg ctttgcccat cagtttcccc ttctctttcg ttgctgggaa
     3301 gagtctggcg gttgggtggc aacaccgggc aggttcatcc ttaatctccc tctgctcatt

     3361 tcctcatctg cagaatggag ccagtgttac ctactttgca gagtggcagg cacgagagtg
     3421 taagaacagg agccggacag ggctcgcccg atccttcctc ccattacgtt ccagaggttt
     3481 cccagcatga acgatgcatt tgggtacatt ttcatctggg gtgcaggtgg aacggcctcc
     3541 cgggaccatc tagttcaacc tcctatctta cagatcagaa aacaggtcca gaaagggcca
     3601 ggtgtttatc tgaggtcatg cagcctgtgg gggcagagcc tggccccttc cacggcctga
     3661 agctgcctcc ctgagccact ctggggccct gtgaaggggc aggctctgca gggcggggac
     3721 ctggtgtaca cacagctgga ggctggagac agggtgagga ggctcagatt gcacagccgt
     3781 ctttcctgga gacccagctt ggcctctcgc acacccctga ctcagctccg ggcctctctg
     3841 cactagtgat attgactttg attagtcatt taaaagaaga agggaaaaaa atagccctcg
     3901 aggcagacaa cagccactgt gctcagcagt gatgaatcaa ccttgccagc ctggtggctt
     3961 cccctgggac acgcctggag gctgcctggc ctctgcgcct cttggaggag tcagactatt
     4021 acatccagga aggtgtcctg gaatcctgga gtctctgcta ggagagacca ccagggctgc
     4081 cccacctctg ccaggggaag ctcagggcat ccagctcccc gtggttccag cagggtctgt
     4141 tcttcctagc agtctccatg ctagggaggt aaggagcact ccctcccagc cttcctggtt
     4201 ggaggtggtg ttttttgttg gttttttttt tgttttttta gacggagtct cgcactgtct
     4261 ccaggctgga gtgcaatggt gccatctcag ctcactgcag cctccacctc ctgggttcaa
     4321 gtgattctcc ttcctcagcc tcctgagtag ctgagactac aggtgtgcat caccatgccc
     4381 ggctaatttt tgtattttta gtagcaacgg ggtttcacca tgttgggcag gatggcctcg
     4441 atctcttgac ctcgtgatct gcccgcctcg gcctcccaaa gtgctgggat tacaggtgta
     4501 agccaccaca cccagccaat tttttttttt tttttagacg gagtctcatt ctgtcatcca
     4561 ggctggagtg cagtggcata atcttggctc actgcaacct ccacctccca ggttcaagtg
     4621 attctcctgc ctcatcctac cgagtagctg ggattacagg catgcaccat cacacccggc
     4681 taatttttgt atttttagta gagatggggt tttgccatgt tgaccaggct gttcttgaac
     4741 tgctaacctc aggtgatcca cccaccttgg cctcccgaag tgctgggatt aaggcatgag
     4801 ccactgcacc cggctacccc tataaattct gattcagtct ctttggggtg gatccctgga
     4861 agccgttttt gttttttttt tttaaccccc ccaaaggatt ctggtgctcg cccagctttg
     4921 agaactgctg aatgcagaac aaaaaaacac caggatgtta gctgggcttg ggggctcaca
     4981 cctgcagtac caactacttg ggaggctgag gcgggaagac tgcttgagct caggagtttg
     5041 agaccagcct gagcaacata gtgagaccac catctctaca caaatttaaa attagccagg
     5101 tgtggtggca ggcacctgta gtcccagcta ctctggaggc tgaggctgga ggattgcttg
     5161 agcctgggag atcaaggctg cagtgagcta tgatcttgcc actgcactcc agtctggaca
     5221 acaaagcgag attctgtttc aaaaaaaaaa gagctgggcg cggtggctta cgcctgtaat
     5281 cctagcactt tgggagaccg aggcgggtgg atcacctgag gtcaggaatt caagaccagc
     5341 ctggccaaca tggtgaaacc ccgtctctac taaaaataca aaaactagcc gggtgtggtg
     5401 cgggtgcctg taatcccagc tactcgggag gctgaggcag gagaatcgct tgaacctggg
     5461 aggcggaggt tgtggtgagc cgagattgcg ccactgcgct ccagcctggg caacagagcg
     5521 agactctgcc tcaaaaaaaa aaaaaaaagt gccaggacaa attggatgtt ggttggatgg
     5581 gagaagagga ggaagatggc agatgtccta ggtttctggc ttggggttct gcgctaggca
     5641 gttcagagga tggggctctg gcaaaacagg tggagcctgg ttttggccag gaggcaggca
     5701 aggtgcccgt atgatatctg ggcacatccg tccgtaagac acttgggcct ttgagtttgt
     5761 gccccacagt tccagctgcc acctgcacag cgaagggttc caagtcccca actccaatcc
     5821 agccccctct ccagagctgt gaatgccact cagaaagcat tggtagatcc ccaagggagc
     5881 tgggccctaa ttattccaca gaaatctgtg tgtccacttt tgtttttttg agacaggatc
     5941 ttgctctgtc acctaagctg gagtgcaatg gcacgatcat agctcactgc agcctcgacc
     6001 tccagggctc aagcaatcct cccacctcag cctcctgagt agctgcaatt acagatgcat
     6061 ggcaccacgc ccagctaatt tttgtagctt ttgatagaga tggggtcttg ctatgttgcc
     6121 ctggctggtc tcaaactcct ggactcaagt aatcctccca ccttggcctc ccaaagtgct
     6181 gagattaccg gcgtgagcca tcatgcttgg ccactttatg tatacagtgc ccgccctggc
     6241 cagcccaaga caccaggaag acagcatgcc agggtagcca catcacctag ttacattttg
     6301 gtcaagagtg tgtctgggct gggaaacaga ggaggtgagg gtggcagtgg ggctctgcca
     6361 tcccgcgtcc ttatcagttt attccgtgtc caatgagaac tgattcctta tctgagtgaa
     6421 tctggaggac atctctagaa tgctgccacc aggctctagc ttggatgtgg atgagggtgc
     6481 aggggcgaca gagccaggag ccatcagaat cccacaaact gaaataagca tcagaacaaa
     6541 tgaagaaatg aaacaggaag aaggtccttc actcgaaccc caaatccact ccacagacag
     6601 tggttgggaa gggcttgggg cttttcttga caggaaatgc tacatgatca cttatctgat
     6661 tcctggctgt attactagag gtagatcata tagaaagagg gaggggatgg gccagatctt
     6721 tggcattgcc cagaccacag tcagcagcag attcagtttg tgtgctgtat gacaggagac
     6781 agaggcacag tggaacaact cttgggcagg ggctgtaatt tggggtccct gggaaggatc

     6841 tgaaagggaa ggaatactta aaaagaaaaa tctagggagg atgtggatgc cgccttcaaa
     6901 tctatagcag gggtacccca gggagagggc acccacaggt tctttatggc ctggaaccaa
     6961 accacaggga agagaaggac tctaacagtt ggggctgccc gagatgggca gagctgtctg
     7021 gggaagctag tgagctcact gtcacaggag gtttgcaagc tgaggctggc agctgaagcc
     7081 agggggactt ttcaggtccc ttagggcctg agagttttca gccatggctc caataggaat
     7141 gtgaatggcg tgtggaagaa caaggtgcga gatactgatt tgggggggca gggaggcttc
     7201 tggaaggagg tggcatcgag gaagaaggga atgtcacctt catggaatca gcatcccggg
     7261 caaagggatg tgcctggggg aagttggggg cagagggtgg attgtagggc caggaaccag
     7321 ttgtaggtgt gggggagtgg ggtctgatcc caggggaggg cagcaggata aagagcagag
     7381 ctgggggtga aaattcccaa gaacgaagaa gtagaaggac gtggggagga aaggcccact
     7441 tcgcccatcc cctctaggtt ttgcttggag ctgtggctgg tttgaccatg cctcagaggt
     7501 caggccctca gccccccgac tggctttggg gccaatctca atgccacctc cagggtagac
     7561 ttgacctgcg tgcccccatc taccctggcc ctgatctccc tttccccagc cctaccctca
     7621 tagacggagg gtccagagga ggatgcgggc atctctcctc ccagcccttc tgttagagaa
     7681 gcttcctggt ccagccttca gccttctggg aaggtagcag gtcagagctg gccggatgcc
     7741 tggagaagat ctgccctcat tacacagatg ggcaagctga ggtcaagaga ggcaccagac
     7801 aggtgccagg ggtctctgag ggcaggacag gggttcggtg gggctggcag tgacgctcgt
     7861 ctctggccac ggaaaaggag ctggataatt ctccatccac cctttgtcat ctaccctgat
     7921 cctctgagca ccagggcaga gagggcagga cactgaatgg caggagctcc tggggccact
     7981 tcccaacctg gcagcctgcc caggaggagg aactgggctc ctcctggcag tgctctgcct
     8041 cagtttccct agggagggct cctctgactc aactcctagg cctgtcacgg tctaaaagcg
     8101 tgaaaacggc tcctggattt tctgtgtctg tagggagctt ggtgggggag ggggtaccac
     8161 ctggtgacag ccccaggcac tcggtgggga gtgggaacga ggatttattt tagctgctgg
     8221 ggaggtggta cgatatctta aagagaggag gcccctctga gccgcctcta tcaaagctcc
     8281 taactacccc ctctccccat gagtccccac tttgatgggg tgatgtgagc ccctcacaga
     8341 acatgctcag gccgggcatg gtggcggctc acacctatac tcccagcact ttgggaagcc
     8401 aaggtgggaa gatcacttga gcccaagagt ttgagaccag actgggcaac atagtttgac
     8461 cctgtcgcta caaaaaaaaa aaaaaaatca aaaagtagct gggtgtggtg gcatgcatct
     8521 gatgcatctg tagtcccagc tgcttgggag gctgaagtga aaggatggct tgagcccggg
     8581 aggttgaggc tacagggagt tatgattgtg ccactgcact ccagcctggg tgacagagcg
     8641 agaccctgtc tcaatcaatc aatcagtcca tccatcaatc aaacaaacat acctgccctg
     8701 cttccccact ttttctcagc cctccctcca tgcagccagg tgtcactgtc cccatttcac
     8761 agaagagtaa atagagccag aggggttaag gcccagcctg atgagtccca tcagttctct
     8821 cagcccctca acctccacct gcctccgtcc tgacgcgaga gcaccgaggt ggggtcagtg
     8881 ctgagctagt cagtggccac tcccgtgagg cactcagaaa tgaggtggga acaagatcag
     8941 atgtggtggt tcacgcctgt aatcccagta ctttgggagg ccgaggcggg tggatcacct
     9001 gaggtcagga gttcgagacc ggcctggcca acatagtgaa acctgtctct actaaaaata
     9061 caaaaattag ctgggcatgg tggcatgcac ctgtaatccc agctactcgg gaggctgagg
     9121 caggagaatt gcttgaaccc aggaggtgga ggttgcagtg agccaagatc acaccactgc
     9181 actccagcct aggtgacagt gcgagactct ctcaaaaaat aaaaaaataa aaataaaaga
     9241 ggtgggaaca agccctgagc cccaggctga gagacctcag tccctgcccc agggtggcca
     9301 ccccggatat ccttgccttg gctgcagaga gagacctggc cttgccagca gcgtgggctg
     9361 accctggcca actggcggta tgaggcagac agtaataatt gtgcgcagca tgctgggaag
     9421 tcagaagtat gattacccct gccccacgat gggggtggct tttgtacagc tggtcctggg
     9481 acagtggagt ggggcttgta actgtggctg aggcctaagg caggacccaa gccaagtggc
     9541 aacatgactg tccccttgtg cctgctgggc ccaggtctgg ggactggaag gaggcagacc
     9601 gaggtccaat ggcaagaccg acagggaggc caagtgcagt ggctcacacc tacaatccca
     9661 gtgctttgga aggctgaggc aggaggatcg cttgaggcca ggagttcaag accagcctgg
     9721 gtaacacagt gagacccccg tctctacaaa aataaaaatt aaatttaaaa actgttttaa
     9781 aaaaagactc acaggagatt cctaagagat ctgatccgtg ccacctcggg actggggttt
     9841 agtggtgccc agggctggga cacagtaggc aggtctcccc ttatagggcc cactctggaa
     9901 agggagcttg ggacctctgg gctccacctg aattaaaacc actggagcca ggcatggttg
     9961 tggaaaccta taattccagc tactcgggat gctgagttgg gaggatcgct tgagcccagg
    10021 tgtttaaggc tgcagtgagc tgtgattgca cccctgcact ccagcctggg tgacacagca
    10081 agacctcaac tctacagatg tgtgtatgtg tttgtgtgtg tgtgtgtaaa ccacagggcc
    10141 aggagagtag agaatgcact ttccaggcag ctgtcctagg ggcctagggg acccagaggg
    10201 cccagacaga gcggccggcc ccaggctcgc ttgcctcccc agccacaggg agggggcagg
    10261 cagataggtg agggaggaag tccctggccc agaggctgag acttggcctt gacaccagtg

tatcaggcaa
gctgggagcg
ggaggggcag
acatacgtcc
gatgtcgctg
gccagcacac
agccaggttc
atgaaccgtt
ctcctcccca
gcccgtcctc
aggcttctcc
ttcgaggctt
ccatcctggc
ggccccacaa
ttcatggctg
agccttggga
ggccggccag
ccttcccagg
tggagtcttg
tctgccttcc
gcacgcgcca
ttggccaaga
gctgggatta
ttcttttttc
cagtggtgcc
gtcagcctcc
tatttttagt
caggtgatcc
cccagcccca
ctggctgcag
tgtgatcaat
cagatggcag
ttccggaagg
cagggctcca
ctctgtcccc
ctacactgag
tgagccccag
cccggggtca
ccaagcccac
atgaacccag
ggtcaggtgg
gggctcacca
gtctgtgcgc
gggtcaggag
gtctctcact
tgggctcctg
ggaccggttc
tctcgctaag
ggagaagtct
gcctcctctg
gacgtcagag
cgcgcctcgg
ccatccgggg
accccagccc
cggcttctct
ccccaaggga
caagctgggc
ggccacaaaa

ggttcctctc
gaacagccag
ggccctggcg
acttatgtgt
tcaccgtgga
cccacccttc
agtgctcttt
ttgagggtga
tctgggtctg
cccctgagcc
cctgatttcc
cattgacaag
atcccccaac
gcctcgggaa
gtgggcttag
aatgcagctg
accaggtctc
tgaactggcc
ctctgtcgcc
gggttcaagc
ccatgcccgg
cagtcttgat
caggcatgag
tttttttttt
atctcggctc
caagtagctg
aaagatgggg
gcccgcctcg
agaaggtatt
ccctgggctt
aggaaggcct
ctctaagtcg
agctggcagc
ggaagtcacc
agacaggtcc
agatgccagg
ggatgcagcc
gattttcttc
tggcatcttc
gtcagtgcca
gagcagggtg
cgctgagctc
ccagaggtat
gtggctgatt
gctggcaggt
caggccacct
tgagtcacca
aggagggggc
caaggaaaga
ggggctgcca
cagctggtaa
aaccccccct
ccttcctgat
caggggcaag
cggggcagag
ggcagctgtc
acccccatgg
cactatgtca

tctgtgccgg
gcctgttctt
ctgtaccctg
gcacaacccc
ccgagaccgc
ccgcacactg
cacacctgca
atagatggaa
gaaggatgca
ctgaccgcag
ttctcctgac
atcgcagaga
cccgatgaga
ggggctgctg
gagtgtctgg
tccccggctt
catgaaagac
ttttttttct
aggctggagt
gattcccctg
ctaatttttt
ctcctgacct
ccaccgcgcc
tttcaagacg
actgcaacct
ggattacagg
tttctccata
gcctcccaaa
ttttttcatt
ctgggagcga
ctcggtccag
tcagcgctgt
cctggcacct
cctctccctc
ccctgacccg
ctgatggcat
atcaggtgtt
tcacctccag
aagaacccag
caccaactaa
gactgcctca
cttgcctgag
ctgggtcact
gtggggtggg
ggggcgggca
cctgcaccgt
cccgtgagct
tcatgttggg
tggcaatgag
tccggcctgg
ccgacttttc
tccccgctcc
agaccactct
cagaggggca
tcctgggggt
ttgtcccact
cctctgccag
ccaccctcgt

19

gcccccagtg
gtccctgccc
ggcacctgcg
gcacaggcca
ttcatggatg
gcagaacttg
cctggctcag
gctcagagcg
cccccactgc
ccctactctg
cacctggcct
acgtggagga
gtgagtgtgt
ccagggaggc
gcgggccagg
ggaactgggg
ttgggtgatg
ttttttcttt
gcagtggcac
cctcagcctc
gtattttagt
cgtgatccac
cagccgaact
gagtcttgct
ccgcctccca
catgcgccac
ttggtcaggc
gtgctgggat
tcgaactggc
gggaaggaag
ccccgcccct
gggagggcaa
ggccatgtgg
cacactggga
gtcctcatgt
aggtgccctt
tgagggaggc
cacggcctcc
agtaaatctt
gccagtgccc
ggaagggcag
agatttgttg
gtgtgtccct
gacggagggc
tggggcagca
ctcctgctgc
gagcagttga
ggttctaggg
tggggggctc
gctattttgg
ctttgttcct
acctcactgc
caacacccct
ggtgggaagg
gtcctgagac
tttccatcca
ctttagggag
ccaatttcac

gtcctccaga
agaggccccc
tgtgtacagg
aggacagcga
agttctttga
gggggccact
cagctccggg
gagctaaatc
accccagcac
ggccatctct
tggtccgcag
ggtgaagcgg
ggggcggggg
ctcacgggac
gtcgggaggg
ggtcagtacc
ggagaggcga
tttttttttt
aatctcagct
ccaagtagct
agagacgggg
ctgcttcagc
ggcttttttc
ctgtcaccca
ggttcaagca
cacgcccggc
tggtctcgaa
tacaggcatg
ttttttctaa
gagggaagga
cccctcgcct
gctggccaag
caccccgggg
gcctgtgtgt
tcctgttgag
ccctggaggg
aagcgctggc
tggtctatga
ttcacagccc
cgccctcaga
gagggagaga
gatggcagca
ctgtgggctg
tctaaagagc
gagctgtctg
ggtgtcatgc
gacctggcgg
cacaggcggc
tcccgggcaa
gtctgagcaa
acggatgagg
attccacttg
ccctgtcccc
tgatagccag
tggggtgttg
gcttctctca
atggttctag
cagggcccca

tcttactgtg
agggagaggg
tgggtcctgt
tgatgatgat
gcaggtggga
ctcgcctctg
gtgccaacga
tgggatcaca
ccggagcctg
gggccccaga
gtggaggaga
aagcacagtg
acgggtttca
ctctgcagcc
tttcacgaga
cagcatccca
tcggccccgc
ttttttgaga
cactgccacc
gggattacag
tttcaccatg
ctcccaaagt
cttttttctt
ggctgcagtg
attctcctac
taatttttgg
ctcccgacct
agcccccacg
ggcaactggg
ggaagctgtc
cccaggggag
caggcagagc
cgggcattct
caggctcgca
atgttgttta
gggctcacca
cacccaggtg
cccctcaccc
caaagtgtgc
aggcaaaatg
gggacgccca
tgagacggat
ttcctgctca
aggggagtgg
ctccacacct
acagtcatag
agtcccaggc
acagagttaa
ggcaggctct
aagaggctga
cgccctcctc
ggaggtgacc
tctaggacaa
agctcagtcc
ccaggagcac
ggcagggggg
ttggaacaag
gctacctgcc

1- GENE TUTORIAL

10321
10381
10441
10501
10561
10621
10681
10741
10801
10861
10921
10981
11041
11101
11161
11221
11281
11341
11401
11461
11521
11581
11641
11701
11761
11821
11881
11941
12001
12061
12121
12181
12241
12301
12361
12421
12481
12541
12601
12661
12721
12781
12841
12901
12961
13021
13081
13141
13201
13261
13321
13381
13441
13501
13561
13621
13681
13741

acggccaggg
aggccatcca
ctttgtggga
gtggggtggt
caccaggtga
ttcccggcgc
acccatctgt
aagacagcac
ggggggcggg
gcctcccact
aggaggccta
atgtccgaca
ttgtctgggg
ttgctcgccc
cccattcact
gggacacacc
tcacctctgc
cggtattctg
ccctgcagga
ttctgagcag
cacacgcaga
agctggaagg
acaggccaca
aggggacggg
gagcagtcca
aagacacagg
cccgcagggc
aagtttgtgg
aaaggccgca
cctggagcct
cttagttagg
cgggggcctg
gagctcgggg
ctgaactccc
cacgaccagt
tggggtgagt
atgaccccta
tagggaagga
ggcaaaggtg
ggcggccagc
ccatttctgt
tgttcgaggc
atgtggggga
tcggctctgc
caggtggaaa
ggtaacagct
gaagcctcag
ctctgccctg
tcatggactc
tcatcaagct
tcgtggagag
ccagggctgc
agttctcagg
ggtccctgtc
caccctctat
gggccacgca
gcttcctgga
ccagtggggc

ccaactgtgc
ccccttgccc
acagcaaggt
cccctgtcaa
gcccacaccg
caactccagc
tatctccagg
aagagggggt
caggcagagc
gggccccctg
cccccaccct
taaagaagac
gtgggtgggg
tgagggggtg
cagcagctac
cacctaaggc
taaagagggt
ggtgaacaga
ggggccacct
aaaaggagtg
gcttggagtg
gcttctgtag
cacataccct
gttgaagggg
tcgagcaaga
tgcggccacg
ctgacccctg
aggtcatgtc
tccagaggca
ggaggggcgg
ggcccgaggc
aggcggctgc
ctggggggag
cgaagagggc
gaggagctgg
gtaggcccca
gtgctgggtc
agagtcccag
gactcaagac
ggctcctcct
gttcacgtga
aggcttgggg
agggacggga
aaaatgctgg
gtgtcactag
agcatgtggc
gcccccttcc
cccctcagca
cagcatctcg
ggagaacagc
ccaggtgagt
actaacctcg
taaccaatca
ctctgcatgg
accccgccgt
tttccccaga
gtgaatggct
tcaaacttga

ccacctcttt
tcttccgccc
ctgccccacc
ccccctgcct
cccccccacc
atcgccagct
gggtgggggc
gaagctgcag
cttagccagg
ggctggaggc
ttctacctcc
agcaaacaaa
ggcattctag
cgccggatct
ttggcaaggc
tttggcgagg
tgaagaagtg
cacgtgcacc
ggctctcccc
gtacccaggc
ggtgtgcatt
gcccctcctc
aggggatgtg
cctggttatg
ggaaggcctg
gagtctagga
cacctccatc
ggagtacaac
gctggagatc
ggcccgagtc
gggtggcccg
tgtgttaagg
cccgggcctg
tgggctcagt
aggacatgct
tccctcctgc
tggacagtat
agaggggaaa
ccctgtctcc
cttatccgtg
agccgggcag
ctgtgagttc
gaaaggcagt
tgataaggac
ccccacgtta
agagccaggt
ggagcagctc
agcctctggg
aagcaggctc
atccgtgagc
gccgggcccc
ccctttctgc
cccgctgttg
ctctgcttga
ctctcattcc
gactgaaggc
gttgggcaaa
atgagtcacc

20

ggacctgagg
gccaggtact
cagagtgttt
ggctgcgtac
ccttcaactc
gctccacaat
gccagcaaag
ctggtgcagc
ggaacagggg
tgagcctgca
ccagagacga
gttcgttcca
gcctgcttcc
tgctttcctt
ccgttggtgc
gagggctcca
gccagtactt
ccatgagtgc
gctgtgggca
atagggtggc
tacacacatg
catcggtgcc
ggaaggtgga
acttagcagc
aaccgctcct
tgggcgggca
tcgggccccc
gccacgcagt
agtgagctgg
tgctgctgag
aggcagctgc
gccggggggg
ggacggaggg
gtgaccccac
ggagagtggg
catgaccccc
ttggacacag
acaccaggct
tgatgcctcc
ggcccttttc
gggtgctccc
caggagagag
tccaggcttg
tgtcactgag
caggtaggag
agcaaggcgg
tgcccacccc
tcccagcccc
tgagcgagat
tacacgacat
cctgccctcc
ccgacacgct
aaactgccgc
gcttcaagga
cctactactg
atagcagtga
agagttgcct
tggaggactt

gcccacatca
tagcacccag
gcgcatatga
cctcctgagc
cagcaagcat
tacttctgcc
gaggaaggga
cctggcccag
agaacaggga
catcagtccc
aggaggagct
agttaaagag
tctgcggagc
tctgagggtg
cagactctgg
ctctggatgg
cctggtctct
caaggcctcg
tggtccctcg
cggagctacc
cacacaccac
tggcactgat
tagatgctgg
cacctgtgcc
ccgctgacct
cctgagtctc
agcactccac
ccgactaccg
gggtggggcc
tgcgtggctc
tgagtttggg
cggggcctga
ggcggggccc
tcgcttgcag
aaccccgcca
tcaccatgtt
accccttgta
gaggtcacat
cagttggact
ctctcctgga
tgaaggatgt
gggaggagca
ggtccacggg
cactcgctct
attggcttgc
ggaaggctgc
aaacgccctc
ctgctcccac
tgagacgcgg
gttcatggac
actccagccc
gccctgccgg
cagggccctg
ccccaggccc
tttcctcatg
ccaggacagc
aagaacaaat
gtaagtttcc

gcgcccaccc
ggaggcgtag
ctcagggagt
caccaggtaa
caagtcactg
ttaaatttag
ggctcctccg
aattggggga
tcttcccaga
cgcggagcct
ggaagaactc
tgagtcaggc
ccatggggtt
gcgtcatgtg
gctgggccca
gcactgcccc
tgcagaagcc
gggccacatg
tacacagcct
agacctcaca
acagcgtcac
cccaccagcc
gtgtgcgggt
cccaggcatc
gaggatccgg
tgggtcggtc
gctgtccaga
cgagcgctgc
tggggcgggg
aaggctgctg
gctggagggg
ggctgccgct
aatgctgctg
ccggcaggac
tctttgcctc
ccctacgtcc
aggcaggttt
aaatgtcagc
cactttcgcc
ggacatttgg
ggcacacagg
gtcacattca
gcagtcaact
gtggccagca
agagaaatta
ggcttgggag
cccctcggcc
ccccagatca
cacagtgaga
atggccatgc
acaccagcac
ggagggacca
ccgacccgaa
tgcatggccc
gggccactgt
tcagggacct
cgaaggttct
aggtccccgt

1- GENE TUTORIAL

13801
13861
13921
13981
14041
14101
14161
14221
14281
14341
14401
14461
14521
14581
14641
14701
14761
14821
14881
14941
15001
15061
15121
15181
15241
15301
15361
15421
15481
15541
15601
15661
15721
15781
15841
15901
15961
16021
16081
16141
16201
16261
16321
16381
16441
16501
16561
16621
16681
16741
16801
16861
16921
16981
17041
17101
17161
17221

cccagagact
tggtgccacc
cggctgcctt
gatgccaaaa
tggccccagc
aaaaggaggt
cgaggtggac
gagggggttt
ggctggtctc
ggatcataga
tgtggccttg
agaaacagac
acacgcccta
gggaccagat
ttttcttttt
aggatttcag
tcctgagtag
gtagagatgg
aagactgaag
ggtgtgggca
cagcataccc
ttctctgcat
gctgctctgc
acccagggta
tctctagcct
gggggccctc
agtacaatgt
ccgtcaagta
cccagcaccc
ttctctcagg
cccccccgac
tgtctgaggc
tgtgatcctg
caaactgcca
ctgggctgcc
ctcccttctc
gtgtgcgtct
acagcgagga
tcagtaactc
tgacctgccc
ctggccacat
tctgagtctc
ctcccctgcc
ctccggcccc
gctggctggt
cagtccctca
gaagtcagat
gtgccacgtg
cctcctcctc
tctcccttgt
gtccgtgtgc
tgttgccttc
gctgcccatt

ggcagctgag
accactatgg
tgttggagtg
aagggggaaa
aacactaagg
ccttgctgct
ggggccttgg
gggttgtttg
ggaactcctg
tgtgagccac
gttaggaagt
catggtgacg
ggagtcttgt
cacagggctc
ctttttttga
ttcactgcaa
ctgagactac
ggtttcacca
accccagaca
ggctcacctg
actcctgctc
ggtccccaac
ggggaaatgt
cctggcccag
cagggtgctt
tggagctcgg
ggaacacgcg
ccagagcaag
attagggacc
agctccccat
tgtccagtcc
ctctcctgcc
ggcatcgtca
ctccactcca
ctcccaaccc
tgccatgggc
gtacgggaag
gcagacccag
ggtgggccca
tgtcctctcc
ggtgctgctt
agtcgctgat
taggggcaaa
aggagcaacc
gccctatttc
agccagcgtt
gtcatttcag
ggtgtcacgt
ctcaccacct
ggacaggcag
cccaccctcc
ttgaacagcg
ttgtgaaatt

gtgaaggtcc
ttccaaagtt
agtggtcaag
aggagccagc
tcccaagccc
ggggcacaag
caggcagtgt
ttttgtttta
ggctcaagtg
agcacccagc
acggaggccg
tggacgccag
ggggcagtgt
aaaagggctg
gacggagtct
cctctgcctc
aggtgctcgc
tgttggccag
gcagacagag
tggccacctc
ccaccttcct
cctcagctga
gggagctgcc
gtgagaagct
tcctgaagtc
gtgcccccgg
gtagactatg
gcgcgccggg
cccaaccatc
gacagccccc
tcagcctttg
cgcttcccat
tcgcctccac
ggtgggccac
ccgcctctgg
cctccgtccc
aggcagaggg
gcagggccgc
ggttctgctc
agctgtcccc
ttcaggttag
cactgccagg
gtccatcggg
ccttgggcta
cagccacccc
gcatgtttgg
gcctgcagtc
gtcccagatg
tggggcttct
ggagatgcat
ctcggcttta
attcccccca
tttatgtaga

//

21

aggaaggtgc
acacagagta
gaaggcctct
tgtgaagact
agcaagaccc
gagtggtcgg
gctggggaat
gcaaaagatg
atcctctcac
caggttggaa
tcaggaggct
tgaaggcaac
catggtcaga
tgggctgggc
cactctgtca
ccaggttcaa
caccacgccc
gatggtctcg
ctggatggag
aggagctatg
ctgcctggca
gactgcatgg
ttggggagcc
gtaccctaac
ctgccccgag
ccccgcaggg
tggagagggc
tcagtagccc
ccacatccct
tcccaggccg
ccatagtccc
gcagaagaaa
tgttgggggc
tccaaggagg
ctcagagcac
cgccccgtgt
aggcagccag
cagggtgaca
ttccctgggg
acaagcagag
gggagaggtg
gaggctcagg
tcctgggcct
ggtctgaccc
agcagctagg
gatggtggct
tcatcctgcc
cagtattcgg
catgggaaat
gcgagtgcat
ctcctgccca
accccttcac
ataaacattt

ctctctgaca
acaccgtgac
caggagggga
tcgcagcgga
agacttattt
agctgagatg
ttgcggtctc
ggatcttgct
ctcagcctcc
tgtgttaagt
gccgctgccg
cgatggactt
ttggcctctg
agggggttct
tcaggctgga
ccagttctcc
agctaatttt
aacctgactt
ctgtcatgcc
gccagggcag
ctgccacatc
ccagctccac
tgcctggccc
ctcctcacgc
ccccaaccca
agagatgatt
cgtgtctgac
agccctgccc
ccccatccca
tcaccttccc
accccacctg
atcatgatca
atcttcgcct
ccctggctgc
cctccctccc
cgtgtgcatg
cggggcgtga
caggccaccc
accctaacct
ccctgagggg
gccctgaggg
ctgccatggc
cagcttccct
caggtgtccc
gaggcaaagc
cctgttgtct
cttgccatcc
cagccagccg
gtgcccccgc
gcagcagggg
gtgactgtga
caaaggtctt
gtatctgta

agctcaaagg
cacgatgggg
catttgagct
gatgaagagc
caggaacaga
aagctggaga
ttctagattg
gtgttgccca
caaagtgctg
atgggagtga
ccatctggtg
ggagagggag
ggggcagcat
attttttttt
gtgcagtggc
tgcctcagcc
tgtattttta
caagttctta
tgaggatggg
gaccagactc
tctgacagtg
ccccaggatt
tctacacccc
cccctttttg
accctgagga
gacaggatcg
accaagaagg
aaggccatgc
cctctctccc
ctgcttggat
ccgccaacac
tcatctgctg
agaagccacc
tgccacctgg
ggcccccatg
atctctgtga
tgcagtgtgc
ttccttgcct
cgcctccagc
tggggaccag
acagcccagc
tccaggctcc
tcccacattc
tctggaaggg
aggctgcagt
tgcgctctgg
tcccatcgat
gggagggcta
cccaggaccc
atggggccgt
ccactgtccg
ggtacaacca

1- GENE TUTORIAL

17281
17341
17401
17461
17521
17581
17641
17701
17761
17821
17881
17941
18001
18061
18121
18181
18241
18301
18361
18421
18481
18541
18601
18661
18721
18781
18841
18901
18961
19021
19081
19141
19201
19261
19321
19381
19441
19501
19561
19621
19681
19741
19801
19861
19921
19981
20041
20101
20161
20221
20281
20341
20401
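A listing in this layout can be turned back into one continuous sequence. Each row's number is the 1-based position of its first base, and full rows hold 60 bases, so consecutive row numbers differ by 60 (6841, 6901, 6961, ...). The Python sketch below assumes rows in that GenBank ORIGIN layout. `parse_origin_rows` is a hypothetical helper, not part of any NCBI tool, and the toy rows are made up, not STX1A data. It joins the bases and checks that every row number equals the starting position plus the bases read so far, which catches missing or out-of-order rows.

```python
def parse_origin_rows(rows):
    """Join GenBank ORIGIN-style rows into a single sequence.

    Each row is assumed to look like '      61 acgtacgtac gtacgtacgt ...':
    a 1-based start position followed by blocks of bases. Returns
    (first_position, sequence). Raises ValueError when a row number does
    not equal the start position plus the number of bases read so far,
    which signals a missing or out-of-order row.
    """
    first = None
    parts = []
    length = 0
    for row in rows:
        fields = row.split()
        # Skip blank lines, the ORIGIN header and the // record terminator.
        if not fields or fields[0] in ("ORIGIN", "//"):
            continue
        pos = int(fields[0])
        if first is None:
            first = pos
        elif pos != first + length:
            raise ValueError(f"row numbered {pos}, expected {first + length}")
        chunk = "".join(fields[1:]).lower()
        if not chunk.isalpha():
            raise ValueError(f"row {pos} has no bases or non-letter characters")
        parts.append(chunk)
        length += len(chunk)
    if first is None:
        raise ValueError("no sequence rows found")
    return first, "".join(parts)


# Hypothetical toy rows (not STX1A data); real rows carry 60 bases each.
toy = [
    "ORIGIN",
    "        1 acgtacgtac gtacgtacgt",
    "       21 ttttt",
    "//",
]
start, seq = parse_origin_rows(toy)
print(start, len(seq))  # prints: 1 25
```

The contiguity check is the useful design choice here: a copied listing whose rows were dropped or shuffled fails loudly instead of producing a silently wrong sequence.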

The CCDS database entry CCDS34655.1 indicates that the consensus CDS (CCDS) protein set for STX1A was built from two independent annotation sources: WTSI-EBI (a joint project of EMBL-EBI and the Wellcome Trust Sanger Institute in Europe) and NCBI (the National Center for Biotechnology Information, USA).

[Figure callouts: sequences used in the STX1A genomic builds; exon locations on the minus (-) strand of chromosome 7.]
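The STX1A exons lie on the minus strand of chromosome 7. Chromosome reference sequences such as NC_000007.12 are written as the plus strand, so a stretch of the gene read 5′ to 3′ in the direction of transcription is the reverse complement of the corresponding plus-strand coordinates. The Python sketch below assumes 1-based, inclusive coordinates on a plus-strand sequence string. The function names and toy sequences are illustrative only and are not taken from any NCBI tool.

```python
# Complement table for DNA; 'n' (unknown base) pairs with itself.
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, i.e. the other
    strand read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def minus_strand_segment(plus_seq: str, start: int, end: int) -> str:
    """Take 1-based, inclusive coordinates [start, end] on the plus-strand
    sequence and return that stretch as read on the minus strand (5'->3')."""
    if not 1 <= start <= end <= len(plus_seq):
        raise ValueError("coordinates out of range")
    return reverse_complement(plus_seq[start - 1:end])


print(reverse_complement("ACGGCC"))             # prints GGCCGT
print(minus_strand_segment("aaacgtttt", 3, 5))  # prints cgt
```

Note that for a minus-strand gene the exon with the lowest chromosome coordinates is the last exon of the transcript, which is why coordinates and exon numbers run in opposite directions.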


Protein-coding sequences that are annotated consistently on the reference genome are collected in the Consensus CDS (CCDS) protein set. You can reach the CCDS entry from the Entrez Gene page for STX1A (Gene ID: 6804): in the Genomic regions, transcripts, and products panel, click CCDS34655.1 (on the right-hand side of the panel), or click CCDS in the Links panel at the far right of the Entrez Gene page.

Example 1 illustrates a splice junction between the 3′ end of the first exon (ACG) and the 5′ end of the second exon (GCC), where reading continues straight through. This is a phase 0 splice junction: no codon spans the junction, so ACG and GCC code for threonine (T) and alanine (A), respectively. In the second example, the amino acid shown in red, serine (S), is encoded by a codon split across the junction: A from exon 6 and CC from exon 7. Because the intron interrupts this codon after its first nucleotide, the junction is phase 1.
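The phase of each junction follows directly from the exon lengths: it is the length of the coding sequence upstream of the junction, modulo 3. The Python sketch below assumes the coding portion of each exon is available as a string, in transcript order, starting at the start codon. `junction_report` and the toy exons are hypothetical illustrations, not STX1A sequence, and the codon table is the standard genetic code.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in T, C, A, G order at each of the three positions.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def junction_report(exons):
    """Describe each exon-exon junction of a coding sequence.

    `exons` holds the coding part of each exon, in transcript order,
    starting at the first base of the start codon. For every junction,
    returns (junction_number, phase, upstream_codon, upstream_aa,
    downstream_codon, downstream_aa), where phase is the number of bases
    of an interrupted codon that lie in the upstream exon (0 = the
    junction falls between codons). For phase 1 or 2 the upstream and
    downstream codons are the same split codon.
    """
    cds = "".join(exons).upper()
    report = []
    offset = 0
    for number, exon in enumerate(exons[:-1], start=1):
        offset += len(exon)                          # CDS index of first downstream base
        phase = offset % 3
        up_start = (offset - 1) - (offset - 1) % 3   # codon holding last upstream base
        down_start = offset - phase                  # codon holding first downstream base
        up = cds[up_start:up_start + 3]
        down = cds[down_start:down_start + 3]
        report.append((number, phase,
                       up, CODON_TABLE.get(up, "X"),
                       down, CODON_TABLE.get(down, "X")))
    return report


# Hypothetical toy exons, not STX1A sequence.
print(junction_report(["ATGACG", "GCCTAA"]))  # phase 0: ACG (T) then GCC (A)
print(junction_report(["ATGG", "ATTAA"]))     # phase 1: split codon GAT (D)
```

For a phase 0 junction like Example 1, the two codons on either side differ and neither is split. For phase 1 or 2, one codon straddles the junction, which is the case the CCDS view highlights in red.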

In the CCDS Sequence Data section, you can view the nucleotide sequence and the corresponding amino acid sequence around each splice junction. Mouse over the nucleotide or protein sequence and click a highlighted codon or residue to select the codon together with its amino acid. Alternating black and blue stretches represent successive exons, so each boundary between a black and a blue stretch is a splice junction. Amino acids encoded by codons that span a splice junction are shown in red.

- END OF TUTORIAL 1 -
