
2017

Bioinformatics
Homework - 1
SEQUENCE COMPARISON WITH
DOTLET APPLICATION
Gebze Technical University

Ebru AKHARMAN
142204026
20.11.2017

SEQUENCE COMPARISON WITH DOTLET APPLICATION
HW1
Ebru AKHARMAN - 142204026, Gebze Technical University, Turkey
AIM:
A biological sequence is a string of characters that represents a DNA, RNA, or protein molecule. The two most
commonly used types of biological sequence are nucleotide sequences and protein sequences. A DNA nucleotide
sequence is built from four nucleotides: adenine (A), guanine (G), cytosine (C), and thymine (T). A protein
sequence is built from the 20 commonly occurring amino acids. Nucleotides are read in triplets (a triplet, or
codon, is a group of three nucleotides), and each triplet codes for one amino acid. These sequences are indexed
in existing databases, from which they can be retrieved. The purpose of this homework is to obtain the
nucleotide and protein sequences of the given species, to find the lengths of these sequences, and to compare
the sequences. We also compared the gene for human coagulation factor XII with its protein sequence.
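The triplet code described above can be illustrated with a minimal Python sketch. The `translate` helper and the compact table layout are illustrative choices and not part of the homework itself; the table is the standard genetic code, and the example input is the first ten codons of the coding region of the human trypsin-1 mRNA (NM_002769) listed later in this report.

```python
# Standard genetic code, written compactly: codons are enumerated with the
# bases in the order T, C, A, G (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        # "X" marks a triplet that is not in the table (for example one with N).
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First ten codons after the start position of the human trypsin-1 mRNA.
print(translate("atgaatccactcctgatccttacctttgtg"))  # MNPLLILTFV
```

The printed peptide agrees with the first ten residues of the Homo sapiens protein sequence given below, which is a quick consistency check between the nucleotide and protein records.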
INTRODUCTION:
In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a
wide range of analytical methods to understand its features, function, structure, or evolution. Methodologies
used include sequence alignment, searches against biological databases, and others. Since the development of
methods of high-throughput production of gene and protein sequences, the rate of addition of new sequences to
the databases increased exponentially. Such a collection of sequences does not, by itself, increase the scientist's
understanding of the biology of organisms. However, comparing these new sequences to those with known
functions is a key way of understanding the biology of an organism from which the new sequence comes. Thus,
sequence analysis can be used to assign function to genes and proteins by the study of the similarities between
the compared sequences. Nowadays, many tools and techniques are available that perform sequence
comparison (sequence alignment) and analyze the resulting alignment to understand its biology. Sequence
analysis in molecular biology covers a very wide range of topics:

• Identification of intrinsic features of the sequence such as active sites, post-translational
modification sites, gene structures, reading frames, the distribution of introns and exons, and
regulatory elements.
• Comparison of sequences in order to find similarity, often to infer whether they are related
(homologous).
• Identification of sequence differences and variations such as point mutations and single
nucleotide polymorphisms (SNPs) in order to obtain genetic markers.
• Revealing the evolution and genetic diversity of sequences and organisms.
• Identification of molecular structure from sequence alone.

Analyzing the results of a database search is always a matter of finding the best compromise between sensitivity
and specificity. A given database search will likely yield a handful of very similar sequences that are homologous
and a larger handful of vaguely similar sequences some of which may be homologous. Homology (one of the
most commonly misused words in biology) means sharing a common ancestor. Two sequences are homologous
if there was a single sequence that gave rise to them both through duplication and divergence. If there is enough
similarity between two sequences to be detected through pairwise comparison, it is usually safe to infer that
they are homologous. Frustratingly, there are many sequences with little or no detectable similarity that really
are homologous. A good search method will find as many of the true homologs as possible in a database while
cleanly separating them from the non-homologs. It will additionally be coupled to a good statistical scoring
method that accurately reports how likely it is that any given match arose purely by chance and not because the
sequences are really related.
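The trade-off between sensitivity and specificity can be made concrete with a small sketch. The function names and all counts below are hypothetical and chosen only for illustration; they do not come from any search performed in this homework.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of the true homologs that the search actually reported."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of the non-homologs that the search correctly left out."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical search: the database holds 40 true homologs, 30 of which are
# reported; of 960 unrelated sequences, 48 are reported by mistake.
print(sensitivity(true_positives=30, false_negatives=10))   # 0.75
print(specificity(true_negatives=912, false_positives=48))  # 0.95
```

Loosening the score cutoff of a search typically raises the first number and lowers the second, which is the compromise the paragraph above refers to.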


Homo sapiens:

For Protein Sequence of Homo sapiens:

Primary (citable) Accession Number: NP_002760

Length of the Protein Sequence: 247


Protein Sequence:

MNPLLILTFVAAALAAPFDDDDKIVGGYNCEENSVPYQVSLNSGYHFCGGSLINEQWVVSAGHCYKSRIQ
VRLGEHNIEVLEGNEQFINAAKIIRHPQYDRKTLNNDIMLIKLSSRAVINARVSTISLPTAPPATGTKCL
ISGWGNTASSGADYPDELQCLDAPVLSQAKCEASYPGKITSNMFCVGFLEGGKDSCQGDSGGPVVCNGQL
QGVVSWGDGCAQKNKPGVYTKVYNYVKWIKNTIAANS

Date: 13-NOV-2017
Submitters And Dates:

AUTHORS (2017) Giefer MJ, Lowe ME, Werlin SL, Zimmerman B, Wilschanski M, Troendle
D, Schwarzenberg SJ, Pohl JF, Palermo J, Ooi CY, Morinville VD, Lin
TK, Husain SZ, Himes R, Heyman MB, Gonska T, Gariepy CE, Freedman
SD, Fishman DS, Bellin MD, Barth B, Abu-El-Haija M and Uc A. Indalao IL, Sawabuchi T,
Takahashi E and Kido H.
AUTHORS (2016) Cho SM, Shin S and Lee KA. Boulling A, Abrantes A, Masson E, Cooper DN, Robaszkiewicz M,
Chen JM and Ferec C.
AUTHORS (1993) LaRusch,J., Solomon,S. and Whitcomb,D.C. Solomon,S., Whitcomb,D.C. and LaRusch,J.
AUTHORS (1992) Yamamoto KK, Pousette A, Chow P, Wilson H, el Shami S and French CK.
AUTHORS (1989) Kimland M, Russick C, Marks WH and Borgstrom A.
AUTHORS (1987) Shieh,B.H. and Travis,J.
AUTHORS (1986) Emi,M., Nakamura,Y., Ogawa,M., Yamamoto,T., Nishide,T., Mori,T. And Matsubara,K.

For Nucleotide Sequence of Homo sapiens:


Primary (citable) Accession Number: NM_002769
Length of the Nucleotide Sequence: 831
Nucleotide Sequence:

1 aggcacactc taccaccatg aatccactcc tgatccttac ctttgtggca gctgctcttg


61 ctgccccctt tgatgatgat gacaagatcg ttgggggcta caactgtgag gagaattctg
121 tcccctacca ggtgtccctg aattctggct accacttctg tggtggctcc ctcatcaacg
181 aacagtgggt ggtatcagca ggccactgct acaagtcccg catccaggtg agactgggag
241 agcacaacat cgaagtcctg gaggggaatg agcagttcat caatgcagcc aagatcatcc
301 gccaccccca atacgacagg aagactctga acaatgacat catgttaatc aagctctcct
361 cacgtgcagt aatcaacgcc cgcgtgtcca ccatctctct gcccaccgcc cctccagcca
421 ctggcacgaa gtgcctcatc tctggctggg gcaacactgc gagctctggc gccgactacc
481 cagacgagct gcagtgcctg gatgctcctg tgctgagcca ggctaagtgt gaagcctcct
541 accctggaaa gattaccagc aacatgttct gtgtgggctt ccttgaggga ggcaaggatt
601 catgtcaggg tgattctggt ggccctgtgg tctgcaatgg acagctccaa ggagttgtct
661 cctggggtga tggctgtgcc cagaagaaca agcctggagt ctacaccaag gtctacaact
721 atgtgaaatg gattaagaac accatagctg ccaatagcta aagcccccag tatctcttca
781 gtctctatac caataaagtg accctgttct cactgtcaaa aaaaaaaaaa a

Date: 13-NOV-2017


Submitters And Dates:

AUTHORS (2017) Giefer MJ, Lowe ME, Werlin SL, Zimmerman B, Wilschanski M, Troendle
D, Schwarzenberg SJ, Pohl JF, Palermo J, Ooi CY, Morinville VD, Lin
TK, Husain SZ, Himes R, Heyman MB, Gonska T, Gariepy CE, Freedman
SD, Fishman DS, Bellin MD, Barth B, Abu-El-Haija M and Uc A.
Indalao IL, Sawabuchi T, Takahashi E and Kido H.
AUTHORS (2016) Cho SM, Shin S and Lee KA.
Boulling A, Abrantes A, Masson E, Cooper DN, Robaszkiewicz M, Chen
JM and Ferec C.
AUTHORS (1993) LaRusch,J., Solomon,S. and Whitcomb,D.C.
Solomon,S., Whitcomb,D.C. and LaRusch,J.
AUTHORS (1992) Yamamoto KK, Pousette A, Chow P, Wilson H, el Shami S and French CK.
AUTHORS (1989) Kimland M, Russick C, Marks WH and Borgstrom A.
AUTHORS (1987) Shieh,B.H. and Travis,J.
AUTHORS (1986) Emi,M., Nakamura,Y., Ogawa,M., Yamamoto,T., Nishide,T., Mori,T. and Matsubara,K.

Figure 1: Dotlet Image of Homo sapiens – Homo sapiens Nucleotides


Figure 2: The Comparison of Homo sapiens - Homo sapiens Nucleotides

The Homo sapiens trypsin-1 nucleotide sequence was retrieved from the NCBI database and the Homo sapiens trypsin-1
protein sequence from UniProt; the data obtained from these databases are given above. The nucleotide sequence
was compared with itself via BLAST, and the comparison was also visualized as a dot plot in Dotlet. Both the
horizontal and the vertical axis of this plot carry the Homo sapiens nucleotide sequence. As expected, the
sequence overlaps itself completely, and the matching positions form a continuous main diagonal.
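Why a self-comparison always produces a complete diagonal can be seen in a minimal dot-plot sketch. This is a simplified, identity-based version written for illustration; it is not Dotlet's own scoring, and the function name, window size, and threshold are assumptions.

```python
def dot_plot(seq_x, seq_y, window=3, min_matches=3):
    """Return text rows in which '#' marks windows of seq_x and seq_y that agree.

    Columns follow seq_x (horizontal axis), rows follow seq_y (vertical axis).
    A cell is filled when at least `min_matches` of the `window` positions
    starting at that cell are identical.
    """
    rows = []
    for j in range(len(seq_y) - window + 1):
        row = ""
        for i in range(len(seq_x) - window + 1):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            row += "#" if matches >= min_matches else "."
        rows.append(row)
    return rows


# Comparing a sequence with itself: every window matches itself,
# so the main diagonal is always filled.
fragment = "atgaatccactcctg"  # start of the human trypsin-1 coding region
for row in dot_plot(fragment, fragment):
    print(row)
```

Any filled cells away from the main diagonal in such a self-comparison correspond to short stretches that occur more than once in the sequence.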

Rattus norvegicus:

For Protein Sequence of Rattus norvegicus:

Primary (citable) Accession Number: NP_036767


Length of the Protein Sequence: 246
Protein Sequence:
MSALLILALVGAAVAFPLEDDDKIVGGYTCPEHSVPYQVSLNSGYHFCGGSLINDQWVVS
AAHCYKSRIQVRLGEHNINVLEGDEQFINAAKIIKHPNYSSWTLNNDIMLIKLSSPVKLN
ARVAPVALPSACAPAGTQCLISGWGNTLSNGVNNPDLLQCVDAPVLSQADCEAAYPGEIT
SSMICVGFLEGGKDSCQGDSGGPVVCNGQLQGIVSWGYGCALPDNPGVYTKVCNFVGWIQ
DTIAAN
Submitters And Dates:
AUTHORS (2012) Bastos-Amador P, Royo F, Gonzalez E, Conde-Vancells J, Palomo-Diez
L, Borras FE and Falcon-Perez JM.
Chang M, Alsaigh T, Kistler EB and Schmid-Schonbein GW.
AUTHORS (2008) Lauhio A, Sorsa T, Srinivas R, Stenman M, Tervahartiala T, Stenman
UH, Gronhagen-Riska C and Honkanen E.
AUTHORS (2002) Towatari T, Ide M, Ohba K, Chiba Y, Murakami M, Shiota M, Kawachi
M, Yamada H and Kido H.
AUTHORS (2000) Hirano T.
AUTHORS (1990) Dakka N, Puigserver A and Wicker C.
AUTHORS (1989) Dubick MA and Majumdar AP.
AUTHORS (1988) Dubick MA, Palmer R, Lau PP, Morrill PR and Geokas MC.
AUTHORS (1986) Fletcher,T.S., Tsukamoto,H. and Largman,C. Majumdar,A.P., Vesenka,G.D.,
Dubick,M.A., Yu,G.S., DeMorrow,J.M. and Geokas,M.C.

For Nucleotide Sequence of Rattus norvegicus:

Primary (citable) Accession Number: NM_012635


Length of the Nucleotide Sequence: 804
Nucleotide Sequence:

1 ccttctgcca ccatgagtgc acttctgatc ctagcccttg tgggagctgc tgttgctttc


61 cctttggaag atgatgacaa gatcgttgga ggatacacct gcccggaaca ttctgtcccc
121 taccaggtgt ccctgaactc tggctaccac ttctgtggag gttccctcat caatgaccag
181 tgggtggtgt ctgcagctca ctgctacaaa tcccgcatcc aagtgagact gggagagcac
241 aacatcaatg tccttgaggg cgatgagcaa tttatcaatg ctgccaagat catcaagcac
301 cccaactata gttcgtggac cctgaacaat gacatcatgc tgatcaagct ctcttcccct
361 gtgaaactca atgcccgagt ggcccctgta gctctgccca gcgcctgtgc acctgcaggc
421 actcagtgcc tcatctctgg ctggggcaac accctcagca atggtgtgaa caacccagac
481 ctgctccaat gcgtggatgc cccagtgctg tctcaggctg actgtgaagc cgcctaccct
541 ggggaaatca ccagcagcat gatttgtgtt ggcttcctgg agggaggcaa agattcctgc
601 cagggtgact ctggtggccc ggtggtctgc aatggacagc tccagggtat tgtctcctgg
661 ggctatggtt gtgccctgcc agacaaccct ggtgtgtaca ccaaggtctg caactttgtg
721 ggctggattc aggacaccat tgctgcaaac taaatatctt cagtctctct tcaatcagtg
781 tgtcaataaa gttcatttgc cctc

Date: 01-OCT-2017

Submitters And Dates:


AUTHORS (2012) Bastos-Amador P, Royo F, Gonzalez E, Conde-Vancells J, Palomo-Diez
L, Borras FE and Falcon-Perez JM.
Chang M, Alsaigh T, Kistler EB and Schmid-Schonbein GW.
AUTHORS (2008) Lauhio A, Sorsa T, Srinivas R, Stenman M, Tervahartiala T, Stenman
UH, Gronhagen-Riska C and Honkanen E.
AUTHORS (2002) Towatari T, Ide M, Ohba K, Chiba Y, Murakami M, Shiota M, Kawachi
M, Yamada H and Kido H.
AUTHORS (2000) Hirano T.
AUTHORS (1990) Dakka N, Puigserver A and Wicker C.
AUTHORS (1989) Dubick MA and Majumdar AP.
AUTHORS (1988) Dubick MA, Palmer R, Lau PP, Morrill PR and Geokas MC.
AUTHORS (1986) Fletcher,T.S., Tsukamoto,H. and Largman,C. Majumdar,A.P., Vesenka,G.D.,
Dubick,M.A., Yu,G.S., DeMorrow,J.M. and Geokas,M.C.

Table 1: Step 1

Table 2: Step 2

Table 3: Step 3

Table 4: Step 4

Table 5: Step 5

Table 6: Step 6

Table 7: Step 7

Table 8: Step 8

Table 9: Step 9

Table 10: Step 10

Figure 3: The Comparison of Homo sapiens - Rattus norvegicus Nucleotides

Figure 4: Dotlet Image of Homo sapiens – Rattus norvegicus Nucleotides

The same steps were repeated for the other comparisons. Nucleotides shown in red are Rattus norvegicus
nucleotides that differ from those of Homo sapiens; such differences can arise from mutations, insertions, and
deletions. Nucleotides represented by dots are identical in Homo sapiens and Rattus norvegicus. The Rattus
norvegicus trypsin-1 nucleotide sequence was retrieved from the NCBI database and the Rattus norvegicus trypsin-1
protein sequence from UniProt; the data obtained from these databases are given above. The nucleotide sequences
were compared with each other via BLAST, and the comparison was also visualized as a dot plot in Dotlet. The
horizontal axis carries the Homo sapiens nucleotides and the vertical axis the Rattus norvegicus nucleotides.
The result obtained after clicking the "COMPUTE" button and cleaning up the graph is shown in Figure 4 above. The
dark areas of this graph mark regions that are similar, and therefore likely homologous, in the two species.
Gaps and shifts between the similar segments can be due to mutations, insertions, and deletions.
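The "dots for identical positions" display described above can be reproduced with a short sketch. It assumes the two sequences are already aligned to the same length; the helper name and the two fragments are hypothetical and are not taken from the actual BLAST output.

```python
def mark_differences(reference, other):
    """Show `other` against `reference`: '.' where identical, the base otherwise."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("." if a == b else b for a, b in zip(reference, other))


# Hypothetical aligned fragments (a gap would be written as '-').
human = "gatgatgatgacaag"
rat = "gaagatgatgacaag"
print(human)
print(mark_differences(human, rat))
```

Only the positions that differ remain visible as letters, which makes substitutions easy to spot in a long alignment.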

Salmo salar:

For Protein Sequence of Salmo salar:

Primary (citable) Accession Number: NP_001119704


Length of the Protein Sequence: 242
Protein Sequence:
MISLVFVLLIGAAFATEDDKIVGGYECKAYSQAHQVSLNSGYHFCGGSLVNENWVVSAAH
CYKSRVEVRLGEHNIKVTEGSEQFISSSRVIRHPNYSSYNIDNDIMLIKLSKPATLNTYV
QPVALPTSCAPAGTMCTVSGWGNTMSSTADSNKLQCLNIPILSYSDCNNSYPGMITNAMF
CAGYLEGGKDSCQGDSGGPVVCNGELQGVVSWGYGCAEPGNPGVYAKVCIFNDWLTSTMA
SY
Date: 02-OCT-2017

Submitters And Dates:


AUTHORS (1995) Male R, Lorens JB, Smalas AO and Torrissen KR.
AUTHORS (1994) Smalas AO, Heimstad ES, Hordvik A, Willassen NP and Male R.
AUTHORS (1993) Smalas A and Hordvik A.

For Nucleotide Sequence of Salmo salar:

Primary (citable) Accession Number: NM_001126232


Length of the Nucleotide Sequence: 861

Nucleotide Sequence:

1 gcaaccatga tttctctggt cttcgttctg ctcattggag ccgctttcgc cacagaggac


61 gacaagatcg tcggagggta tgagtgcaag gcctactccc agacccacca ggtgtctctg
121 aactctggat accacttctg tggtggctcc ttggtcaatg agaactgggt tgtgtctgct
181 gctcactgct acaagtcccg tgtggaggtg cgtctgggcg agcacaacat caaggtgact
241 gaaggtagcg agcagttcat ctcttcatcc cgcgtgatcc gtcaccccaa ctacagctcc
301 tacaacatcg ataatgacat catgctgatc aaactgagca aacccgccac cctcaacacc
361 tacgtgcagc ctgttgctct gcccaccagc tgtgcccccg ctggcaccat gtgtaccgtc
421 tctggatggg gcaacaccat gagctccact gctgacagca acaagctgca gtgcctgaac
481 atccccatcc tgtcctacag cgactgtaac aactcctacc ctggcatgat caccaacgcc
541 atgttctgtg ctggatacct ggagggaggc aaggactctt gccagggtga ctctggtggc
601 cctgtggtgt gcaatggtga gctccagggt gttgtgtcct ggggttacgg atgtgctgag
661 cccggtaacc ctggtgtcta cgccaaggtt tgcatcttca atgactggct gaccagcacc
721 atggcctcct actaagtctg atcctagctt cggtcctcca gcacggtccc acaactctac
781 aacatcccgt tccagatcaa catccacctt ttgtacggga gactagacat tatttatgtt
841 tatgataaat aaaaaatgta ac

Date: 02-OCT-2017

Submitters And Dates:

AUTHORS (1995) Male R, Lorens JB, Smalas AO and Torrissen KR.


AUTHORS (1994) Smalas AO, Heimstad ES, Hordvik A, Willassen NP and Male R.
AUTHORS (1993) Smalas A and Hordvik A.

Figure 5: Dotlet Image of Homo sapiens – Salmo salar Nucleotides

Figure 6: The Comparison of Homo sapiens - Salmo salar Nucleotides

The Salmo salar trypsin-1 nucleotide sequence was retrieved from the NCBI database and the Salmo salar trypsin-1
protein sequence from UniProt; the data obtained from these databases are given above. The nucleotide sequences
were compared with each other via BLAST, and the comparison was also visualized as a dot plot in Dotlet. The
horizontal axis carries the Homo sapiens nucleotides and the vertical axis the Salmo salar nucleotides. The
result obtained after clicking the "COMPUTE" button and cleaning up the graph is shown in Figure 5 above. The
dark areas of this graph mark regions that are similar, and therefore likely homologous, in the two species.
Gaps and shifts between the similar segments can be due to mutations, insertions, and deletions.
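How an insertion or deletion shifts the matching segment onto a different diagonal can be sketched by counting identical positions per diagonal. The function name and the example sequences are hypothetical; the second sequence is simply the first with three bases removed from its start.

```python
from collections import Counter


def matches_per_diagonal(seq_x, seq_y):
    """Count identical positions on each diagonal of the dot matrix.

    A diagonal is identified by offset = i - j, where i indexes seq_x
    (horizontal axis) and j indexes seq_y (vertical axis).
    """
    counts = Counter()
    for i, base_x in enumerate(seq_x):
        for j, base_y in enumerate(seq_y):
            if base_x == base_y:
                counts[i - j] += 1
    return counts


# seq_y lacks the first three bases of seq_x, so the strongest diagonal
# lies at offset 3 instead of offset 0.
seq_x = "ccaatgaatccactcctg"
seq_y = seq_x[3:]
offset, hits = matches_per_diagonal(seq_x, seq_y).most_common(1)[0]
print(offset, hits)  # 3 15
```

In a real dot plot, several such shifted diagonal segments in a row suggest that insertions or deletions separate otherwise similar regions.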

Anopheles gambiae:

For Protein Sequence of Anopheles gambiae:

Primary (citable) Accession Number: P35035


Second Accession Number: Q7PN85
Length of the Protein Sequence: 274

Protein Sequence:

MSNKIAILLAVLVAVVACAEAQANQRHRLVRPSPSFSPRPRYAVGQRIVGGFEIDVSDAP
YQVSLQYNKRHNCGGSVLSSKWVLTAAHCTAGASPSSLTVRLGTSRHASGGTVVRVARVV
QHPKYDSSSIDFDYSLLELEDELTFSDSVQPVGLPKQDETVKDGTMTTVSGWGNTQSAAE
SNAVLRAANVPTVNQKECNKAYSEFGGVTDRMLCAGYQQGGKDACQGDSGGPLVADGKLV
GVVSWGYGCAQAGYPGVYSRVAVVRDWVRENSGV

Date: 07-JUN-2017

Submitters And Dates:

AUTHORS (1993) Muller,H.M., Crampton,J.M., della Torre,A., Sinden,R. And Crisanti,A.


AUTHORS (1995) Muller,H.M., Catteruccia,F., Vizioli,J., della Torre,A. And Crisanti,A.
AUTHORS (2002) Holt,R.A., Subramanian,G.M., Halpern,A., Sutton,G.G., Charlab,R.,
Nusskern,D.R., Wincker,P., Clark,A.G., Ribeiro,J.M., Wides,R.,
Salzberg,S.L., Loftus,B., Yandell,M., Majoros,W.H., Rusch,D.B.,

Lai,Z., Kraft,C.L., Abril,J.F., Anthouard,V., Arensburger,P.,


Atkinson,P.W., Baden,H., de Berardinis,V., Baldwin,D., Benes,V.,
Biedler,J., Blass,C., Bolanos,R., Boscus,D., Barnstead,M., Cai,S.,
Center,A., Chaturverdi,K., Christophides,G.K., Chrystal,M.A.,
Clamp,M., Cravchik,A., Curwen,V., Dana,A., Delcher,A., Dew,I.,
Evans,C.A., Flanigan,M., Grundschober-Freimoser,A., Friedli,L.,
Gu,Z., Guan,P., Guigo,R., Hillenmeyer,M.E., Hladun,S.L.,
Hogan,J.R., Hong,Y.S., Hoover,J., Jaillon,O., Ke,Z., Kodira,C.,
Kokoza,E., Koutsos,A., Letunic,I., Levitsky,A., Liang,Y., Lin,J.J.,
Lobo,N.F., Lopez,J.R., Malek,J.A., McIntosh,T.C., Meister,S.,
Miller,J., Mobarry,C., Mongin,E., Murphy,S.D., O'Brochta,D.A.,
Pfannkoch,C., Qi,R., Regier,M.A., Remington,K., Shao,H.,
Sharakhova,M.V., Sitter,C.D., Shetty,J., Smith,T.J., Strong,R.,
Sun,J., Thomasova,D., Ton,L.Q., Topalis,P., Tu,Z., Unger,M.F.,
Walenz,B., Wang,A., Wang,J., Wang,M., Wang,X., Woodford,K.J.,
Wortman,J.R., Wu,M., Yao,A., Zdobnov,E.M., Zhang,H., Zhao,Q.,
Zhao,S., Zhu,S.C., Zhimulev,I., Coluzzi,M., della Torre,A.,
Roth,C.W., Louis,C., Kalush,F., Mural,R.J., Myers,E.W., Adams,M.D.,
Smith,H.O., Broder,S., Gardner,M.J., Fraser,C.M., Birney,E.,
Bork,P., Brey,P.T., Venter,J.C., Weissenbach,J., Kafatos,F.C.,
Collins,F.H. and Hoffman,S.L.

For Nucleotide Sequence of Anopheles gambiae:

Primary (citable) Accession Number: EX473359


Length of the Nucleotide Sequence: 370

Nucleotide Sequence:

1 ggccgccctt tttttttttt tttttttttt tttgatgcaa aaaacgtcca atttattaac


61 caaaaatatt atatacggtc aagcgcatta aacatcaacc gattctttta tccaatcccg
121 tactgctgat acacgggaat aaactcctgg atatctcggt tttgcacagc catatcccca
181 tgataccaca cctactacga ctccctccca ggtcagaggt ccaccggaat ctccttgaca
241 agcatccttt ccaccttcca ataccccggc gcacatcatc ctttcagtga ttccaccaaa
301 acctttgtag gcatcattgc atgccttatc attgtatttt ggtactttgg ctgcgcgtaa
361 ctgatttcct

Date: 03-OCT-2007

Submitters And Dates:

AUTHORS (1995) Male R, Lorens JB, Smalas AO and Torrissen KR.


AUTHORS (1994) Smalas AO, Heimstad ES, Hordvik A, Willassen NP and Male R.
AUTHORS (1993) Smalas A and Hordvik A.

Figure 7: Dotlet Image of Homo sapiens – Anopheles gambiae Nucleotides

Figure 8: The Comparison of Homo sapiens - Anopheles gambiae Nucleotides

The Anopheles gambiae trypsin-1 nucleotide sequence was retrieved from the NCBI database and the Anopheles gambiae
trypsin-1 protein sequence from UniProt; the data obtained from these databases are given above. The nucleotide
sequences were compared with each other via BLAST, and the comparison was also visualized as a dot plot in
Dotlet. The horizontal axis carries the Homo sapiens nucleotides and the vertical axis the Anopheles gambiae
nucleotides. The result obtained after clicking the "COMPUTE" button and cleaning up the graph is shown in
Figure 7 above. The dark areas of this graph mark nucleotide stretches that recur in both species. Gaps and
shifts between the similar segments can be due to mutations, insertions, and deletions.


Homo sapiens coagulation factor XII (F12), mRNA (5700 bp)

For Nucleotide Sequence of Homo sapiens coagulation factor XII :

Primary (citable) Accession Number: AH005292 J02807 M17464 M17465 M17466


Length of the Nucleotide Sequence: 5700 bp

Nucleotide Sequence:

GTGGGTATTGTTGTAAGATGCTGAGTTTATGGTAGTTTGTTACATGACAATAGAAAATGAACACACTTCA
CAGTGGACTCCAAGATCCCCATGATCTTTGATCTCCTTAACCTCCTGATCTCCACAGGACCCAGAGCATA
AGAATGTCCCTTCTTCTGCTTCCAGTCCCACTATCTAGAAAAGAGAGGAGGAGCCCAGCTCTTCATTTCA
CCCCCACCCACAAACTCCCAACTTTCCGGCCCTCAAGGGGTGACCAAGGAAGTTGCTCCACTTGGCTTTC
CACAAACAGCCTGTGCCCCACCAGGCTCAGGAGGGCAGCTTGACCAATCTCTATTTCCAAGACCTTTGGC
CAGTCCTATTGATCTGGACTCCTGGATAGGCAGCTGGACCAACGGACGGACGCCATGAGGGCTCTGCTGC
TCCTGGGGTTCCTGCTGGTGAGCTTGGAGTCAACACTTTCGGTGAGTGCTGTGGGAACCAGGATTGTCCC
AGGATTGTTCTGGGGGGTCGCTATCACAGCCATGAGCCATGGCCTCTGCTCATGACCTGTGGGTCCAGGT
GACTAGGAGGCCTATGTGGAAAGGTGAGGCCAGCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGGC
CAGCCCGGAAGGCCCAGGCAGAGGAGACAGACAACCAGACTGGGTGGATACAAGGGCACAGCCTGCATTT
CTGGGGGAGATGGGCCTTAAGAAGACAACGGGGGGAGGTAGAAAGGGAAAGGGTCTTGGGAAGAAATCTC
TGCATTTCTGGGCTGTGAGAGGAAGCTGCAGACTAGCAACAGATCGGTGGCAGGCTATGACTTATAGTCA
GTTCCCTGCCTTCTTCTCTCCCTTGTAGATTCCACCTTGGGAAGCCCCCAAGGAGCATAAGTACAAAGCT
GAAGAGCACACAGTCGGTAAGTGGCCTGGCTCCTCCTCCCGGGAACCCTTGGGTGGGGATGTGTATGGTG
CAGTGTGTGCAGTCTCAGGGCAGTCTAGTCTAGTGCCTACCTGGTGCTAGGTCTTATGCCCATGGGCACT
AGAGTGATCGTGAGCTGTGTGATCCTTGAGGGCAGGGTATGGGCTGTGTCTAAGTGCCCACGAGCCTGGC
TCGGAGCAGGTGCTTGAGATATGTGCTGCTGGCGCCATCACACCTGGGCTCCTGCCAGCCTTCCTCAGTT
TCCCCAGCTTCTCCCCTTCTTTTCCTTTCCCCAGTACGTCTCATGGGCATCATTCATGCCACACAGAGGC
CAGGGCCTTCAATGGGCAAGGAAGGATCAAGAGCTTGTCTCTGGCATCTGAATGCCTCTGAAGCCCAGCT
TTATGAGTTATGAGCTGGGTGACTCTGGGCGAGGGATTTGAGTTCTCCAAGCTTCAATTTCNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNTCAGAGAGTGTGTTGTCCCTGCAGTTCTCACTGTCACCGGGGAGCCCTG
CCACTTCCCCTTCCAGTACCACCGGCAGCTGTACCACAAATGTACCCACAAGGGCCGGCCAGGCCCTCAG
CCCTGGTAAGACTACGCAGAGGAGTTGGAGCAGGGGCCTGGGAGACATGTACCCTGCCTGTCCTTCTGTC
CAAGGAACTCTGCTTGGAGAGAGGGGACTGTGATAGGGCAGGGTGGGCCAGGCCCCTGGGTAGAGCAGGG
AAGCCTTGTCTCTTTCTACAGGTGTGCTACCACCCCCAACTTTGATCAGGACCAGCGATGGGGATACTGT
TTGGAGCCCAAGAAAGTGAAAGGTGCTACACACAGCCTCTGGGGTGGCCTGGGGCTCTCTCCTCCCGCCT
CATTACTCTCCTGGTATCACCAGACCCCACACACCTGGGATTCTGGACCCAGCCCCTTCTCTCCCTCCAC
AATACCCTTTGGAAGTCCAGAGGGAGAGTTCTGGGAAGGAGTGGTCCCATTTTGCAGGTGGGTAAACCAA
GCTTGGAAACTTGGAGTAGCAAGGTCACAAGGCAAGTAGGTTCAAGAAGGGCCTTGGCCCCAGCTGTGTG
ACTCAGCTCCCTGCTCTTCCTTCCACCATGTCCATCTCTCAGACCACTGCAGCAAACACAGCCCCTGCCA
GAAAGGAGGGACCTGTGTGAACATGCCAAGCGGCCCCCACTGTCTCTGTCCACAACACCTCACTGGAAAC
CACTGCCAGAAAGGTGAGGAGATGTGGAGGACCTGGGCGGGGTGCTGGGGGACAGGGGCAACCCTGGGCC
TACAGAATAGGTTGCTGGATACTCGGAGACTTGGCATGGTCCTAGACTCTCCTGAGACCACTATCCCTCT
TTGTCCCCAGAGAAGTGCTTTGAGCCTCAGCTTCTCCGGTTTTTCCACAAGAATGAGATATGGTATAGAA
CTGAGCAAGCAGCTGTGGCCAGATGCCAGTGCAAGGGTCCTGATGCCCACTGCCAGCGGCTGGCCAGCCA
GGGTGAGCAGATGGTTGGGAACGGGCCAGGGAGGAGCGTCAGGAAGACAGGCTGGCAGGAGGCCGGGTGG
TGTGCCGGGAAGGAGAGCTCTCTGGGGGGGTCTTTAGGCCCAGGGGTGGCTCACTGCGTTCCCTCCCCAA
GCCTGCCGCACCAACCCGTGCCTCCATGGGGGTCGCTGCCTAGAGGTGGAGGGCCACCGCCTGTGCCACT
GCCCGGTGGGCTACACCGGACCCTTCTGCGACGTGGGTGAGTGAGGGTCTGGGGCAAGCAGAAGGCCAGC
CCCCAGGTGGGACGGGCTTGCCAGGAAGGAGGAGGGAGAGTGCGGAAAGCAGATGAGAGGGAGGCAGGAG
AGCCCAGCCTTGGCTGCCCAGGGAGCCCCCTTTCTCCTCAGACACCAAGGCAAGCTGCTATGATGGCCGC
GGGCTCAGCTACCGCGGCCTGGCCAGGACCACGCTCTCGGGTGCGCCCTGTCAGCCGTGGGCCTCGGAGG
CCACCTACCGGAACGTGACTGCCGAGCAAGCGCGGAACTGGGGACTGGGCGGCCACGCCTTCTGCCGGTG
CGCCGCGTGGGGCTGGGTGACCCCTCCGCCCCAGGGCTCCGGGCTCCCGGCGCTCTAACGGCGCCCCGTC
GTGTGGCTACAGGAACCCGGACAACGACATCCGCCCGTGGTGCTTCGTGCTGAACCGCGACCGGCTGAGC
TGGGAGTACTGCGACCTGGCACAGTGCCAGACCCCAACCCAGGCGGCGCCTCCGACCCCGGTGTCCCCTA
GGCTTCATGTCCCACTCATGCCCGCGCAGCCGGCACCGCCGAAGCCTCAGCCCACGACCCGGACCCCGCC
TCAGTCCCAGACCCCGGGAGGTTAGGAAGTGGGGGGGGAAGGAGGAGCCGAGAGGGCGCCGGGCGAGCTA
GATTCCGGCCAGCCGGCCGCGGGCTCCCCGTCCTCAGCCCCTGCTCCTCCACAGCCTTGCCGGCGAAGCG
GGAGCAGCCGCCTTCCCTGACCAGGAACGGCCCACTGAGCTGCGGGCAGCGGCTCCGCAAGAGTCTGTCT
TCGATGACCCGCGTCGTTGGCGGGCTGGTGGCGCTACGCGGGGCGCACCCCTACATCGCCGCGCTGTACT
GGGGCCACAGTTTCTGCGCCGGCAGCCTCATCGCCCCCTGCTGGGTGCTGACGGCCGCTCACTGCCTGCA
GGACCGGCGAGTACCCGCCCGCCCAGAGCCGCCCCAGGGGCCGCGGCTCCTCCGTCTCCCAGCGCAGCTT
CCACGCTGCACCCGAACCCGTGCCCTACCTTCTCCCGCCCCACCCTTCTTTCCACGCCCCTCCGGAGCTC
CCGGGGAGGAAGCTGGAACACGGGATTGGGGTTCGGGAGCAGGGGGCTTCCCCAGAACGCTTGTGGCCAG
GTCTGAGAGCGCTGCCTCTCCCCTACCCTCCCCGCAGGCCCGCACCCGAGGATCTGACGGTGGTGCTCGG
CCAGGAACGCCGTAACCACAGCTGTGAGCCGTGCCAGACGTTGGCCGTGCGCTCCTACCGCTTGCACGAG
GCCTTCTCGCCCGTCAGCTACCAGCACGACCTGGGTGCGTGGGGGCGCCCCGCGGGGACGGGAAGAGAGC
TTGGGCACGGCGTCCCCGCCTCACGCTCCTCTCCGCCCGGGTTAGCTCTGTTGCGCCTTCAGGAGGATGC
GGACGGCAGCTGCGCGCTCCTGTCGCCTTACGTTCAGCCGGTGTGCCTGCCAAGCGGCGCCGCGCGACCC
TCCGAGACCACGCTCTGCCAGGTGGCCGGCTGGGGCCACCAGTTCGAGGGTAGGCACAACTGCTAGGGGC
AGGGGTAGGGGAGGAGACCTTTGATCACTGGGTTAGGCGGAAGAAGCCCGCGACTTTGGTATCGTTCCGG
GTGCCTACAGAATGGGTGGCGCTGACCTGATGGGTTGTGAGAATGTGTAGGTGAATCCCAGGTAGAATCC
CAGGGCCTGGGATTCACTGCTGGGATCCCCAAATCTCCTGGGGATACAGGGAGAATCGAACTTGCTCTTG
GTTCCCTCTGGGCGCCGGGCTGCAAAGGCCAACTAGGACGCTGGCCCCGCGCTCCGGGCTAGTGTGGGAG
CCAGGTTCTGCGACTCTGGATGGGTGGTGGGGGAGGGGTTTCTGTTTCCGCTCCGCCCATTCAAATCCTG
GCTTTTCTCTGGACCTCAGCCTCCTTGCCTATGAAATTGAATTAATGGCACCTCCTCCCCTTCGGGCTTG
CTGCGAGAGAGGAAGGGCATGAGTGGGTTTACAAGCGCCTGGAGCAGCTTTGTCCATCGTCCGGGCGGCA
AGCGTTGTCAGATGGGGTGTGAAGAAGGCGCTCTGTGTTCGCAGGGGCGGAGGAATATGCCAGCTTCCTG
CAGGAGGCGCAGGTACCGTTCCTCTCCCTGGAGCGCTGCTCAGCCCCGGACGTGCACGGATCCTCCATCC
TCCCCGGCATGCTCTGCGCAGGGTTCCTCGAGGGCGGCACCGATGCGTGCCAGGTGAGCTCTTAGCCCGG
TTGGCGCCCTTCCCCGAGGCCGTCAGGCACAAATCTCAGGTCCACAGCGCTGAGCTGCGTGTTTCCGACC
CAGGGTGATTCCGGAGGCCCGCTGGTGTGTGAGGACCAAGCTGCAGAGCGCCGGCTCACCCTGCAAGGCA
TCATCAGCTGGGGATCGGGCTGTGGTGACCGCAACAAGCCAGGCGTCTACACCGATGTGGCCTACTACCT
GGCCTGGATCCGGGAGCACACCGTTTCCTGATTGCTCAGGGACTCATCTTTCCCTCCTTGGTGATTCCGC
AGTGAGAGAGTGGCTGGGGCATGGAAGGCAAGATTGTGTCCCATTCCCCCAGTGCGGCCAGCTCCGCGCC
AGGATGGCGCAGGAACTCAATAAAGTGCTTTGAAAATGCTGAGAAGGAAAGCTCTTTTCTTCATGGGTCC
GCCGGGAAATGCCAAGACAGAAAAGCGATTCACAGCTTCTCCACAGCTCTCAGAGAACAAGGTCTATGAG

Date: 01-AUG-2016

Submitters And Dates:

AUTHORS (1987) Cool,D.E. and MacGillivray,R.T.


AUTHORS (1995) Schloesser,M., Hofferbert,S., Bartz,U., Lutze,G., Lammle,B. and Engel,W.

For Protein Sequence of Homo sapiens coagulation factor XII:

Primary (citable) Accession Number: NP_000496


Length of the Protein Sequence: 615 aa

Protein Sequence:

1 mrallllgfl lvslestlsi ppweapkehk ykaeehtvvl tvtgepchfp fqyhrqlyhk


61 cthkgrpgpq pwcattpnfd qdqrwgycle pkkvkdhcsk hspcqkggtc vnmpsgphcl
121 cpqhltgnhc qkekcfepql lrffhkneiw yrteqaavar cqckgpdahc qrlasqacrt
181 npclhggrcl eveghrlchc pvgytgafcd vdtkascydg rglsyrglar ttlsgapcqp
241 waseatyrnv taeqarnwgl gghafcrnpd ndirpwcfvl nrdrlsweyc dlaqcqtptq
301 aapptpvspr lhvplmpaqp appkpqpttr tppqsqtpga lpakreqpps ltrngplscg
361 qrlrkslssm trvvgglval rgahpyiaal ywghsfcags liapcwvlta ahclqdrpap
421 edltvvlgqe rrnhscepcq tlavrsyrlh eafspvsyqh dlallrlqed adgscallsp
481 yvqpvclpsg aarpsettlc qvagwghqfe gaeeyasflq eaqvpflsle rcsapdvhgs
541 silpgmlcag fleggtdacq gdsggplvce dqaaerrltl qgiiswgsgc gdrnkpgvyt
601 dvayylawir ehtvs

Date: 10-SEP-2017
Submitters And Dates:
AUTHORS (2017) Sylman JL, Daalkhaijav U, Zhang Y, Gray EM, Farhang PA, Chu TT, Zilberman-Rudenko J, Puy
Tucker EI, Smith SA, Morrissey JH, Walker TW, Nan XL, Gruber A and McCarty OJT. Ivanov I,
Matafonov A, Sun MF, Cheng Q, Dickeson SK, Verhamme IM, Emsley J and Gailani D.
AUTHORS (2016) Mitchell JL, Lionikiene AS, Georgiev G, Klemmer A, Brain C, Kim PY and Mutch NJ. Pinero-
Saavedra M, Gonzalez-Quevedo T, Saenz de San Pedro B, Alcaraz C, Bobadilla-Gonzalez P, Fernandez-Vieira L,
Hinojosa B and Garcia-Lozano R.
AUTHORS (2009) Calvo SE, Pagliarini DJ and Mootha VK.
AUTHORS (1992) Harris RJ, Ling VT and Spellman MW.
AUTHORS (1991) McMullen BA, Fujikawa K and Davie EW.
AUTHORS (1989) Miyata T, Kawabata S, Iwanaga S, Takahashi I, Alving B and Saito H.
AUTHORS (1983) Fujikawa,K. and McMullen,B.A.
AUTHORS (1979) Mandle,R.J. Jr. and Kaplan,A.P.

Figure 9: Dotlet Image of Homo sapiens – Homo sapiens XII Factor Nucleotides

Figure 10: Dotlet Image of Nucleotides of Homo sapiens – Protein Sequence of Homo sapiens XII Factor

Figure 11: Exons for Human Coagulation Factor

When the Homo sapiens coagulation factor XII gene is compared with itself, dark areas appear on the plot.
The dark areas shown in Figure 9 are low-complexity regions. What are these regions and what do they do?

Low-complexity regions (LCRs) are amino acid sequences that contain repeats of single amino acids or short
amino acid motifs. They are extremely abundant in eukaryotic proteins. In fact, the majority of proteins from a
wide range of eukaryotic species show a significant tendency toward being more repetitive than expected given
their amino acid composition. Many LCRs are highly unstable due to the action of replication slippage and
recombination, and the uncontrolled expansion of short sequence motifs causes several human diseases,
including Huntington’s disease and other neurodegenerative disorders, as well as a number of developmental
diseases.
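One common way to flag candidate low-complexity stretches is to measure the Shannon entropy of the residue composition in a sliding window. The sketch below is a simple heuristic written for illustration, not the method used by Dotlet; the window size, the entropy cutoff, and the example peptide are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence, window=12, max_entropy=1.5):
    """Return 0-based start positions of windows whose entropy is below the cutoff."""
    return [
        start
        for start in range(len(sequence) - window + 1)
        if shannon_entropy(sequence[start:start + window]) < max_entropy
    ]


# Hypothetical peptide with a run of glutamine (Q) in the middle.
peptide = "MKTAYIAKQRQQQQQQQQQQQQLLGDFFRK"
print(low_complexity_windows(peptide))
```

Windows dominated by one or two residues have entropy close to zero and are reported, while windows with a varied composition are not.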

The nucleotide sequence of the human coagulation factor XII gene was retrieved from the NCBI database and the
protein sequence of human coagulation factor XII from UniProt; the data obtained from these databases are
given above. The two sequences were compared in Dotlet, with the Homo sapiens factor XII nucleotide sequence on
one axis and the Homo sapiens factor XII protein sequence on the other. The result obtained after clicking the
"COMPUTE" button and slowly cleaning up the graph is shown in Figure 10 above. The coagulation factor gene
contains a repeating region, but the protein sequence does not contain the same repeating region, so this
region appears clearer than the other regions. This region also extends over the runs of Ns in the gene sequence.

The graph in Figure 10 was cleaned up and reduced in scale by 1:10, which gives the graph in Figure 11. This
graph contains a series of consecutive diagonal lines. These lines are the exons. Eukaryotic genes are 'split'
genes: their protein-coding exons are interrupted by introns, which are removed during splicing, and the exons
are joined together before translation.

Exons: the coding portions of a gene. The exons contain the sequence that actually encodes the gene's product,
the protein. Introns: the non-coding parts of a gene. The introns are DNA sequences that do not encode
protein. Their function is not entirely clear, but they can be transcribed into different RNAs, and these
regions can also regulate gene expression. There are 14 exons and 13 introns.
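The relationship between the exon and intron counts (14 exons leave 13 gaps between them) can be sketched as follows. The helper name and the coordinates are hypothetical and are not the real factor XII exon positions.

```python
def introns_between(exons):
    """Given exon (start, end) intervals, return the intron intervals between them.

    Coordinates are 1-based and inclusive; exons are assumed not to overlap.
    """
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Hypothetical exon coordinates, chosen only to illustrate the idea.
exons = [(1, 120), (300, 450), (900, 1010)]
print(introns_between(exons))  # [(121, 299), (451, 899)]
```

For any gene with n non-overlapping exons this yields n - 1 introns, which matches the 14 exons and 13 introns stated above.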


References:

http://www.uniprot.org/uniprot/Q45KI0#sequences

https://www.ncbi.nlm.nih.gov/nuccore/NM_002769.4?report=genbank

http://www.uniprot.org/uniprot/P00762#

http://www.uniprot.org/uniprot/B5XEC7#sequences

https://www.ncbi.nlm.nih.gov/nuccore/NM_012635.1?report=genbank

https://www.ncbi.nlm.nih.gov/nuccore/NM_001126232.1?report=genbank

https://www.ncbi.nlm.nih.gov/nucest/EX473359.1?report=genbank

https://www.ncbi.nlm.nih.gov/nuccore/NM_000505.3
