
NCEMBU S.

(201414932)

BCH 511 FINAL ASSIGNMENT

SECTION A

1.1

Pepsinogen-nuc.txt

>J04443.1 Homo sapiens pepsinogen C (PGC) mRNA, complete cds

CAGCATCATGAAGTGGATGGTGGTGGTCTTGGTCTGCCTCCAGCTCTTGGAGGCA
GCAGTGGTCAAAGTGCCCCTGAAGAAATTTAAGTCTATCCGTGAGACCATGAAGG
AGAAGGGCTTGCTGGGGGAGTTCCTGAGGACCCACAAGTATGATCCTGCTTGGAA
GTACCGCTTTGGTGACCTCAGCGTGACCTACGAGCCCATGGCCTACATGGATGCTG
CCTACTTTGGTGAGATCAGCATCGGGACTCCACCCCAGAACTTCCTGGTCCTTTTT
GACACCGGCTCCTCCAACTTGTGGGTGCCCTCTGTCTACTGCCAGAGCCAGGCCT
GCACCAGTCACTCCCGCTTCAACCCCAGCGAGTCGTCCACCTACTCCACCAATGG
GCAGACCTTCTCCCTGCAGTATGGCAGTGGCAGCCTCACCGGCTTCTTTGGCTATG
ACACCCTGACTGTCCAGAGCATCCAGGTCCCCAACCAGGAGTTCGGCTTGAGTGA
GAATGAGCCTGGTACCAACTTCGTCTATGCGCAGTTTGATGGCATCATGGGCCTGG
CCTACCCTGCTCTGTCCGTGGATGAGGCCACCACAGCTATGCAGGGCATGGTGCA
GGAGGGCGCCCTCACCAGCCCCGTCTTCAGCGTCTACCTCAGCAACCAGCAGGGC
TCCAGCGGGGGAGCGGTTGTCTTTGGGGGTGTGGATAGCAGCCTGTACACGGGGC
AGATCTACTGGGCGCCTGTCACCCAGGAACTCTACTGGCAGATTGGCATTGAAGA
GTTCCTCATCGGCGGCCAGGCCTCCGGCTGGTGTTCTGAGGGTTGCCAGGCCATC
GTGGACACAGGCACCTCTCTGCTCACTGTGCCCCAGCAGTACATGAGTGCTCTTC
TGCAGGCCACAGGGGCCCAGGAGGATGAGTATGGACAGTTTCTCGTGAACTGTAA
CAGCATTCAGAATCTGCCCAGCTTGACCTTCATCATCAATGGTGTGGAGTTCCCTC
TGCCACCTTCCTCCTATATCCTCAGTAACAACGGCTACTGCACCGTGGGAGTCGAG
CCCACCTACCTGTCCTCCCAGAACGGCCAGCCCCTGTGGATCCTCGGGGATGTCTT
CCTCAGGTCCTACTATTCCGTCTACGACTTGGGCAACAACAGAGTAGGCTTTGCCA
CTGCCGCCTAGACTTGCTGCCTCGACACGTGGGTGGGCTCCCCTCTTCCTCTTGAC
CCTGCACCCTCCTAGGGCATTGTATCTGTCTTTCCACTCTGGATTCAGCCTTCTTTT
TCTGGACTCTGGACTTTCTCTAATAATAAATAGTTCTTCTTT

HIV1-Prt.txt

>CAA09316.1 HIV-1 protease, partial [Human immunodeficiency virus]

PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVR
QYDQILIEICGHKAIGTVLIGPTPVNIIGRNLLTQIGCTLNF

1.2

The authors who published the pepsinogen sequence are: Taggart, R.T., Cass, L.G.,
Mohandas, T.K., Derby, P., Barr, P.J., Pals, G. and Bell, G.I.

Journal: J. Biol. Chem. 264 (1), 375-379 (1989).

The sequence was published in 1989.

1.3

PepProtSeq.txt

MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYR
FGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHS
RFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFV
YAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLSNQQGSSGGAVVFG
GVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQ
QYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGVEFPLPPSSYILSNNGYCTV
GVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA

To obtain the above pepsinogen protein sequence, I went to the NCBI (National Center for
Biotechnology Information) site, https://www.ncbi.nlm.nih.gov/, and searched using the
pepsinogen accession number, J04443. I opened the nucleotide sequence in FASTA format,
copied it and pasted it into Gene Runner. Gene Runner is an easy-to-use software tool that
analyses a nucleotide sequence and then provides the protein sequence. In Gene Runner I
opened the reading-frame search, which displayed the open reading frames (ORFs). I
examined the ORFs and chose frame +2 because its ORF encoded a larger protein (higher
molecular weight) than the ORFs in all other frames.
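The frame choice described above can be sketched in code. This is a minimal illustration and not Gene Runner's actual algorithm: it assumes the standard genetic code, scans only the three forward frames, and counts only ORFs that begin with ATG and end at a stop codon. The function name longest_forward_orf is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def longest_forward_orf(dna):
    """Return (frame, protein) for the longest ATG-to-stop ORF in frames +1 to +3."""
    dna = "".join(dna.upper().split())  # drop line breaks and spaces
    best = (0, "")
    for offset in range(3):
        protein = None  # None means we are not inside an ORF
        for i in range(offset, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
            if protein is None:
                if aa == "M":
                    protein = "M"
            elif aa == "*":
                # Stop codon closes the ORF; keep it if it is the longest so far
                if len(protein) > len(best[1]):
                    best = (offset + 1, protein)
                protein = None
            else:
                protein += aa
    return best
```

For example, the first 22 bases of J04443.1 with a TAA stop codon appended for illustration, "CAGCATCATGAAGTGGATGGTGTAA", give (2, "MKWMV"): the ATG lies in frame +2, which agrees with the frame chosen above.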

1.4

The EC number for the HIV-1 protease is 3.4.23.16

3: HIV-1 protease is a hydrolase (an enzyme that uses water to break bonds in certain
molecules).

3.4: It is a hydrolase that acts on peptide bonds.

3.4.23: It is an aspartic endopeptidase.

3.4.23.16: HIV-1 retropepsin.
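The hierarchy of the EC number can be read off one field at a time. The sketch below is only an illustration of that reading: the lookup table holds just the four levels listed above, and the names EC_LEVELS and ec_lineage are hypothetical.

```python
# Only the four levels named in this answer; not a complete EC table.
EC_LEVELS = {
    "3": "hydrolases",
    "3.4": "hydrolases acting on peptide bonds",
    "3.4.23": "aspartic endopeptidases",
    "3.4.23.16": "HIV-1 retropepsin",
}


def ec_lineage(ec_number):
    """Return (prefix, description) pairs from the class down to the enzyme."""
    fields = ec_number.split(".")
    prefixes = [".".join(fields[:i]) for i in range(1, len(fields) + 1)]
    return [(p, EC_LEVELS.get(p, "not in this table")) for p in prefixes]


for prefix, description in ec_lineage("3.4.23.16"):
    print(prefix, description, sep=": ")
```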

1.5

Substrates that have been used to measure the HIV-1 protease activity are:

2-aminobenzoyl-TI-Nle-Phe(NO2)-ER

The optimal pH ranges from 4.5 to 5.2

2-aminobenzoyl-TI-Nle-Phe(NO2)-QR-NH2

The optimal pH is 6

1.6

I went to the NCBI site, www.ncbi.nlm.nih.gov, and obtained the protein sequence of
pepsinogen by clicking the protein ID. On the right-hand side I clicked BLAST and then
protein BLAST. I then selected the top five pepsinogen protein sequences.

PepTopFive.prt

>XP_003833357.1 PREDICTED: gastricsin [Pan paniscus]

MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYR
FGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHS
RFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFV
YAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLSNQQGSSGGAVVFG
GVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQ
QYMSALLEATGAQEDEYGQFLVNCNSIQNLPTLTFIINGVEFPLPPSSYILSNDGYCTV
GVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA

>XP_518465.2 PREDICTED: gastricsin [Pan troglodytes]

MKWMVVVLVCLQLSEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYR
FGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHS
RFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFV
YAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLSNQQGSSGGAVVFG
GVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQ
QYMSALLEATGAQEDEYGQFLVNCNSIQNLPTLTFIINGVEFPLPPSSYILSNDGYCTV
GVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA

>XP_004044046.2 PREDICTED: gastricsin [Gorilla gorilla gorilla]

MKWMVMVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYH
FGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHS
RFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFV
CAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLSNQQGSSGGAVVFG
GVDNSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQ
QYMSALLQATGAQEDEYGQFLVNCNSIQNLPTLTFIINGVEFPLPPSSYILSNNGYCTV
GVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA

>XP_017733759.1 PREDICTED: gastricsin [Rhinopithecus bieti]

MKWLVVVLLCLQLLEAAVLKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF
GDLSVSYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHSR
FNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFVY
AQFDGIMGLAYPALSVDGATTAMQGMVQEGALTSPIFSVYLSDQQGSSGGAVVFGGV
DSSLYTGQIYWAPVTRELYWQIGIEEFLIGGQASSWCSEGCQAIVDTGTSLLTVPQQY
MSALLQATGAQEDEYGEFLVDCNSIQNLPTLTFIINGVEFPLPPSSYILSNNGYCTVGV
EPTYLSSQNGQPLWILGDVFLRSYYSVYDLSNNRVGFATAA

>XP_010367940.1 PREDICTED: gastricsin [Rhinopithecus roxellana]

MKWLVVVLLCLQLLEAAVLKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF
GDLSVSYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHSR
FNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFVY
AQFDGIMGLAYPALSVDGATTAMQGMVQEGALTSPIFSVYLSDQQGSSGGAVVFGGV
DSSLYTGQIYWAPVTRELYWQIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQQY
MSALLQATGAQEDEYGEFLVDCNSIQNLPTLTFIINGVEFPLPPSSYILSNNGYCTVGV
EPTYLSSQNGQPLWILGDVFLRSYYSVYDLSNNRVGFATAV

I went to the NCBI site, www.ncbi.nlm.nih.gov, and opened the protein sequence. On the
right-hand side I clicked BLAST and then protein BLAST. I then selected the top five
HIV-1 protease protein sequences.

HIVTopFive.prt.

>NP_705926.1 retropepsin [Human immunodeficiency virus 1]

PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVR
QYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF

>CAB66012.1 V-1 protease, partial [Human immunodeficiency virus 1]

PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVR
QYDQILVEICGHKAIGTVLIGPTPVNIIGRNLLTQIGCTLNF

>pdb|1W5Y|A Chain A, Hiv-1 Protease in Complex with Fluoro Substituted Diol-Based C2-Symmetric Inhibitor

ADRQGTVSFNFPQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKM
IGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF

>pdb|1BV9|A Chain A, Hiv-1 Protease (I84v) Complexed with Xv638 Of Dupont Pharmaceuticals

PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVR
QYDQILIEICGHKAIGTVLVGPTPVNVIGRNLLTQIGCTLNF

>AAC61538.1 protease, partial [Human immunodeficiency virus 1]

PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQ
YDQILIEICGHKAIGTVLIGPTPVNIIGRNLLTQIGCTLNF

1.7

PepEvolProt.txt.

>AAA60074.1 pepsinogen [Homo sapiens]


MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYR
FGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHS
RFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFV
YAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLSNQQGSSGGAVVFG
GVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQ
QYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGVEFPLPPSSYILSNNGYCTV
GVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA

>XP_006148646.1 PREDICTED: gastricsin [Tupaia chinensis]


MKWMVVALVCLQLLEASVVKVSLKKGKSIRDTMKEKGLLKEFLRTHKYDPAQKYH
FNDFSVAYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTNHP
RFNPSQSSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFV
YAQFDGIMGMAYPALSMGGATTALQGMLQEGVLTSPVFSFYLSNQQGSEDGGAVIFG
GVDNSLYSGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSQGCQAIVDTGTSLLTVPQ
QYMSTLLQATGAQEDEYGQFLVNCDNIQSLPTFTFIINGVQFPLPPSAYILSNNGACM
VGVEATYLPSQNGQPLWILGDVFLRSYYSVYDMSNNRVGFATAA

>BAB11755.1 pepsinogen C [Rhinolophus ferrumequinum]


MKWMVVVLLCLQLLEAKVVKVPLKKLKSLRETMKEKGLLEEFLKNHKYDPAQKYR
YTDFSVAYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQTQACTGHT
RFNPSQSSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFV
YAQFDGIMGMAYPSLAMGGATTALQGMLQEGALTSPVFSFYLSNQQGSQNGGAVIF
GGVDNSLYQGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSQGCQAIVDTGT
SLLTVPQQYMSALLQATGAQEDQYGQFFVNCNYIQNLPTFTFIINGVQFPLPPSSYILN
NNGYCTVGVEPTYLPSQNGQPLWILGDVFLRSYYSVYDMGNNRVGFATAA

>XP_006190704.1 PREDICTED: gastricsin [Camelus ferus]


MKWMVVALGCLQLLEATLIRVPLKKFKSVRETMKEKGLLEEFLRTHKYDPVQKYRF
SDFSVVSEPMAYMDASYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHTR
FNPSLSSTYSTNGQAFSLQYGSGSLTGFFGYDTLMVHDIKVPNQEFGLSENEPGTNFL
YATFDGIMGMAYPALSTDGATTALQGMLQEGALTCPVFSFYLSSQQSSQDGGAIVFG
GVDNSLYTGQIHWTPVTQELYWQIGIEEFLIGDQTSGWCSQGCQAIVDTGTSLLTVPQ
QFMSALLQATGAQEDQYRQFLVDCNNTQSLPTFTFIISGVQFPLPPSSYILNNDEGYCT
MGVEVTYLPSQNGQPLWILGDVFLRSYYSVYDISNNRVGFATAA

>XP_005896510.1 PREDICTED: gastricsin [Bos mutus]


MKWMVLALVCLQALEAAALVKIPLKKFKSIREIMKEKGLLEDFLRTYKHDPAQKYR
FGDFIVATEPMDYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHT
RFNHSLSSTYSTNEQTFSLQYGSGSLTGILGYDTLTVQGIKVPNQEFGLSKTEPGTNFL
YAKFDGIMGMAYPSLSVDGATTVLQGMLQEGALTSPFSFYLSSQQGSQDGGAVIFGG
VDSCLYTGQIYWAPVTQELYWQIGFEEFLIGDQATGWCSTGCQAIVDTGTSLLTVPQQ
FLSALLQATGAQEDQYGQFPVDCNNIQNLPTLTFVINGVQFPLPPASYILNNDDSYCIL
GVEVTYVPSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAV

>XP_007085267.1 PREDICTED: gastricsin [Panthera tigris altaica]


MKWMVVALASLQLLEAAVVKVPLSKRESIWETMKEKGLLGEFLRNPKHDPVQKYH
FGNLDAVYEPLAFLDSLYLGEISIGTPPQNFLVLFDTGSSSLWVPSVHCQSQACAGHSH
FNSNASSTYSSNGQIFSVRYGSGGLRGIYGYDTLRVQSIQVPNQQFGLSELEPSPYFFH
AKFDGIMGLAFPSLAEGRTTTSLQGMLRAGVLSSPVFSFYLGRQMNPQKGAVLIFGGI
DHSLHRGPIYWAPVTQERYWQIGFEEFLIGGHATGWCSQGCEAIVDTGTSLLTVPQQ
YLSYLLQATGAQADQYGQFMVDCNNVQSLPTLTFLINRVQFSLPYSSYLFRGNDICAI
RVQATYLPSSSGQPLWILGDVFLRSYYSIFDIGNKRVGFAVAA

>AAA49530.1 pepsinogen [Rana catesbeiana]


MKFLILALVCLQLSEGIIKVPLKKFKSMREVMRDHGIKAPVVDPATKYYNNFATAFEP
LANYMDMSYYGEISIGTPPQNFLVLFDTGSSNLWVPSTYCQSQACTNHPQFNPSQSSS
YSSNQQQFSLQYGTGSLTGILGYDTVQIQNIAISQQEFGLSVTEPGTNFVYAQFDGILG
LAYPSIAEGGATTVMQGMIQQNLINQPLFAFYLSGQQNSQNGGEVAFGGVDQNYYSG
QIYWTPVTSETYWQIGIQGFSVNGQATGWCSQGCQGIVDTGTSLLTAPQSVFSSLMQS
IGAQQDQNGQYAVSCSNIQSLPTISFTISGVSFPLPPSAYVLQQNSGYCTIGIMPTYLPS
QNGQPLWILGDVFLRQYYSVYDLGNNQVGFAAAA

>XP_007947479.1 PREDICTED: gastricsin [Orycteropus afer afer]


MRILVLVLVCLHLSEGLERVILKKGKSIRQVMEERGVLEEFLKSHPKVDPATKYQFSSE
AVAYEPITNYLDSFYFGEISIGTPPQNFLVLFDTGSSNLWVPSTYCQSQACSNHNRFNPS
LSSTFRNNGQTYTLSYGSGSLSVVLGYDTVTVQNIVVNNQEFGLSENEPSNPFYYSNF
DGILGMAYPNMAVGNAPTVMQGMLQQGQLTQPIFSFYFSRQPTYQYGGELILGGVD
TQLYSGQIIWTPVTRELYWQIGIQEFAIGNQATGLCSQGCQAIVDTGTFLLAVPEQYM
GSFLQATGAQQAQNGDFVVNCNYIQSMPTITFVISGAQFPLPPSAYVFNNNGYCTLGI
EATYLPSPTGQPLWILGDVFLKEYYSVYDMANNRVGFALSA

>XP_009916480.1 PREDICTED: gastricsin [Haliaeetus albicilla]


MKWLILALVCLQLSEGLVRIKLQKAKSIREKMKEAGVLEDFLKKIKYDPAKKYHFSE
DYVVYEPMTSHLDSSYFGNISMGTPPQDFLVLFDTGSSNLWVPSCQTPACSDHARFN
PSESSTFTSNGQKYASGSLAVVLGYDTLTLQSIAVSNQEFGLSESEPTEPFYYADFDGI
MGMGYPSLAVGGTPTALQGMLQQNQLTQPIFSFYFSRQPTYDYGGELILGGIDTQLFS
GDIVWAPVTQELYWQVAIDEFAIGQSATGWCSQGCQAIVDTGTYLLTVPQQYIGVFA
QALGAQPTNEGYAVDCSETQNMPTITFVINGAQLPLSPSAYVSNNNGYCTLAIEETYL
PSQNGQPLWILGDIFLKEYYTIFDMGNNNIGFASSV
>XP_006278836.1 PREDICTED: gastricsin [Alligator mississippiensis]
MKWLILALVCLQLSEGLVRVPLRKGKSMRQAMKEKGVLEDFLKHHKYDPGRKYHL
NELNVAHEPMTNYLNSFYFGEISIGTPPQTFLVVFDTGSSNLWVPSTYCQDKACTNHA
KFNPNASSTYSSIGESYTLCYGVGDLVLLGYDTVTVQNIIVRNQEFGLSVDEPTDPFY
YSNYDGVLGMAYPGVAIPGFKTLMQNMVQQDQLSEPIFSFYFSRNPTYQYGGEVILG
GIDSQLFTGQITWAPVIEEVYWKIALDEFSIGKNNTSWCSQGCHAIVDTGTFLLTIPYQ
YLEDFLNAVGAQESYGYYVVDCSSIQNMPTLTFVINGVKFPLPPSVYVFNDNGSCSVA
VEETYVASESGHPLWILGDVFLRQYYSVFDMANNRVGFALSN

>XP_010005368.1 PREDICTED: gastricsin [Chaetura pelagica]


MKWLVLALMCLQLSEGLVRIKLKKGKSIRENMREAGVLEDYLKKIKNDPAKKYNFT
DNYVVFEPLTNHLDASYFGEISIGDPPQNFLVLFDTGSSNLWVPSSYCQTPACFNHAK
FNPNDSSTFISSGLSYTLSYGSGAVTVLLGYDTLRIQSIMVTNQEFGLSQDEPTQPFYY
AEFDGILGMAYPLLAVGSIPTALQGMMQQNQLTEPIFSFYFSRQPTYNYGGELILGGID
TRLFRGDIIWAPVTQELYWQVELGGFAIGESTTGWCRQGCQAXXXXXXLTVPQEYL
DSFLQAVGAQLTSYGYAVDCNEIQNLPTITFIINGVSLPLYPSAYILKNKGYCTVGVEAT
YLPSQNGQPLWIFGDVFLKEYYTVFDMANNRVGFATSV

I entered the accession number of pepsinogen on the NCBI site and, via BLAST, selected
protein sequences from various organisms that are evolutionarily distant from one another.
I kept sequences that were similar in length, and each of the protein sequences above
starts with M (methionine).

1.8 CLUSTAL O(1.2.4) multiple sequence alignment

XP_006278836.1 MKWLILALVCLQLSE--GLVRVPLRKGKSMRQAMKEKGVLEDFLKH-HKYDPGRKYHLNE
XP_007947479.1 MRILVLVLVCLHLSE--GLERVILKKGKSIRQVMEERGVLEEFLKSHPKVDPATKYQFSS
XP_009916480.1 MKWLILALVCLQLSE--GLVRIKLQKAKSIREKMKEAGVLEDFLKK-IKYDPAKKYHFSE
XP_010005368.1 MKWLVLALMCLQLSE--GLVRIKLKKGKSIRENMREAGVLEDYLKK-IKNDPAKKYNFTD
AAA49530.1 MKFLILALVCLQLSE--GIIKVPLKKFKSMREVMRDHGIKA------PVVDPATKY-YNN
XP_007085267.1 MKWMVVALASLQLLEA-AVVKVPLSKRESIWETMKEKGLLGEFLRN-PKHDPVQKYHFGN
XP_005896510.1 MKWMVLALVCLQALEAAALVKIPLKKFKSIREIMKEKGLLEDFLRT-YKHDPAQKYRFGD
XP_006190704.1 MKWMVVALGCLQLLEA-TLIRVPLKKFKSVRETMKEKGLLEEFLRT-HKYDPVQKYRFSD
AAA60074.1 MKWMVVVLVCLQLLEA-AVVKVPLKKFKSIRETMKEKGLLGEFLRT-HKYDPAWKYRFGD
XP_006148646.1 MKWMVVALVCLQLLEA-SVVKVSLKKGKSIRDTMKEKGLLKEFLRT-HKYDPAQKYHFND
BAB11755.1 MKWMVVVLLCLQLLEA-KVVKVPLKKLKSLRETMKEKGLLEEFLKN-HKYDPAQKYRYTD
*: :::.* .*: * : :: * * :*: : *.: *: ** ** .

XP_006278836.1 LNVAHEPMTNYLNSFYFGEISIGTPPQTFLVVFDTGSSNLWVPSTYCQDKACTNHAKFNP
XP_007947479.1 EAVAYEPITNYLDSFYFGEISIGTPPQNFLVLFDTGSSNLWVPSTYCQSQACSNHNRFNP
XP_009916480.1 DYVVYEPMTSHLDSSYFGNISMGTPPQDFLVLFDTGSSNLWVPS--CQTPACSDHARFNP
XP_010005368.1 NYVVFEPLTNHLDASYFGEISIGDPPQNFLVLFDTGSSNLWVPSSYCQTPACFNHAKFNP
AAA49530.1 FATAFEPLANYMDMSYYGEISIGTPPQNFLVLFDTGSSNLWVPSTYCQSQACTNHPQFNP
XP_007085267.1 LDAVYEPL-AFLDSLYLGEISIGTPPQNFLVLFDTGSSSLWVPSVHCQSQACAGHSHFNS
XP_005896510.1 FIVATEPM-DYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHTRFNH
XP_006190704.1 FSVVSEPM-AYMDASYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHTRFNP
AAA60074.1 LSVTYEPM-AYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHSRFNP
XP_006148646.1 FSVAYEPM-AYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTNHPRFNP
BAB11755.1 FSVAYEPM-AYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQTQACTGHTRFNP
.. **: .:: * *:**:* *** ***:******.***** ** ** .* :**

XP_006278836.1 NASSTYSSIGESYTLCYGVGDLYVLLGYDTVTVQNIIVRNQEFGLSVDEPTDPFYYSNYD
XP_007947479.1 SLSSTFRNNGQTYTLSYGSGSLSVVLGYDTVTVQNIVVNNQEFGLSENEPSNPFYYSNFD
XP_009916480.1 SESSTFTSNGQK----YASGSLAVVLGYDTLTLQSIAVSNQEFGLSESEPTEPFYYADFD
XP_010005368.1 NDSSTFISSGLSYTLSYGSGAVTVLLGYDTLRIQSIMVTNQEFGLSQDEPTQPFYYAEFD
AAA49530.1 SQSSSYSSNQQQFSLQYGTGSLTGILGYDTVQIQNIAISQQEFGLSVTEPGTNFVYAQFD
XP_007085267.1 NASSTYSSNGQIFSVRYGSGGLRGIYGYDTLRVQSIQVPNQQFGLSELEPSPYFFHAKFD
XP_005896510.1 SLSSTYSTNEQTFSLQYGSGSLTGILGYDTLTVQGIKVPNQEFGLSKTEPGTNFLYAKFD
XP_006190704.1 SLSSTYSTNGQAFSLQYGSGSLTGFFGYDTLMVHDIKVPNQEFGLSENEPGTNFLYATFD
AAA60074.1 SESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFVYAQFD
XP_006148646.1 SQSSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFVYAQFD
BAB11755.1 SQSSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFVYAQFD
. **:: . *. * : . ****: ::.* : :*:**** ** * :: :*

XP_006278836.1 GVLGMAYPGVAIPGFKTLMQNMVQQDQLSEPIFSFYFSRNPTYQYGGEVILGGIDSQLFT
XP_007947479.1 GILGMAYPNMAVGNAPTVMQGMLQQGQLTQPIFSFYFSRQPTYQYGGELILGGVDTQLYS
XP_009916480.1 GIMGMGYPSLAVGGTPTALQGMLQQNQLTQPIFSFYFSRQPTYDYGGELILGGIDTQLFS
XP_010005368.1 GILGMAYPLLAVGSIPTALQGMMQQNQLTEPIFSFYFSRQPTYNYGGELILGGIDTRLFR
AAA49530.1 GILGLAYPSIAEGGATTVMQGMIQQNLINQPLFAFYLSGQQNSQNGGEVAFGGVDQNYYS
XP_007085267.1 GIMGLAFPSLAEGRTTTSLQGMLRAGVLSSPVFSFYLGRQMNPQKGAVLIFGGIDHSLHR
XP_005896510.1 GIMGMAYPSLSVDGATTVLQGMLQEGALTSPVFSFYLSSQQGSQDGGAVIFGGVDSCLYT
XP_006190704.1 GIMGMAYPALSTDGATTALQGMLQEGALTCPVFSFYLSSQQSSQDGGAIVFGGVDNSLYT
AAA60074.1 GIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLSNQQGSS-GGAVVFGGVDSSLYT
XP_006148646.1 GIMGMAYPALSMGGATTALQGMLQEGVLTSPVFSFYLSNQQGSEDGGAVIFGGVDNSLYS
BAB11755.1 GIMGMAYPSLAMGGATTALQGMLQEGALTSPVFSFYLSNQQGSQNGGAVIFGGVDNSLYQ
*::*:.:* :: * :*.*:: . :. *:*:.*:. : . *. : :**:* .

XP_006278836.1 GQITWAPVIEEVYWKIALDEFSIGKNNTSWCSQGCHAIVDTGTFLLTIPYQYLEDFLNAV
XP_007947479.1 GQIIWTPVTRELYWQIGIQEFAIGNQATGLCSQGCQAIVDTGTFLLAVPEQYMGSFLQAT
XP_009916480.1 GDIVWAPVTQELYWQVAIDEFAIGQSATGWCSQGCQAIVDTGTYLLTVPQQYIGVFAQAL
XP_010005368.1 GDIIWAPVTQELYWQVELGGFAIGESTTGWCRQGCQAXXX--XXXLTVPQEYLDSFLQAV
AAA49530.1 GQIYWTPVTSETYWQIGIQGFSVNGQATGWCSQGCQGIVDTGTSLLTAPQSVFSSLMQSI
XP_007085267.1 GPIYWAPVTQERYWQIGFEEFLIGGHATGWCSQGCEAIVDTGTSLLTVPQQYLSYLLQAT
XP_005896510.1 GQIYWAPVTQELYWQIGFEEFLIGDQATGWCSTGCQAIVDTGTSLLTVPQQFLSALLQAT
XP_006190704.1 GQIHWTPVTQELYWQIGIEEFLIGDQTSGWCSQGCQAIVDTGTSLLTVPQQFMSALLQAT
AAA60074.1 GQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQQYMSALLQAT
XP_006148646.1 GQIYWAPVTQELYWQIGIEEFLIGGQASGWCSQGCQAIVDTGTSLLTVPQQYMSTLLQAT
BAB11755.1 GQIYWAPVTQELYWQIGIEEFLIGGQASGWCSQGCQAIVDTGTSLLTVPQQYMSALLQAT
* * *:** * **:: : * :. :. * **.. *: * . : : ::

XP_006278836.1 GAQESY-GYYVVDCSSIQNMPTLTFVINGVKFPLPPSVYVFND-NGSCSVAVEETYVASE
XP_007947479.1 GAQQAQNGDFVVNCNYIQSMPTITFVISGAQFPLPPSAYVFNN-NGYCTLGIEATYLPSP
XP_009916480.1 GAQPTN-EGYAVDCSETQNMPTITFVINGAQLPLSPSAYVSNN-NGYCTLAIEETYLPSQ
XP_010005368.1 GAQLTS-YGYAVDCNEIQNLPTITFIINGVSLPLYPSAYILKN-KGYCTVGVEATYLPSQ
AAA49530.1 GAQQDQNGQYAVSCSNIQSLPTISFTISGVSFPLPPSAYVLQQNSGYCTIGIMPTYLPSQ
XP_007085267.1 GAQADQYGQFMVDCNNVQSLPTLTFLINRVQFSLPYSSYLFRG-NDICAIRVQATYLPSS
XP_005896510.1 GAQEDQYGQFPVDCNNIQNLPTLTFVINGVQFPLPPASYILNNDDSYCILGVEVTYVPSQ
XP_006190704.1 GAQEDQYRQFLVDCNNTQSLPTFTFIISGVQFPLPPSSYILNNDEGYCTMGVEVTYLPSQ
AAA60074.1 GAQEDEYGQFLVNCNSIQNLPSLTFIINGVEFPLPPSSYILSN-NGYCTVGVEPTYLSSQ
XP_006148646.1 GAQEDEYGQFLVNCDNIQSLPTFTFIINGVQFPLPPSAYILSN-NGACMVGVEATYLPSQ
BAB11755.1 GAQEDQYGQFFVNCNYIQNLPTFTFIINGVQFPLPPSSYILNN-NGYCTVGVEPTYLPSQ
*** : *.*. *.:*:::* *. ..: * : *: .. * : : **: *

XP_006278836.1 SGHPLWILGDVFLRQYYSVFDMANNRVGFALSN
XP_007947479.1 TGQPLWILGDVFLKEYYSVYDMANNRVGFALSA
XP_009916480.1 NGQPLWILGDIFLKEYYTIFDMGNNNIGFASSV
XP_010005368.1 NGQPLWIFGDVFLKEYYTVFDMANNRVGFATSV
AAA49530.1 NGQPLWILGDVFLRQYYSVYDLGNNQVGFAAAA
XP_007085267.1 SGQPLWILGDVFLRSYYSIFDIGNKRVGFAVAA
XP_005896510.1 NGQPLWILGDVFLRSYYSVYDLGNNRVGFATAV
XP_006190704.1 NGQPLWILGDVFLRSYYSVYDISNNRVGFATAA
AAA60074.1 NGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA
XP_006148646.1 NGQPLWILGDVFLRSYYSVYDMSNNRVGFATAA
BAB11755.1 NGQPLWILGDVFLRSYYSVYDMGNNRVGFATAA
.*:****:**:**:.**:::*:.*:.:*** :

I went to the EBI site and selected Clustal Omega for multiple sequence alignment, using
the default settings. I pasted the protein sequences of the eleven different organisms and
clicked submit to perform a global multiple alignment of the set. The scoring matrix used
is BLOSUM62.
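The asterisks under each alignment block mark columns in which every sequence has the same residue. A minimal sketch of how such a line can be derived from aligned sequences is given below. It reproduces only the "*" symbol, not the ":" and "." similarity symbols, and the function name identity_line is hypothetical.

```python
def identity_line(aligned):
    """Return a string with '*' under every fully conserved, gap-free column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


# First 16 alignment columns of AAA60074.1, XP_006148646.1 and BAB11755.1
fragments = [
    "MKWMVVVLVCLQLLEA",
    "MKWMVVALVCLQLLEA",
    "MKWMVVVLLCLQLLEA",
]
print(identity_line(fragments))
```

For these three fragments only columns 7 and 9 are left unmarked, because one sequence differs from the other two at each of those positions.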

1.9

1.9.1

Amino acids in the signal peptide

Met-1

Lys-2

Trp-3

Met-4

Val-5

Val-6

Val-7

Leu-8

Val-9

Cys-10

Leu-11

Gln-12

Leu-13

Leu-14

Glu-15

Ala-16

Amino acids in the catalytic triad:

Asp-91

Thr-92

Gly-93

Amino acids in the substrate binding sites:

Pro-06

Ile-08

Glu-09

Glu-10

Val-11

Asp-12

The website I used to find the amino acids involved in the signal peptide, the active site
(catalytic triad) and the substrate binding site is NCBI, https://www.ncbi.nlm.nih.gov/. I
first went from the pepsinogen nucleotide record to the protein record by clicking the
protein ID. I then read the site annotations displayed for pepsinogen: the signal peptide,
the active site and the "substrate binding sites [chemical binding]".

1.9.2

[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-{GQ}-[LIVMFSTNC]-{EGK}-[LIVMFGTA]

1.10

Amino acids involved in the catalytic triad:

Asp-25

Thr-26

Gly-27

Amino acids involved in the Active site flap:

Met-46

Ile-47

Gly-48

Gly-49

Ile-50

Gly-52

Phe-53

Ile-54

Lys-55

Val-56

Amino acids involved in the inhibitor binding site:

Asp-25

Gly-27

Asp-29

Met-46

Ile-47

Gly-48

Ile-84
The Prosite signature pattern for Eukaryotic and viral aspartyl proteases active sites:

[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-{GQ}-[LIVMFSTNC]-{EGK}-[LIVMFGTA]
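A PROSITE pattern of this kind can be checked against a sequence by converting it to a regular expression. The sketch below assumes only the basic PROSITE elements (a residue, a [..] set, a {..} exclusion, x for any residue, and an optional repeat count in parentheses). The function name prosite_to_regex is hypothetical, and the sequence is the HIV-1 protease sequence CAA09316.1 from 1.1.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {GQ} means any residue except G or Q
        elif core == "x":
            core = "."  # x means any residue
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


SIGNATURE = ("[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-"
             "{GQ}-[LIVMFSTNC]-{EGK}-[LIVMFGTA]")
HIV_PR = ("PQVTLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVR"
          "QYDQILIEICGHKAIGTVLIGPTPVNIIGRNLLTQIGCTLNF")

hit = re.search(prosite_to_regex(SIGNATURE), HIV_PR)
asp_position = hit.start() + 4  # the D is the 4th pattern element; 1-based numbering
print(hit.group(), asp_position)
```

In this sequence the signature matches ALLDTGADDTVL, and the aspartate of the match is residue 25, which agrees with the Asp-25, Thr-26, Gly-27 positions listed above.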

1.11
To obtain the evolutionary tree, I went to ClustalW through the website
http://www.genome.jp/tools/clustalw/. I pasted the thirteen protein sequences, clicked
execute multiple sequence alignment and then saved the evolutionary tree as a PDF file. I
used ClustalW because it is a general-purpose multiple sequence alignment program for both
DNA and protein sequences. ClustalW provides biologically meaningful multiple sequence
alignments of divergent sequences. It calculates the best match for the selected sequences
and lines them up so that identities, similarities and differences can be easily identified.
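A tree of this kind is built from pairwise distances between the aligned sequences. As a rough illustration of that input, the sketch below computes the proportion of differing residues (p-distance) for each pair, skipping columns that contain a gap. This is an assumption made for illustration and not necessarily the exact distance measure ClustalW applies; the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Fraction of gap-free aligned columns in which the two residues differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(names, aligned):
    """Return a nested dict of pairwise p-distances keyed by sequence name."""
    return {
        n1: {n2: p_distance(s1, s2) for n2, s2 in zip(names, aligned)}
        for n1, s1 in zip(names, aligned)
    }


# First 16 alignment columns of two sequences from 1.8
matrix = distance_matrix(
    ["AAA60074.1", "XP_006148646.1"],
    ["MKWMVVVLVCLQLLEA", "MKWMVVALVCLQLLEA"],
)
print(matrix["AAA60074.1"]["XP_006148646.1"])
```

Sequences separated by small distances, such as the primate gastricsins, would be expected to cluster together in the resulting tree.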

1.12

Both HIV-1 protease and pepsinogen belong to the family of aspartic proteases, and they
have been shown to have a high level of structural homology (Scand, 1992). Pepsinogen is
the precursor (zymogen) of pepsin, and its primary structure has 44 more amino acids than
its active form, pepsin. Pepsinogen is an aspartic protease and thus uses catalytic
aspartate in its active site. In the stomach, chief cells release pepsinogen, which is
activated by the gastric hydrochloric acid (HCl) released from parietal cells in the stomach
lining. Pepsin digests up to 20% of ingested amide bonds, cleaving preferentially at the
N-terminal side of aromatic amino acids such as phenylalanine, tryptophan and tyrosine
(Cox et al., 2008). The specificity of pepsinogen for protein substrates is directed by its
hydrophobic binding sites.

Conversely, the active site of the HIV-1 protease includes six amino acids: the catalytic
triad of Asp-25, Thr-26 and Gly-27 is found in each monomer, at amino acid positions 25 to
27 and 25' to 27'. The roles of Thr26 and Thr26', and of Gly27 and Gly27', are still under
study. Nevertheless, it is hypothesized that strong hydrogen bonding between the Thr26 and
Thr26' residues stabilizes the conformational state of the active site, and that Gly27 and
Gly27' promote the conformational changes needed to bind a substrate so that the Asp25 and
Asp25' carboxylate groups can attack the amide moiety of the substrate. HIV-1 protease
possesses two flaps in the active dimer, as opposed to the single flap in non-viral aspartic
proteases. The flap, an extended beta-sheet region on each monomer, is involved in the
substrate binding site, with the two aspartyl residues lying at the bottom of a hydrophobic
cavity (Mimoto et al., 2000).

In pepsinogen there is direct hydrogen bonding between the flap and the substrate. In HIV
protease, by contrast, the carbonyl group between P1 and P2 and the carbonyl between P and
P2 both make hydrogen bonds to the same water molecule, which forms another two hydrogen
bonds to the flaps to produce an approximately tetrahedral geometry. The presence of this
water molecule has been observed in most of the crystal structures of HIV-1 protease bound
to different inhibitors (Gustchina and Weber, 1990).

SECTION B

Question 1

TPR1001.pdb
TPR1002.pdb

TRP1003.pdb
Question 2

TPR2A001.pdb

TRP2A002.pdb.
Question 3

Comp1&2A.pdb.
Question 4

TPR1mut.pdb.

Mutation of either Lys-8 or Asn-12 in the TPR1 motif did not affect binding of the TPR1
domain to Hsp70; however, double mutation of these residues (Lys 8 to Ala) abrogated
binding. This implies that the TPR1-Hsp70 interaction requires a network of interactions,
not only those between charged residues in the TPR1 domain and the motif of Hsp70. Hence,
the observed abrogation of binding to Hsp70 might have occurred mainly because of the loss
of important electrostatic contacts with the TPR domain, and secondarily because of
disruption of contacts with other domains necessary for full interaction with Hsp70.

Binding of Hop to Hsp90 occurs because of the lone pair of electrons on the N atom of the
amino (NH2) group of lysine (Lys). When Lys is mutated to alanine (Ala), however, binding
of Hop to Hsp90 is lost, because the alanine side chain does not contain the amino group
(NH2) that permits hydrogen bonding and thereby allows binding of Hop.
TPR2Amut.pdb

Binding of Hop to Hsp90 occurs because of the lone pair of electrons on the N atom of the
amino (NH2) group of lysine (Lys). When Lys is mutated to alanine (Ala), however, binding
of Hop to Hsp90 is lost, because the alanine side chain does not contain the amino group
(NH2) that permits hydrogen bonding and thereby allows binding of Hop to Hsp90. The
electrostatic interaction between Hop and Hsp90 is also lost when Lys is mutated to Ala, and
thus binding is abrogated.
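A Lys-to-Ala substitution of the kind discussed above can be sketched directly on the text of a PDB file. The code below is a simplified illustration and not the procedure used to produce the mutant files: for the chosen residue it keeps only the backbone atoms and CB, renames the residue to ALA and discards the remaining side-chain atoms (and any hydrogens). It relies on the fixed column layout of PDB ATOM records; the function name mutate_to_alanine is hypothetical.

```python
# Atoms that alanine shares with every non-glycine residue
BACKBONE_AND_CB = {"N", "CA", "C", "O", "CB"}


def mutate_to_alanine(pdb_lines, chain_id, residue_number):
    """Return PDB lines with one residue truncated to alanine."""
    mutated = []
    for line in pdb_lines:
        is_target = (
            line.startswith("ATOM")
            and len(line) >= 26
            and line[21] == chain_id                # chain identifier, column 22
            and int(line[22:26]) == residue_number  # residue number, columns 23-26
        )
        if not is_target:
            mutated.append(line)
        elif line[12:16].strip() in BACKBONE_AND_CB:  # atom name, columns 13-16
            # Residue name occupies columns 18-20
            mutated.append(line[:17] + "ALA" + line[20:])
        # Any other atom of the target residue (e.g. CG, CD, CE, NZ) is dropped
    return mutated
```

Removing NZ in this way removes the side-chain amino group, which is the group the explanation above holds responsible for the hydrogen bonding and electrostatic contacts. The truncated side chain is not re-minimised, so the output is only a starting model.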

Question 5

The two major heat shock protein (Hsp) chaperones, Hsp70 and Hsp90, both bind the
co-chaperone Hsp70/Hsp90 organizing protein (Hop). Hop plays a key role in coordinating Hsp
actions in folding protein substrates. Hop contains three tetratricopeptide repeat (TPR)
domains that have binding sites for the conserved EEVD C termini of Hsp70 and Hsp90.
Crystallographic studies have shown that EEVD interacts with positively charged amino acids
in the Hop TPR-binding pockets (called carboxylate clamps), and point mutations at these
carboxylate clamp positions tend to disrupt Hsp binding. Hop provides an essential link
between Hsp70 and Hsp90, as it can bind both chaperones simultaneously, and it has been
shown to efficiently target Hsp90 and direct it to pre-existing Hsp70-client complexes
(Chen and Smith, 1998). The modified TPR1 and TPR2A (proteins with a mutated amino acid)
no longer bound Hsp70 and Hsp90, respectively. Hop interacts with Hsp70 and Hsp90 via its
N-terminal and first central tetratricopeptide repeat (TPR) domains (Lassle et al., 1997).

The TPR motif is a degenerate 34-amino acid sequence found in a large number of
co-chaperones that interact especially with Hsp90 (Nicolet and Craig, 1989). The crystal
structure of the TPR1 domain of Hop in complex with the C-terminal Hsp70 heptapeptide
suggests that certain basic residues protruding into the TPR groove are needed for its
interaction with Hsp70. The structure also suggests that hydrophobic contacts are critical
in determining the specificity of binding of Hop to Hsp70. As with many other Hsp70- or
Hsp90-binding co-chaperones, the TPR domains of Hop mediate Hsp binding (Smith, 2004).

The core TPR domain is composed of six α-helices in total that produce a saddle-like
structure. The surface of the TPR domain is concave, which provides an interaction site
that can aid specific peptide binding (Scheufler et al., 2000). TPR ligand-binding pockets
also appear to influence Hop global conformation in the absence of bound Hsp70 or Hsp90,
and this finding applies to all TPR domains. According to Odunuga et al. (2003), point
mutations in TPR1 can change Hop conformation.

References

Chen, S. and Smith, D.F. (1998). Hop as an adaptor in the heat shock protein 70
(Hsp70) and hsp90 chaperone machinery. J. Biol. Chem. 273: 35194-35200.
Cox, M., Nelson, D.R. and Lehninger, A.L. (2008). Lehninger Principles of Biochemistry.
San Francisco: W.H. Freeman.
Gustchina, A. and Weber, I.T. (1990). FEBS Lett. 269: 269.
Lassle, M., Blatch, G.L., Kundra, V., Takatori, T. and Zetter, B.R. (1997). J. Biol.
Chem. 272: 1876-1884.
Mimoto, T., Hattori, N. and Takaku, H. (2000). Structure-activity relationship of orally
potent tripeptide-based HIV protease inhibitors containing hydroxymethylcarbonyl
isostere. Chemical & Pharmaceutical Bulletin 48(9): 1310-1326.
Nicolet, C.M. and Craig, E.A. (1989). Mol. Cell. Biol. 9: 3638-3646.
Odunuga, O.O., Hornby, J.A., Bies, C., Zimmermann, R., Pugh, D.J. and Blatch, G.L.
(2003). Tetratricopeptide repeat motif-mediated Hsc70-mSTI1 interaction. Molecular
characterization of the critical contacts for successful binding and specificity. J. Biol.
Chem. 278: 6896-6904.
Scand. J. Clin. Lab. Invest. 52 (Suppl.) (1992), pp. 39-50.
Scheufler, C., Brinker, A., Bourenkov, G., Pegoraro, S., Moroder, L., Bartunik, H.,
Hartl, F.U. and Moarefi, I. (2000). Structure of TPR domain-peptide complexes:
Critical elements in the assembly of the Hsp70-Hsp90 multichaperone machine. Cell
101: 199-210.
Smith, D.F. (2004). Tetratricopeptide repeat cochaperones in steroid receptor
complexes. Cell Stress Chaperones 9: 109-121.
