BINSFinalt

MS(CS) Final Thesis
Muhammad Rehan 1358-MSCS-08 Department of Computer Science GC University Lahore

Outline: Background, Literature Review, Problem Statement, Hypothesis, Methodology, Result, Future Work, References
Bioinformatics
Any application of computation in biology, including data management, algorithm development and data mining. Bioinformatics is the field of science in which computer science, information technology, statistics and various branches of biology merge to form a single discipline.
What are Proteins
Proteins act as enzymes, antibodies and hormones, and provide structural support and transportation.
What are Proteins made of
[Figure: General formula of Amino Acid. An alpha carbon carries a hydrogen atom (H), an amino group (H2N), a carboxyl group (COOH) and a side chain (R); CH3 is shown as an example side chain.]
Sr. No  Amino Acid  Three Letter Code  Single Letter Code
1       Alanine     Ala                A
2       Arginine    Arg                R
3       Serine      Ser                S
4       Threonine   Thr                T
5       Tyrosine    Tyr                Y
6       Glycine     Gly                G
7       Histidine   His                H
.       .           .                  .
20      Leucine     Leu                L
Experimental Approaches
In vivo, in vitro and in silico
Post-Translational Modifications (PTMs)

Protein modification is essential for biological activity and for a protein to perform its intended task. It occurs through the addition of phosphate, glycosyl or other groups to specific amino acids.

PTM              Target Amino Acid  Description
Phosphorylation  S, T, Y, H         Addition of a phosphate group to S, T, Y or H
Glycosylation    S, T, N            Addition of a glycosyl group to S, T or N
Sulfation        Y                  Addition of a sulfate group to Y
Acetylation      N-terminus         Addition of an acetyl group, usually at the N-terminus of the protein
Methylation      R                  Addition of a methyl group, e.g. at R residues
List of PTM Types
Database Name    Description                                                                   Reference
PROSITE          Database of consensus patterns for various PTMs                               Sigrist et al. (2002)
HPRD             Human protein reference database of disease-related proteins and their PTMs   Peri et al. (2003)
RESID            Database with a collection of annotations and structures for PTMs             Garavelli (2003)
PhosphoBase ELM  Database of validated phosphorylation sites                                   Diella et al. (2004)
List of PTM Databases
Sr. No 1
Statement: Proteins often perform diverse and multiple functions. The proteome is far more diverse and complex than the genome: the human genome contains 22,000 to 25,000 genes, whereas the number of proteins exceeds 100,000. Identifying protein functions and events relies mainly on each protein's particular 3-D structure and on the occurrence of targeted amino acid modifications. PTMs regulate various protein functions by inducing changes such as enzyme activation. Identifying phosphorylated serine, threonine and tyrosine residues in vivo using mass spectrometry (MS) is not easy.
References: (Jeffery, 1999), (Nicolle H et al., 2007), (Attwood, 2000), (Konstantinopoulos et al., 2007), (Mann et al., 2002)
Sr. No 6
Statement: Many methods have been developed in proteomics, but these methods are still at an early stage.
References: (Blom, 2004)

Statement: Machine learning and statistics have always played a core role in bioinformatics, both in understanding proteomics and in analyzing PTMs. The artificial neural network (ANN) is one such approach and has been used extensively in biological sequence analysis.
References: (Qazi et al., 2006), (Wu, C.H., 1997)

Statement: Most cellular proteins are regulated by reversible phosphorylation, and at least 30% of proteins carry such a modification.
References: (Ficarro et al., 2002)

Existing tools: DISPHOS, PredPhospho, GPS, PPSP, KinasePhos 1.0, KinasePhos 2.0, NetPhos, NetPhosK, Neural-genetic
Tool            Method               Serine  Threonine  Tyrosine
DISPHOS         Logistic regression  76%     81%        83%
NetPhos         ANN                  69%     72%        61%
Neural-genetic  ANN                  75%     82%        79%
BPNN            ANN                  72%     77%        74%
Tool            Method  Kinase PKA              Kinase PKC
KinasePhos 2.0  SVM     Sn=92% Sp=89% Acc=90%   Sn=84% Sp=86% Acc=85%
KinasePhos 1.0  HMM     Sn=91% Sp=86% Acc=85%   Sn=80% Sp=87% Acc=83%
PredPhospho     SVM     Sn=88% Sp=91% Acc=90%   Sn=79% Sp=86% Acc=83%
GPS             GPS     Sn=91% Sp=89% Acc=90%   Sn=82% Sp=83% Acc=82%
PPSP            BDT     Sn=90% Sp=92% Acc=91%   Sn=82% Sp=86% Acc=84%
Develop a new method, BINS, that evolves a new classification model by learning from amino acid sequence data with a machine-learning method, the artificial neural network (ANN). BINS will improve prediction specificity, efficiency and accuracy for the machine-learning simulator GEARS (Genetic Evaluation of Classifier by Learning Residue Rules and Sequences).
The BINS classification method will reduce false-negative and false-positive predictions. BINS will predict PTMs with high accuracy, identifying the specific modified sites and the kinases that act at each site, and will extract biologically important information from noisy data. BINS can give better results than existing PTM prediction methods.
An empirical research methodology with an Exploratory Development Life Cycle will be used to develop the BINS model.
BINS consists of three main parts:
BINS Data Preparation Module
BINS Bootstrapping Module
BINS ANN Module
[Figure: BINS architecture]
BINS Data Preparation Module: PTMs database -> removal of duplicate instances -> protein databases grouped by modified and non-modified target classes -> peptide generator
BINS Bootstrapping Module: peptide datasets grouped by modified and non-modified classes -> sparse encoding -> merged sparse-encoded dataset (modified and non-modified target classes) -> training and validation dataset generator (training dataset, validation dataset)
BINS ANN Module: topology and network configuration -> training [Sn] [Sp] [Acc] [MCC] -> validation [Sn] [Sp] [Acc] [MCC]
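The first two data preparation steps, removing duplicate instances and grouping modified sites by target residue, can be sketched as below. This is a minimal illustration only: the record layout `(protein_id, position, residue)` and the function names `remove_duplicates` and `group_by_target` are hypothetical, since the thesis does not describe the schema of its PTMs database.

```python
from collections import defaultdict

# Hypothetical record layout: (protein_id, 1-based position, residue).
Record = tuple[str, int, str]


def remove_duplicates(records: list[Record]) -> list[Record]:
    """Drop repeated instances while keeping the first occurrence order."""
    seen: set[Record] = set()
    unique: list[Record] = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def group_by_target(records: list[Record]) -> dict[str, list[Record]]:
    """Group modified sites by target residue class (e.g. S, T, Y)."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[2]].append(rec)
    return dict(groups)


raw = [
    ("O08539", 4, "S"),
    ("O14543", 3, "T"),
    ("O08539", 4, "S"),  # duplicate instance, removed below
    ("O14920", 4, "Y"),
    ("O15117", 3, "S"),
]
groups = group_by_target(remove_duplicates(raw))
for target, sites in groups.items():
    print(target, sites)
```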
BINS Data Preparation Module

BINS Database Inconsistency Analyzing Utility
BINS Balance Inverted Site Application
BINS Peptide Extraction Application

PID     Sequence         Position  Amino Acid  Modification
O08539  ASTSMNSYTLKSYA.  4         S           S
O14543  MVTHSKFPAAGS.    3         T           T
O14746  MPRAPRCRAVSTA    11        S           S
O14920  MSWYPSLTQTC.     4         Y           Y
O15117  ELSFKQGEQIYTA.   3         S           S

Target  No. of Proteins  No. of Positive Peptides  No. of Negative Peptides  No. of Balanced Negative Peptides  No. of Merged Positive and Balanced Negative Peptides
S       5431             14837                     326396                    14467                              29304
T       1940             2983                      35795                     2907                               5890
Y       1156             2325                      16273                     2208                               4533
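The merged column is the sum of the positive peptides and the balanced negative peptides for each target residue. A quick arithmetic check over the counts in the table, offered as a sketch (the dictionary layout is only for illustration):

```python
# Counts copied from the dataset table: (positive, balanced negative, merged).
counts = {
    "S": (14837, 14467, 29304),
    "T": (2983, 2907, 5890),
    "Y": (2325, 2208, 4533),
}

for target, (pos, bal_neg, merged) in counts.items():
    # The merged dataset combines positives with the balanced negatives.
    assert pos + bal_neg == merged, target
    share = pos / merged
    print(f"{target}: {pos} + {bal_neg} = {merged} (positive share {share:.3f})")
```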

BINS Database Inconsistency Analyzing Utility
PID     Sequence         Position  Amino Acid  Modification  Length
O08539  ASTSMNSYTLKSYA.  4         S           S             350
O14543  MVTHSKFPAAGS.    3         T           T             1030
O14746  MPRAPRCRAVSTA    11        S           S             1250
O14920  MSWYPSLTQTC.     4         Y           Y             735
O15117  ELSFKQGEQIYTA.   3         S           S             952
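One plausible check for this utility, assumed here rather than taken from the thesis, is that the residue found at the stated position in the sequence matches the recorded amino acid. The sketch below applies that check to the sequence excerpts from the table (trailing "." dropped); `find_inconsistencies` is a hypothetical name.

```python
# (protein ID, sequence excerpt, 1-based position, recorded amino acid)
rows = [
    ("O08539", "ASTSMNSYTLKSYA", 4, "S"),
    ("O14543", "MVTHSKFPAAGS", 3, "T"),
    ("O14746", "MPRAPRCRAVSTA", 11, "S"),
    ("O14920", "MSWYPSLTQTC", 4, "Y"),
    ("O15117", "ELSFKQGEQIYTA", 3, "S"),
]


def find_inconsistencies(rows):
    """Return (pid, position, residue) for rows that disagree with their sequence."""
    bad = []
    for pid, seq, pos, aa in rows:
        # A position outside the sequence also counts as inconsistent.
        if not (1 <= pos <= len(seq)) or seq[pos - 1] != aa:
            bad.append((pid, pos, aa))
    return bad


print(find_inconsistencies(rows))  # []
```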

BINS Invert Application
PID     Sequence         Position  Amino Acid  Modification  Length
O08539  ASTSMNSYTLKSYA.  2         S           S             350
O08539  ASTSMNSYTLKSYA.  7         S           S             350
O08539  ASTSMNSYTLKSYA.  12        S           S             350
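In the O08539 excerpt, serine 4 is the annotated site, while the Invert application lists serines 2, 7 and 12. This suggests that every other occurrence of the target residue in the same protein is treated as a non-modified (negative) site. A minimal sketch of that reading, with the hypothetical function name `inverted_sites`:

```python
def inverted_sites(seq: str, target: str, modified: set[int]) -> list[int]:
    """1-based positions of `target` residues that are not annotated as modified."""
    return [
        pos
        for pos, aa in enumerate(seq, start=1)
        if aa == target and pos not in modified
    ]


print(inverted_sites("ASTSMNSYTLKSYA", "S", {4}))  # [2, 7, 12]
```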

BINS Peptide Extraction Application
Columns: Peptide ID, Extended Sequence, Class, and one column per window position P-10, P-9, P-8, ..., P0, ..., P9, P10 ('-' marks positions beyond the ends of the sequence).
O08539-2  ---------ASTSMNSYTLKS
O08539-7  ----ASTSMNSYTLKSYA---
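Each peptide is a 21-residue window, P-10 to P0 to P10, centred on the site and padded with '-' past either end of the sequence. The sketch below reproduces the two O08539 windows under that assumption. The names `extract_window`, `FLANK` and `PAD` are hypothetical.

```python
FLANK = 10  # residues on each side of the site: P-10 ... P10
PAD = "-"   # placeholder for positions beyond the sequence ends


def extract_window(seq: str, pos: int, flank: int = FLANK) -> str:
    """Return the window centred on 1-based `pos`, padded with PAD."""
    centre = pos - 1
    return "".join(
        seq[i] if 0 <= i < len(seq) else PAD
        for i in range(centre - flank, centre + flank + 1)
    )


print(extract_window("ASTSMNSYTLKSYA", 2))
print(extract_window("ASTSMNSYTLKSYA", 7))
```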
BINS Bootstrapping Module

BINS Training Dataset Encoding Manager
BINS Data Table Merging Utility
BINS Boot Strapping Application

Sparse Encoding Scheme
Amino Acid  Coding Scheme
A           10000000000000000000
C           01000000000000000000
D           00100000000000000000
E           00010000000000000000
F           00001000000000000000
G           00000100000000000000
H           00000010000000000000
I           00000001000000000000
.           .
-           00000000000000000000
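In the sparse (one-hot) scheme, each residue becomes a 20-bit vector with a single 1, and the padding symbol '-' becomes all zeros. A 21-residue window therefore yields 420 inputs. The slide shows only A to I, which follow alphabetical one-letter order, so this sketch assumes the remaining residues continue in that order (`ACDEFGHIKLMNPQRSTVWY`). `sparse_encode` is a hypothetical name. Any symbol outside the 20 residues, including '-', encodes as zeros here.

```python
# Assumed residue order; the slide confirms only the first eight (A..I).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def sparse_encode(peptide: str) -> list[int]:
    """Concatenate one 20-bit one-hot code per residue; '-' maps to all zeros."""
    bits: list[int] = []
    for aa in peptide:
        code = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            code[INDEX[aa]] = 1
        bits.extend(code)
    return bits


print("".join(map(str, sparse_encode("I"))))  # 00000001000000000000
print(len(sparse_encode("----ASTSMNSYTLKSYA---")))  # 420
```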

BINS Training Dataset Encoding Manager
Peptide ID  Extended Sequence  Class  P-10-1  P-10-2  P-10-3  ...  P10-8  P10-9  P10-10
O08539-2
O08539-7

BINS DataTable Merging Utility
Peptide ID  Extended Sequence  Class  P-10-1  P-10-2  P-10-3  ...  P10-8  P10-9  P10-10
O08539-2
O08539-7
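The merging step combines the encoded modified and non-modified tables into one labelled dataset. The sketch below assumes class labels 1 (modified) and 0 (non-modified) and a seeded shuffle for reproducibility; neither detail is stated in the thesis. The function name `merge_tables` and the short toy vectors are hypothetical. O08539-4 stands in for the annotated site, and O08539-2 and O08539-7 for its inverted sites.

```python
import random


def merge_tables(modified, non_modified, seed=0):
    """Label modified rows 1 and non-modified rows 0, concatenate, then shuffle."""
    rows = [(pid, vec, 1) for pid, vec in modified]
    rows += [(pid, vec, 0) for pid, vec in non_modified]
    # A fixed seed keeps the merged table identical between runs.
    random.Random(seed).shuffle(rows)
    return rows


# Toy 3-bit vectors stand in for the real 420-bit sparse encodings.
modified = [("O08539-4", [1, 0, 0])]
non_modified = [("O08539-2", [0, 1, 0]), ("O08539-7", [0, 0, 1])]
merged = merge_tables(modified, non_modified)
for pid, vec, label in merged:
    print(pid, vec, label)
```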

BINS Boot Strapping Application
BINS ANN Module
Evaluation Strategy
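$$Sn = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad Acc = \frac{Sn + Sp}{2}$$
$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$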
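The four measures translate directly into code. In this sketch, MCC is returned as `None` when its denominator is zero, for example when no site is predicted as modified. That mirrors the "None" entry in the first serine validation round. It assumes at least one modified and one non-modified example, so Sn and Sp are defined. `evaluate` is a hypothetical name.

```python
import math


def evaluate(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sn, Sp, Acc = (Sn + Sp) / 2 and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (sn + sp) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else None
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


print(evaluate(8, 2, 9, 1))
print(evaluate(0, 10, 10, 0)["MCC"])  # None
```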
Evaluation Strategy
PID    Sequence    Position  Target   Classification
O3265  SASNSTSYTS  3         Mod      TP
O3265  SASNSTSYTS  10        Mod      FN
O3265  SASNSTSYTS  1         Non-mod  TN
O3265  SASNSTSYTS  5         Non-mod  FP
O3265  SASNSTSYTS  7         Non-mod  TN
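Each site's outcome depends on its target class and the predicted class: a modified site predicted as modified is a TP and one predicted as non-modified is an FN. Likewise, a non-modified site is a TN or an FP. The sketch below infers the predictions from the classification column for the O3265 example and tallies the outcomes. The helper `outcome` is a hypothetical name.

```python
from collections import Counter


def outcome(target_mod: bool, predicted_mod: bool) -> str:
    """Map a (target, prediction) pair to TP, FN, FP or TN."""
    if target_mod:
        return "TP" if predicted_mod else "FN"
    return "FP" if predicted_mod else "TN"


# (position, target is modified, predicted as modified) for SASNSTSYTS;
# predictions are read back from the classification column of the example.
example = [
    (3, True, True),
    (10, True, False),
    (1, False, False),
    (5, False, True),
    (7, False, False),
]
tally = Counter(outcome(t, p) for _, t, p in example)
sn = tally["TP"] / (tally["TP"] + tally["FN"])
sp = tally["TN"] / (tally["TN"] + tally["FP"])
print(dict(tally), sn, sp)
```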
BINS Serine Result

Sr. No            1      2      3      4      5      6
Training   Ac     0.965  0.984  0.996  0.995  0.998  0.996
Training   Sn     1      0.982  0.996  0.995  0.998  0.996
Training   Sp     0.931  0.987  0.996  0.996  0.998  0.996
Training   MCC    0.932  0.969  0.992  0.991  0.996  0.992
Validation Ac     0.497  0.805  0.807  0.807  0.807  0.809
Validation Sn     0      0.612  0.619  0.622  0.616  0.628
Validation Sp     1      0.999  0.995  0.995  0.999  0.991
Validation MCC    None   0.662  0.663  0.664  0.665  0.663
BINS Threonine Result

Sr. No            1      2      3      4      5      6      7      8      9      10
Training   Ac     0.972  0.987  0.987  0.987  0.987  0.986  0.988  0.989  0.990  0.990
Training   Sn     0.963  0.986  0.986  0.986  0.986  0.986  0.990  0.989  0.990  0.989
Training   Sp     0.981  0.989  0.988  0.987  0.987  0.986  0.987  0.990  0.990  0.991
Training   MCC    0.946  0.975  0.974  0.974  0.974  0.972  0.977  0.979  0.980  0.980
Validation Ac     0.826  0.834  0.825  0.827  0.824  0.822  0.825  0.823  0.822  0.821
Validation Sn     0.688  0.737  0.750  0.771  0.774  0.772  0.761  0.768  0.770  0.770
Validation Sp     0.965  0.932  0.901  0.884  0.875  0.872  0.890  0.880  0.874  0.871
Validation MCC    0.680  0.683  0.658  0.659  0.653  0.648  0.657  0.652  0.647  0.645
BINS Tyrosine Result

Sr. No            1      2      3      4      5      6      7      8      9      10
Training   Ac     0.966  0.972  0.977  0.976  0.975  0.973  0.974  0.974  0.974  0.977
Training   Sn     0.952  0.961  0.975  0.975  0.973  0.970  0.971  0.973  0.974  0.974
Training   Sp     0.979  0.983  0.979  0.977  0.976  0.976  0.977  0.975  0.975  0.980
Training   MCC    0.933  0.945  0.955  0.953  0.950  0.947  0.948  0.948  0.949  0.955
Validation Ac     0.846  0.843  0.836  0.837  0.831  0.829  0.828  0.828  0.826  0.825
Validation Sn     0.735  0.741  0.778  0.780  0.779  0.779  0.778  0.779  0.778  0.768
Validation Sp     0.951  0.939  0.891  0.890  0.881  0.877  0.876  0.875  0.872  0.879
Validation MCC    0.705  0.697  0.675  0.676  0.665  0.661  0.659  0.659  0.654  0.652
BINS Comparison with other Method

Residue  Metric  BINS  NetPhos  DISPHOS  BPNN  Neural-genetic
Y        Sn      74%   70%      NA       75%   81%
Y        Sp      95%   68%      NA       75%   78%
Y        Acc     83%   72%      81%      78%   83%
T        Sn      74%   66%      NA       78%   81%
T        Sp      93%   77%      NA       77%   84%
T        Acc     81%   69%      76%      72%   75%
S        Sn      63%   81%      NA       72%   76%
S        Sp      99%   57%      NA       72%   74%
S        Acc     85%   69%      83%      75%   79%
BINS has been developed as a desktop application; the current version offers no online (WWW) support. However, growing opportunities on the internet call for an online version, which would widen the application's scope and make it available to multiple clients in different regions of the world. This would not only enhance the embedded capability of BINS for efficient PTM prediction but could also make it a major resource for multinational research collaborations. Because BINS is a sub-module of GEARS, the next version will learn and optimize the parameters and weights of the ANN with a genetic algorithm. BINS will also be integrated with other GEARS modules, such as MAPRes and HMM, to achieve the best classification of protein data by weighing the pros and cons of each technique.
Jeffery C.J. Moonlighting proteins. Trends Biochem. Sci., 24:8-11, 1999.
Bork P., Dandekar T., Diaz-Lazcoz Y., Eisenhaber F., Huynen M. and Yuan Y. Predicting function: from genes to genomes and back. J. Mol. Biol., 283:707-725, 1998.
Attwood T. The quest to deduce protein function from sequence: the role of pattern databases. Int. J. Biochem. Cell Biol., 32:139-155, 2000.
Mann M., Ong S., Gronborg M., Steen H., et al. Trends Biotechnol., 20:261-268, 2002.
Wu C.H. Comput. Chem., 21:237-256, 1997.
Blom N., Sicheritz-Pontén T., Gupta R., Gammeltoft S. and Brunak S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics, 4:1633-1649, 2004.
