
paper topics

Parallelization of star alignment


Parallel H4MSA for multiple sequence alignment
A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids
Parallelized genomic sequencing model: a big data approach for bioinformatics application
Massively parallel algorithm for multiple biological sequences alignment
MRSMRS: Mining Repetitive Sequences in a MapReduce Setting
Alignment-Free Sequence Comparison over Hadoop for Computational Biology
A parallel algorithm for the best K mismatches alignment problem
HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy
Progressive alignment method using genetic algorithm for multiple sequence alignment
Implementation of parallel protein structure alignment service on cloud
An Enhanced Framework of Genomics Using Big Data Computing
A private cloud system for web based high performance multiple sequence alignment services
Genomic analysis with mapreduce
Blast parallel: the parallelizing implementation of sequence alignment algorithms based on hadoop platform
Pairwise sequence alignment method for distributed shared memory systems
A steady state genetic algorithm for multiple sequence alignment
Mapreduce based parallel suffix tree construction for human genome
Phylogenetic analysis using mapreduce programming model
A Novel Structure of the Smith-Waterman Algorithm for Efficient Sequence Alignment
Bwasw-cloud: Efficient sequence alignment algorithm for big data with mapreduce
Parallel A-Star Multiple Sequence Alignment with Locality-Sensitive Hash Functions
Optimizing Sequence Alignment in Cloud using Hadoop and MPP Database
Parallelization of BLAST with MapReduce for Long Sequence Alignment
An algorithm of multiple sequence alignment based on consensus sequence searched by simulated annealing and star alignment
Big data: cloud computing in genomics applications

thesis topics

Parallelization of star alignment algorithm for multiple sequence alignment using Hadoop data grids
Parallelization of star alignment algorithm for multiple sequence alignment using MapReduce Model
A MapReduce Model of star alignment algorithm for multiple sequence alignment.

Introduction
Computer science is applied across many domains to solve computational
problems, including problems in the biological sciences. One such problem is
pair-wise sequence alignment, which aims to identify regions of similarity
between two DNA sequences in order to analyze the functional, structural, or
evolutionary relationships between them. One method for pair-wise sequence
alignment is Needleman-Wunsch, which finds the best global alignment of two
sequences using dynamic programming. For two sequences of length n, this
method has a complexity of O(n^2). It divides the problem into smaller
subproblems and uses their solutions to build the solution to the larger
problem.
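The dynamic-programming idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function name and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example, and only the optimal score is computed, without the traceback that recovers the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    The scoring values are illustrative assumptions, not taken from the text.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell is built from three smaller, already solved subproblems.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # a[i-1] aligned with a gap
            left = score[i][j - 1] + gap    # b[j-1] aligned with a gap
            score[i][j] = max(diag, up, left)
    return score[-1][-1]
```

The two nested loops fill a table of (n+1) x (n+1) cells for sequences of length n, which is where the O(n^2) cost comes from.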
Sequence alignment can be used to determine the function of genes and
proteins by comparing the similarities of the sequences under study.
Currently, there are many tools and techniques that analyze sequence
alignments and their products to understand molecular biology, such as
predicting protein secondary structure from a multiple sequence alignment.
In this research, we propose using the Star Alignment method to perform
multiple sequence alignment. In the Star Alignment method, the sequences are
aligned pair-wise, one pair at a time: S1-Sk, S2-Sk, S3-Sk, ..., S(k-1)-Sk.
The complexity of the Star Alignment algorithm is quite high, O(k^2 n^2) for
k sequences of length n, since in the worst case every pair of the k sequences
is aligned and each pair-wise alignment costs O(n^2). To reduce the execution
time significantly, we need to modify the Star Alignment algorithm by
implementing parallel programming using the MapReduce model of Hadoop.
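As a rough sketch of how the pair-wise stage of Star Alignment could be phrased as map and reduce steps, the plain-Python example below scores every pair of sequences in a map step and sums the scores per sequence in a reduce step. It does not use the Hadoop API. The function names (nw_score, map_pair, reduce_scores, pick_center), the scoring values, and the rule of choosing as center the sequence with the highest total score are assumptions made for illustration, not the design of this thesis.

```python
from collections import defaultdict
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch), keeping only two rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def map_pair(pair):
    """Map step: score one pair, emit (sequence index, score) for both members."""
    (i, seq_i), (j, seq_j) = pair
    s = nw_score(seq_i, seq_j)
    return [(i, s), (j, s)]


def reduce_scores(emitted):
    """Reduce step: sum the emitted scores for each sequence index."""
    totals = defaultdict(int)
    for idx, s in emitted:
        totals[idx] += s
    return dict(totals)


def pick_center(seqs):
    """Return the index of the sequence with the highest total pair-wise score.

    Requires at least two sequences; ties go to the lowest index.
    """
    pairs = combinations(enumerate(seqs), 2)
    # Each map_pair call is independent, so the calls could run on different nodes.
    emitted = [kv for pair in pairs for kv in map_pair(pair)]
    totals = reduce_scores(emitted)
    return max(totals, key=lambda i: (totals[i], -i))
```

Because each call to map_pair depends only on its own pair, the k(k-1)/2 pair-wise alignments are the natural unit of work to distribute.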

Problem definition
Pair-wise sequence alignment is a technique for comparing the similarity of
the sequences of two organisms. It is the basic technique in DNA sequence
alignment. The amount of sequence data to be compared is extraordinarily
large. The problems in comparing such large sequence data are accuracy and
efficiency. These two goals conflict: reaching a faster speed decreases
accuracy, and vice versa.
Older methods such as Needleman-Wunsch have a high computational complexity
of O(n^2). Similarly, the multiple sequence alignment method that processes
the sequences pair by pair, called Star Alignment, takes up to O(k^2 n^2)
time. Both therefore have a running-time problem when processing the data,
although the computed result still has high accuracy. Consequently, it is
very important to find a way to improve performance. One method for
increasing speed is parallel computation, in which multiple computers work
together as a system.
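To make the O(k^2 n^2) figure concrete, the short Python sketch below counts the dynamic-programming cells filled when every pair of k sequences of length n is aligned, and the share that falls to each worker if the pairs are spread evenly. The function names and the even-split assumption are illustrative; real speedup also depends on communication and scheduling overhead, which this count ignores.

```python
from math import ceil, comb


def star_alignment_cells(k, n):
    """DP cells filled when all pairs of k length-n sequences are aligned."""
    # comb(k, 2) = k(k-1)/2 pairs, each filling an (n+1) x (n+1) table.
    return comb(k, 2) * (n + 1) ** 2


def cells_per_worker(k, n, workers):
    """Upper bound on cells per worker when pairs are divided evenly."""
    pairs_per_worker = ceil(comb(k, 2) / workers)
    return pairs_per_worker * (n + 1) ** 2
```

For example, star_alignment_cells(4, 9) counts 6 pairs of 100 cells each, 600 cells in total, and cells_per_worker(4, 9, 6) leaves 100 cells for each of six workers.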
This research focuses on finding a faster method to process multiple sequence
alignment using the Star method on a parallel computer. Our proposed method
is implemented using the Message Passing Interface (MPI). The results show
that the parallelization of the Star Alignment gave a speedup of 4-6 times
compared with using a single CPU.

Parallelization methodology of the Map and Reduce framework.


scope of work

The main applications of sequence alignments have included phylogenetic tree
reconstruction. A phylogenetic tree, or evolutionary tree, is a branching
diagram or "tree" showing the inferred evolutionary relationships among
various biological species or other entities (their phylogeny), based upon
similarities and differences in their physical or genetic characteristics.
The taxa joined together in the tree are implied to have descended from a
common ancestor.
1- Structure Prediction: a Multiple Sequence Alignment can give a very accurate prediction of protein or RNA secondary structure, and sometimes it even helps with the 3D structure.
2- Protein Family: a Multiple Sequence Alignment can help you decide whether or not your protein is a member of a known protein family.
3- Pattern Identification: By looking at conserved regions or sites, you can identify which region is responsible for a functional site.
4- Domain Identification: From the file produced by a Multiple Sequence Alignment, you can extract profiles and search them against databases.
5- DNA Regulatory Elements: You can use Multiple Sequence Alignments to locate DNA regulatory elements such as binding sites, etc.
6- Phylogenetic Analysis: By carefully picking related sequences, you can reconstruct a tree using the sequences that you have used in the Multiple Sequence Alignment.
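The pattern identification item in the list above can be illustrated with a small Python sketch that scans a finished alignment for conserved columns. The function name, the '-' gap character, and the conservation threshold are assumptions made for this example. Gaps count against conservation because the fraction is taken over all rows.

```python
from collections import Counter


def conserved_columns(rows, threshold=1.0, gap="-"):
    """Return 0-based indices of alignment columns that are conserved.

    A column is conserved when its most common non-gap residue appears in
    at least `threshold` (a fraction between 0 and 1) of all rows.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")
    hits = []
    for col in range(width):
        residues = [r[col] for r in rows if r[col] != gap]
        if not residues:
            continue  # column contains only gaps
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(rows) >= threshold:
            hits.append(col)
    return hits
```

With the three aligned rows "ACG-T", "ACGAT" and "ATGAT", the fully conserved columns are 0, 2 and 4. Lowering the threshold to 0.6 also accepts columns 1 and 3, where two of the three rows agree.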
As Multiple Sequence Alignments play a major role in bioinformatics, you can
use them almost anywhere. However, no method is perfect or 100% accurate, so
you have to choose your sequences very carefully to avoid meaningless results.

The scope of this thesis work is to develop a MapReduce model of the star
alignment algorithm for multiple sequence alignment. The main applications of
sequence alignments include phylogenetic tree reconstruction, protein family
prediction, and pattern identification.
