Dr Avril Coghlan
alc@sanger.ac.uk
[Figure: plot with window size = 10, threshold = 5; axis label: Human Eyeless]
Sequence alignment
• A second method for comparing sequences is a
sequence alignment
• An alignment is an arrangement in columns of 2
sequences, highlighting their similarity
The sequences are padded with gaps (dashes) so that wherever
possible, alignment columns contain identical letters from the two
sequences involved
An insertion or deletion is represented by ‘–’ (a gap)
The symbol “|” is used to represent matches
eg. here is an alignment for amino acid sequences
“QKGSYPVRSTC” & “QKGSGPVRSTC”:
Q K G S Y P V R S T C
| | | |   | | | | | |
Q K G S G P V R S T C
1 2 3 4 5 6 7 8 9 10 11
This alignment has 11 columns
There are 10 matches
There is 1 mismatch
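The counts quoted for this alignment can be checked with a short script. This is only a minimal sketch: the function name `alignment_summary` and the use of '-' as the gap character are assumptions made for illustration, not part of the slides.

```python
def alignment_summary(row1, row2):
    """Count columns, matches, mismatches and gap columns in a pairwise alignment.

    row1 and row2 are the two aligned rows, with '-' marking a gap.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same number of columns")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1        # column contains an insertion/deletion
        elif a == b:
            matches += 1     # identical letters: a '|' in the alignment
        else:
            mismatches += 1  # different letters, no gap
    return {"columns": len(row1), "matches": matches,
            "mismatches": mismatches, "gaps": gaps}


summary = alignment_summary("QKGSYPVRSTC", "QKGSGPVRSTC")
print(summary)
# {'columns': 11, 'matches': 10, 'mismatches': 1, 'gaps': 0}
```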
• An alignment of the human and fruitfly
(Drosophila melanogaster) Eyeless proteins:
What does an alignment mean?
• An alignment tells you what mutations
occurred in the sequences since the sequences
shared a common ancestor
eg. an alignment of the human & fruitfly Eyeless suggests:
(i) there were probably deletion(s) at the start of the human
Eyeless, or insertion(s) at the start of fruitfly Eyeless
Q K G S Y P V R S T C   sequence 1
| | | |   | | | | | |
Q K G S G P V R S T C   sequence 2
The alignment implies that one mutation occurred since the two
sequences shared a common ancestor
That is, the alignment implies there was a G→Y substitution in
sequence 1 or a Y→G substitution in sequence 2
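As a rough illustration of how an alignment is read as a list of implied mutations, the sketch below reports every column where the two rows differ. The function name `implied_changes` and the labels "substitution" and "indel" are hypothetical choices for this example; the alignment alone cannot say in which sequence the change happened.

```python
def implied_changes(row1, row2):
    """List the columns of a pairwise alignment where the two rows differ.

    Returns tuples (column, kind, letter_in_row1, letter_in_row2), with
    columns numbered from 1 and '-' marking a gap.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same number of columns")
    changes = []
    for column, (a, b) in enumerate(zip(row1, row2), start=1):
        if a == b:
            continue  # identical column: no mutation implied
        if a == "-" or b == "-":
            changes.append((column, "indel", a, b))  # insertion or deletion
        else:
            changes.append((column, "substitution", a, b))
    return changes


print(implied_changes("QKGSYPVRSTC", "QKGSGPVRSTC"))
# [(5, 'substitution', 'Y', 'G')]
```

The single entry corresponds to the one mismatch at column 5: either a G→Y change in sequence 1 or a Y→G change in sequence 2.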
Problem
• Are there other plausible alignments for
sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”?
Answer
There are many other possible alignments, eg. :
Q K G S Y - P V R S T C
| | |       | | | | | |
Q K G - S G P V R S T C

Q K G S - Y P V R S T C
| | | |       | | | | |
Q K G S G P - V R S T C

Q K G - - - - - S Y P V R S T C
| | |           |           | |
Q K G S G P V R S - - - - - T C

Q K - G S Y P V R S T C
| |                   |
Q K G S G P V R S T - C

etc.
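One simple way to compare these alternatives is to count the matching columns in each. The sketch below does this for the gap-free alignment and the four gapped alignments shown above; the names `candidates` and `count_matches` are illustrative, and counting matches is just one possible yardstick, not a scoring scheme taken from the slides.

```python
SEQ1 = "QKGSYPVRSTC"
SEQ2 = "QKGSGPVRSTC"

# The gap-free alignment followed by the four gapped alternatives.
candidates = [
    ("QKGSYPVRSTC", "QKGSGPVRSTC"),
    ("QKGSY-PVRSTC", "QKG-SGPVRSTC"),
    ("QKGS-YPVRSTC", "QKGSGP-VRSTC"),
    ("QKG-----SYPVRSTC", "QKGSGPVRS-----TC"),
    ("QK-GSYPVRSTC", "QKGSGPVRST-C"),
]


def count_matches(row1, row2):
    """Number of columns holding the same letter in both rows (gaps excluded)."""
    return sum(1 for a, b in zip(row1, row2) if a == b and a != "-")


match_counts = []
for row1, row2 in candidates:
    # Sanity checks: equal row lengths, and removing the gaps must
    # give back the two original sequences.
    if len(row1) != len(row2):
        raise ValueError("rows differ in length")
    if row1.replace("-", "") != SEQ1 or row2.replace("-", "") != SEQ2:
        raise ValueError("rows do not spell the original sequences")
    match_counts.append(count_matches(row1, row2))

print(match_counts)
# [10, 9, 9, 6, 3]
```

Under this count, the gap-free alignment has the most matching columns (10), and the alternatives have fewer.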
Number of possible pairwise alignments
• There are lots of different possible alignments for
two sequences that are both of length n
The number of possible alignments of 2 seqs of length n letters (amino
acids/nucleotides) is $\binom{2n}{n}$ (“2n choose n”)
$\binom{2n}{n}$ can be calculated as $\binom{2n}{n} = \frac{(2n)!}{n! \cdot n!}$
where n! (‘n factorial’) = n * (n - 1) * (n - 2) * (n - 3) * ... * 3 * 2 * 1
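The formula above is easy to evaluate directly. The sketch below uses Python's standard `math.comb` (available from Python 3.8) and, as a cross-check, the factorial form; the function names are hypothetical and chosen only for this example.

```python
import math


def number_of_alignments(n):
    """Evaluate '2n choose n' for two sequences that are both of length n."""
    return math.comb(2 * n, n)


def number_of_alignments_factorial(n):
    """Same quantity via the factorial form (2n)! / (n! * n!)."""
    return math.factorial(2 * n) // (math.factorial(n) * math.factorial(n))


for n in (1, 2, 3, 11):
    # Both routes should agree for every n.
    assert number_of_alignments(n) == number_of_alignments_factorial(n)
    print(n, number_of_alignments(n))
# 1 2
# 2 6
# 3 20
# 11 705432
```

The count grows very quickly with n, which is why the alignments cannot all be listed by hand even for short sequences.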
• For example, for “QKGSYPVRSTC” &
“QKGSGPVRSTC”, n (length) = 11 letters
The number of possible alignments of these two sequences is
$\binom{2 \cdot 11}{11} = \binom{22}{11} = \frac{(2 \cdot 11)!}{11! \cdot 11!}$ = 22! / (39916800 * 3991680