
Contents


Preface

Chapter 1: Introduction

Chapter 2: Material and methods

Chapter 3: Result and discussion

Chapter 4: Conclusion

Chapter 5: Bibliography and references

Chapter 6: Abbreviation

Appendix
Preface

Viruses are masters of interspecies navigation. Mutating rapidly and often grabbing the genetic material of other
viruses, they can jump from animals to humans with a quick flick of their DNA. Sometimes, as in West Nile
fever, the transfer occurs through an intermediate host such as a mosquito. But viruses can also make the leap
directly.
Since the 1980s, the list of diseases that have hitchhiked directly from animals to people has grown
rapidly: Hantavirus, SARS, monkeypox and, most recently, avian influenza, commonly called bird flu. With
the exception of HIV/AIDS, perhaps none of these illnesses has more potential to create widespread harm than
bird flu does.
In people, bird flu usually begins much like conventional influenza, with fever, cough, sore throat and
muscle aches, but bird flu can lead to life-threatening complications.
So far, bird flu is hard for humans to contract, but health officials warn a major flu outbreak could occur if the
virus mutates into a form that can spread easily from person to person. The grimmest scenario would be a global
epidemic to rival the flu pandemic of 1918 and 1919, which claimed millions of lives worldwide. In the
meantime, researchers are trying to sort out options for a vaccine. Bird flu seems to be developing resistance to
the flu drug Tamiflu. And a French vaccine maker has produced a bird flu vaccine that promoted an immune
system response but still needs further study.
A great deal of work is under way on the bird flu virus, so I decided to contribute to this topic by
performing a phylogenetic analysis of different strains of Influenza A virus.
Our current work aims to analyze the phylogenetic relationships among the different strains of
Influenza A virus and to examine the cause of virulence in the light of evolution. We compare the evolutionary
positions of the strains in the phylogenetic trees, taking five different types of strains, each classified as either
HPAI or LPAI. This study sheds some light on the evolutionary relationships among strains of Influenza A
virus, for a better understanding of the evolution of pathogenesis in terms of antigenic drift and shift. A further
aim is to model the unknown protein structure of Influenza A virus and to find an appropriate drug target for it.
The gene sequences for the strains of Influenza A virus were collected from the NCBI site for
phylogenetic analysis; we obtained nearly 40 sequences. The sequences were aligned using the CLUSTAL W
package to assess their similarity and relationships. Using the CLUSTAL W output as the input to PHYLIP, we
performed the phylogenetic analysis, and then used NJplot to visualize the tree constructed by PHYLIP.
After that, I collected protein sequences of the H5N1 strain of Influenza A virus from the NCBI website
and obtained nearly 10 such sequences. For the modelling step we used protein sequences of the H5N1 strain
only, submitted to the SWISS-MODEL server.
We then proceeded to docking to continue the structure prediction analysis. We find that the
neuraminidase protein of Influenza A virus, encoded by the NA gene, is one of the reasons for its pathogenicity.
We selected an appropriate ligand from PDBsum or CSA and then performed docking with the HEX software.
We also performed family analysis with GENSCAN and ORF Finder to find consensus sequences, motifs, exons,
introns, ORFs, etc.

Chapter 1
INTRODUCTION

Influenza viruses that infect birds are called avian influenza A viruses. Only influenza A viruses
infect birds, and all known subtypes of influenza A viruses can infect birds. However, there are substantial genetic
differences between the subtypes that typically infect people and those that typically infect birds. Avian influenza A
H5 and H7 viruses can be distinguished as low pathogenic and highly pathogenic forms on the basis of genetic
features of the virus.

Influenza A virus, the virus that causes avian flu. Transmission electron micrograph of negatively stained
virus particles in late passage. (Source: Dr. Erskine Palmer, Centers for Disease Control and Prevention
Public Health Image Library)

Avian influenza is a disease of birds caused by influenza viruses closely related to human influenza viruses.
Transmission to humans in close contact with poultry or other birds occurs rarely and only with some strains of
avian influenza. The potential for transformation of avian influenza into a form that both causes severe disease in
humans and spreads easily from person to person is a great concern for world health.

Wild birds are the natural host for all known subtypes of influenza A viruses. Typically, wild birds do not become
sick when they are infected with avian influenza A viruses. However, domestic poultry, such as turkeys and
chickens, can become very sick and die from avian influenza, and some avian influenza A viruses also can cause
serious disease and death in wild birds.

Influenza Virus Types, Subtypes, and Strains

Three distinct types of influenza virus, dubbed A, B, and C, have been identified. Together these viruses, which
are antigenically distinct from one another, comprise their own viral family, Orthomyxoviridae. Most cases of
the flu, especially those that occur in epidemics or pandemics, are caused by the influenza A virus, which can
affect a variety of animal species, but the B virus, which normally is only found in humans, is responsible for
many localized outbreaks. The influenza C virus is morphologically and genetically different from the other two
viruses and is generally nonsymptomatic, so it is of little medical concern.
Influenza Type A
Influenza type A viruses can infect people, birds, pigs, horses, seals, whales, and other animals, but wild birds are
the natural hosts for these viruses. Influenza type A viruses are divided into subtypes based on two proteins on the
surface of the virus. These proteins are called hemagglutinin (HA) and neuraminidase (NA). There are 15
different HA subtypes and 9 different NA subtypes. Many different combinations of HA and NA proteins are
possible. Only some influenza A subtypes (i.e., H1N1, H1N2, and H3N2) are currently in general circulation
among people. Other subtypes are found most commonly in other animal species. For example, H7N7 and H3N8
viruses cause illness in horses and dogs.
Subtypes of influenza A virus are named according to their HA and NA surface proteins. For example, an “H7N2
virus” designates an influenza A subtype that has an HA 7 protein and an NA 2 protein. Similarly an “H5N1” virus
has an HA 5 protein and an NA 1 protein.
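
The naming rule above can be illustrated with a short sketch. The function name `parse_subtype` is hypothetical and introduced here only for illustration; it assumes labels of the plain form "H<number>N<number>".

```python
import re


def parse_subtype(name):
    """Split a subtype label such as 'H5N1' into its HA and NA numbers."""
    match = re.fullmatch(r"H(\d+)N(\d+)", name.strip().upper())
    if match is None:
        raise ValueError(f"not an influenza A subtype label: {name!r}")
    return int(match.group(1)), int(match.group(2))


ha, na = parse_subtype("H5N1")
print(f"HA {ha} protein, NA {na} protein")
```

The parser only checks the form of the label; it does not check whether the HA and NA numbers correspond to subtypes that actually exist.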

Influenza Type B
Influenza B viruses are normally found only in humans. Unlike influenza A viruses, these viruses are not
classified according to subtype. Although influenza type B viruses can cause human epidemics, they have not
caused pandemics.

Influenza Type C
Influenza type C viruses cause mild illness in humans and do not cause epidemics or pandemics. These viruses are
not classified according to subtype.

Strains
Influenza B viruses and subtypes of influenza A virus are further characterized into strains. There are many
different strains of influenza B viruses and of influenza A subtypes. New strains of influenza viruses appear and
replace older strains. This process occurs through a type of change called “drift” (see How Influenza Viruses
Can Change: Shift and Drift). When a new strain of human influenza virus emerges, antibody protection that may
have developed after infection or vaccination with an older strain may not provide protection against the new
strain. Thus, the influenza vaccine is updated on a yearly basis to keep up with the changes in influenza viruses.

Subtypes
Influenza A viruses are significant for their potential for disease and death in humans and other animals. Influenza
A virus subtypes that have been confirmed in humans, in order of the number of known human pandemic deaths
that they have caused, include:

• H1N1, which caused "Spanish Flu" and currently causes seasonal human flu

• H2N2, which caused "Asian Flu"
• H3N2, which caused "Hong Kong Flu" and currently causes seasonal human flu
• H5N1, the world's major current pandemic threat
• H9N2, which has infected three people

Human influenza virus versus avian influenza virus


Humans can be infected with influenza types A, B, and C. However, the only subtypes of influenza A virus that
normally infect people are influenza A subtypes H1N1, H1N2, and H3N2. Between 1957 and 1968, H2N2 viruses
also circulated among people, but currently do not.

Only influenza A viruses infect birds. Wild birds are the natural host for all subtypes of influenza A virus.
Typically wild birds do not get sick when they are infected with influenza virus. However, domestic poultry, such
as turkeys and chickens, can get very sick and die from avian influenza, and some avian viruses also can cause
serious disease and death in wild birds.

Structure

The structure of the influenza virus (see Figure 1) is somewhat variable, but the virion particles are usually
spherical or ovoid in shape and 80 to 120 nanometres in diameter. Sometimes filamentous forms of the virus
occur as well, and are more common among some influenza strains than others. The influenza virion is an
enveloped virus that derives its lipid bilayer from the plasma membrane of a host cell. Two different varieties of
glycoprotein spike are embedded in the envelope. Approximately 80 percent of the spikes are hemagglutinin, a
trimeric protein that functions in the attachment of the virus to a host cell. The remaining 20 percent or so of the
glycoprotein spikes consist of neuraminidase, which is thought to be predominantly involved in facilitating the
release of newly produced virus particles from the host cell. On the inner side of the envelope that surrounds an
influenza virion is an antigenic matrix protein lining. Within the envelope is the influenza genome, which is
organized into eight pieces of single-stranded RNA (A and B forms only; influenza C has 7 RNA segments). The
RNA is packaged with nucleoprotein into a helical ribonucleoprotein form, with three polymerase peptides for
each RNA segment.
Diagrammatic representation of the morphology of an influenza virion.

The virion is generally rounded but may be long and filamentous.


A single-stranded RNA genome is closely associated with a helical nucleoprotein (NP), and is present in eight
separate segments of ribonucleoprotein (RNP), each of which has to be present for successful replication. The
segmented genome is enclosed within an outer lipoprotein envelope. An antigenic protein called the matrix
protein (MP 1) lines the inside of the envelope and is chemically bound to the RNP. The envelope carries two
types of protruding spikes. One is a box-shaped protein, called the neuraminidase (NA), of which there are nine
major antigenic types, and which has enzymic properties as the name implies. The other type of envelope spike is
a trimeric protein called the hemagglutinin (HA) (illustrated on the right), of which there are 13 major antigenic
types. The hemagglutinin functions during attachment of the virus particle to the cell membrane, and can combine
with specific receptors on a variety of cells including red blood cells. The lipoprotein envelope makes the virion
rather labile: susceptible to heat, drying, detergents and solvents.

Genes of Influenza A virus


Influenza A viruses have 10 genes on eight separate RNA molecules (called PB2,
PB1, PA, HA, NP, NA, M, and NS). HA, NA, and M specify the structure of proteins that are most medically
relevant as targets for antiviral drugs and antibodies. (An eleventh, recently discovered gene called PB1-F2
sometimes creates a protein but is absent from some influenza virus isolates.) This segmentation of the influenza
genome facilitates genetic recombination by segment reassortment in hosts that are infected with two different
influenza viruses at the same time. Influenza A virus is the only species in the Influenzavirus A genus of the
Orthomyxoviridae family; its members are negative-sense, single-stranded, segmented RNA viruses.

Surface encoding gene segments

• Surface antigen encoding gene segments (RNA molecule): (HA, NA)


o HA codes for hemagglutinin, which is an antigenic glycoprotein, found on the surface of the
influenza viruses and is responsible for binding the virus to the cell that is being infected.

o NA codes for neuraminidase, which is an antigenic glycoprotein enzyme, found on the surface
of the influenza viruses. It helps the release of progeny viruses from infected cells.

Internal encoding gene segments

• Internal viral protein encoding gene segments (RNA molecule): (M, NP, NS, PA, PB1, PB2)

Matrix encoding gene segments:

o M codes for the matrix proteins (M1 and M2) that, along with the two surface proteins
(hemagglutinin and neuraminidase), make up the capsid (protective coat) of the virus. The two
matrix proteins are encoded using different reading frames from the same RNA segment.
▪ M1 is a protein that binds to the viral RNA.
▪ M2 is a protein that uncoats the virus, exposing its contents (the eight RNA segments)
to the cytoplasm of the host cell. The M2 transmembrane protein is an ion channel
required for efficient infection.

Nucleoprotein encoding gene segments

o NP codes for nucleoprotein.


o NS codes for two nonstructural proteins (NS1 and NEP). "[T]he pathogenicity of influenza
virus was related to the nonstructural (NS) gene of the H5N1/97 virus"

Polymerase encoding gene segments

o PA codes for the PA protein which is a critical component of the viral polymerase.
o PB1 codes for the PB1 protein and the PB1-F2 protein.
o PB2 codes for the PB2 protein which is a critical component of the viral polymerase.
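
The segment-to-protein relationships listed above can be summarized as a small lookup table. This is a minimal sketch: the dictionary and function names are hypothetical, and the contents simply restate the lists in this section.

```python
# Segment -> protein products, restating the lists in this section.
SEGMENT_PRODUCTS = {
    "PB2": ["PB2"],
    "PB1": ["PB1"],
    "PA": ["PA"],
    "HA": ["HA"],
    "NP": ["NP"],
    "NA": ["NA"],
    "M": ["M1", "M2"],
    "NS": ["NS1", "NEP"],
}

# PB1-F2 is absent from some isolates, so it is kept separate.
OPTIONAL_PRODUCTS = {"PB1": ["PB1-F2"]}

# Segments that encode the two surface antigens.
SURFACE_SEGMENTS = {"HA", "NA"}


def products(include_optional=False):
    """Return all protein products, optionally including PB1-F2."""
    result = []
    for segment, names in SEGMENT_PRODUCTS.items():
        result.extend(names)
        if include_optional:
            result.extend(OPTIONAL_PRODUCTS.get(segment, []))
    return result


internal_segments = sorted(set(SEGMENT_PRODUCTS) - SURFACE_SEGMENTS)
print(len(SEGMENT_PRODUCTS), "segments,", len(products()), "products")
print("internal segments:", ", ".join(internal_segments))
```

Counting the entries reproduces the figures given in the text: eight segments, ten products, and an eleventh when PB1-F2 is included.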

How the Flu Virus Can Change - "Drift" and "Shift"

Influenza viruses can change in two different ways. One is called "antigenic drift." These are small changes in
the virus that happen continually over time. Antigenic drift produces new virus strains that may not be
recognized by the body's immune system. This process works as follows: a person infected with a particular flu
virus strain develops antibody against that virus. As newer virus strains appear, the antibodies against the older
strains no longer recognize the "newer" virus, and reinfection can occur. This is one of the main reasons why
people can get the flu more than one time. In most years, one or two of the three virus strains in the influenza
vaccine are updated to keep up with the changes in the circulating flu viruses. So, people who want to be
protected from flu need to get a flu shot every year.

The other type of change is called "antigenic shift." Antigenic shift is an abrupt, major change in the influenza A
viruses, resulting in new hemagglutinin and/or new hemagglutinin and neuraminidase proteins in influenza
viruses that infect humans. Shift results in a new influenza A subtype. When shift happens, most people have
little or no protection against the new virus. While influenza viruses are changing by antigenic drift all the time,
antigenic shift happens only occasionally. Type A viruses undergo both kinds of changes; influenza type B
viruses change only by the more gradual process of antigenic drift.

Influenza viruses are dynamic and are continuously evolving. Influenza viruses can change in two different
ways: antigenic drift and antigenic shift. Influenza viruses are changing by antigenic drift all the time, but
antigenic shift happens only occasionally. Influenza type A viruses undergo both kinds of changes; influenza
type B viruses change only by the more gradual process of antigenic drift.

Genetics of drift and shift:

Antigenic drift refers to small, gradual changes that occur through point mutations in the two genes that contain
the genetic material to produce the main surface proteins, hemagglutinin, and neuraminidase. These point
mutations occur unpredictably and result in minor changes to these surface proteins. Antigenic drift produces
new virus strains that may not be recognized by antibodies to earlier influenza strains. This process works as
follows: a person infected with a particular influenza virus strain develops antibody against that strain. As newer
virus strains appear, the antibodies against the older strains might not recognize the "newer" virus, and infection
with a new strain can occur. This is one of the main reasons why people can become infected with influenza
viruses more than one time and why global surveillance is critical in order to monitor the evolution of human
influenza virus strains for selection of which strains should be included in the annual production of influenza
vaccine. In most years, one or two of the three virus strains in the influenza vaccine are updated to keep up with
the changes in the circulating influenza viruses. For this reason, people who want to be immunized against
influenza need to be vaccinated every year.
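
Because drift arises from point mutations, one simple way to look for it is to compare two aligned protein sequences position by position. The sketch below assumes the sequences are already aligned to equal length and that "-" marks a gap; the function name and the two short sequences are invented for illustration and are not real hemagglutinin data.

```python
def substitutions(older, newer):
    """List (position, old_residue, new_residue) for each point change.

    Both sequences must come from the same alignment (equal length).
    Positions are 1-based; columns containing a gap ('-') are skipped.
    """
    if len(older) != len(newer):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for position, (a, b) in enumerate(zip(older, newer), start=1):
        if a == "-" or b == "-":
            continue
        if a != b:
            changes.append((position, a, b))
    return changes


# Toy sequences, invented for illustration only.
older_strain = "MKTIIALSYI"
newer_strain = "MKAIIVLSYI"
for position, old, new in substitutions(older_strain, newer_strain):
    print(f"{old}{position}{new}")
```

A list of substitutions alone does not say whether antibody recognition is affected; that depends on where in the protein the changes fall.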

Antigenic shift refers to an abrupt, major change to produce a novel influenza A virus subtype in humans that
was not currently circulating among people (see more information below under Influenza Type A and Its
Subtypes). Antigenic shift can occur either through direct animal (poultry)-to-human transmission or through
mixing of human influenza A and animal influenza A virus genes to create a new human influenza A subtype
virus through a process called genetic reassortment. Antigenic shift results in a new human influenza A subtype.
A global influenza pandemic (worldwide spread) may occur if three conditions are met:

• A new subtype of influenza A virus is introduced into the human population.


• The virus causes serious illness in humans.
• The virus can spread easily from person to person in a sustained manner.

Diagrammatic representation of Antigenic drift:

Diagrammatic representation of Antigenic shift

Low Pathogenic versus Highly Pathogenic Avian Influenza A Viruses

Avian influenza A virus strains are further classified as low pathogenic (LPAI) or highly pathogenic (HPAI) on
the basis of specific molecular genetic and pathogenesis criteria that require specific testing. Most avian
influenza A viruses are LPAI viruses that are usually associated with mild disease in poultry. In contrast, HPAI
viruses can cause severe illness and high mortality in poultry. More recently, some HPAI viruses (e.g., H5N1)
have been found to cause no illness in some poultry, such as ducks. LPAI viruses have the potential to evolve
into HPAI viruses and this has been documented in some poultry outbreaks. Avian influenza A viruses of the
subtypes H5 and H7, including H5N1, H7N7, and H7N3 viruses, have been associated with HPAI, and human
infections with these viruses have ranged from mild (H7N3, H7N7) to severe and fatal disease (H7N7, H5N1).
Human illness due to infection with LPAI viruses has been documented, ranging from very mild symptoms (e.g.,
conjunctivitis) to influenza-like illness. Examples of LPAI viruses that have infected humans include H7N7,
H9N2, and H7N2.

In general, direct human infection with avian influenza viruses occurs very infrequently, and has been associated
with direct contact (e.g., touching) with sick or dead infected birds (domestic poultry).

Mutation
Influenza viruses have a relatively high mutation rate that is characteristic of RNA viruses. The H5N1 virus has
mutated into a variety of types with differing pathogenic profiles; some are pathogenic to one species but not
others, and some are pathogenic to multiple species. The ability of various influenza strains to show
species-selectivity is largely due to variation in the hemagglutinin genes. Genetic mutations in the hemagglutinin
gene that cause single amino acid substitutions can significantly alter the ability of viral hemagglutinin
proteins to bind to receptors on the surface of host cells. Such mutations in avian H5N1 viruses can change
virus strains from being inefficient at infecting human cells to being as efficient in causing human
infections as more common human influenza virus types. This does not mean that one amino acid substitution
can cause a pandemic, but it does mean that one amino acid substitution can cause an avian flu virus that is not
pathogenic in humans to become pathogenic in humans.

H3N2 ("swine flu") is endemic in pigs in China and has been detected in pigs in Vietnam, increasing fears of
the emergence of new variant strains. The dominant strain of annual flu virus in January 2006 was H3N2,
which is now resistant to the standard antiviral drugs amantadine and rimantadine. The possibility of H5N1
and H3N2 exchanging genes through reassortment is a major concern. If a reassortment in H5N1 occurs, it
might remain an H5N1 subtype, or it could shift subtypes, as H2N2 did when it evolved into the Hong Kong
Flu strain of H3N2. Both the H2N2 and H3N2 pandemic strains contained avian flu virus RNA segments.
"While the pandemic human influenza viruses of 1957 (H2N2) and 1968 (H3N2) clearly arose through
reassortment between human and avian viruses, the influenza virus causing the 'Spanish flu' in 1918 appears
to be entirely derived from an avian source".

Transmission of Influenza A Viruses between Animals and People

Influenza A viruses have infected many different animals, including ducks, chickens, pigs, whales, horses, and
seals. However, certain subtypes of influenza A virus are specific to certain species, except for birds, which are
hosts to all known subtypes of influenza A. Subtypes that have caused widespread illness in people either in the
past or currently are H3N2, H2N2, H1N1, and H1N2. H1N1 and H3N2 subtypes also have caused outbreaks in
pigs, and H7N7 and H3N8 viruses have caused outbreaks in horses.

Influenza A viruses normally seen in one species sometimes can cross over and cause illness in another species.
For example, until 1998, only H1N1 viruses circulated widely in the U.S. pig population. However, in 1998,
H3N2 viruses from humans were introduced into the pig population and caused widespread disease among pigs.
Most recently, H3N8 viruses from horses have crossed over and caused outbreaks in dogs.

Avian influenza A viruses may be transmitted from animals to humans in two main ways:

• Directly from birds or from avian virus-contaminated environments to people.


• Through an intermediate host, such as a pig.

Influenza A viruses have eight separate gene segments. The segmented genome allows influenza A viruses from
different species to mix and create a new influenza A virus if viruses from two different species infect the same
person or animal. For example, if a pig were infected with a human influenza A virus and an avian influenza A
virus at the same time, the new replicating viruses could mix existing genetic information (reassortment) and
produce a new virus that had most of the genes from the human virus, but a hemagglutinin and/or neuraminidase
from the avian virus. The resulting new virus might then be able to infect humans and spread from person to
person, but it would have surface proteins (hemagglutinin and/or neuraminidase) not previously seen in influenza
viruses that infect humans.
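
The mixing of segments described above can be sketched by enumerating every way eight segments can be drawn from two co-infecting parent viruses. This is an illustrative count only: it treats every combination as equally possible, which is an assumption, since not every combination need yield a viable virus. The names `reassortants` and `shift_like` are hypothetical.

```python
from itertools import product

SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def reassortants(parent_a="human", parent_b="avian"):
    """Yield every assignment of a parent of origin to each segment."""
    for origins in product((parent_a, parent_b), repeat=len(SEGMENTS)):
        yield dict(zip(SEGMENTS, origins))


genotypes = list(reassortants())

# The scenario in the text: most genes from the human virus, but an
# avian hemagglutinin and/or neuraminidase.
shift_like = [
    g for g in genotypes
    if (g["HA"] == "avian" or g["NA"] == "avian")
    and sum(origin == "human" for origin in g.values()) > len(SEGMENTS) / 2
]

print(len(genotypes), "possible segment combinations")
print(len(shift_like), "have mostly human genes with avian HA and/or NA")
```

Two of the enumerated combinations are simply the unchanged parent viruses; the rest are reassortants.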

This type of major change in the influenza A viruses is known as antigenic shift. Antigenic shift results when a
new influenza A subtype to which most people have little or no immune protection infects humans. If this new
virus causes illness in people and can be transmitted easily from person to person, an influenza pandemic can
occur.

It is possible that the process of genetic reassortment could occur in a human who is co-infected with avian
influenza A virus and a human strain of influenza A virus. The genetic information in these viruses could reassort
to create a new virus with a hemagglutinin from the avian virus and other genes from the human virus.
Theoretically, influenza A viruses with a hemagglutinin against which humans have little or no immunity that
have reassorted with a human influenza virus are more likely to result in sustained human-to-human transmission
and pandemic influenza. Therefore, careful evaluation of influenza viruses recovered from humans who are
infected with avian influenza is very important to identify reassortment if it occurs.

Although it is unusual for people to get influenza virus infections directly from animals, sporadic human
infections and outbreaks caused by certain avian influenza A viruses and pig influenza viruses have been reported.
(For more information see Avian Influenza Infections in Humans.) These sporadic human infections and
outbreaks, however, rarely result in sustained transmission among humans.

Symptoms in humans

Avian influenza hemagglutinin binds alpha 2-3 sialic acid receptors, while human influenza hemagglutinin binds
alpha 2-6 sialic acid receptors. Usually other differences also exist. There is as yet no human form of
H5N1, so all humans who have caught it so far have caught avian H5N1.

Humans who catch a humanized Influenza A virus (in other words a human flu virus of type A) usually have
symptoms that include fever, cough, sore throat, muscle aches, conjunctivitis and, in severe cases, severe
breathing problems and pneumonia that may be fatal. The severity of the infection will depend to a large part on
the state of the infected person's immune system and if the victim has been exposed to the strain before, and is
therefore partially immune. No one knows if these or other symptoms will be the symptoms of a humanized H5N1
flu.
Highly pathogenic H5N1 avian flu in a human is far worse, killing 50% of humans that catch it. In one case, a boy
with H5N1 experienced diarrhea followed rapidly by a coma without developing respiratory or flu-like symptoms.
There have been studies of the levels of cytokines in humans infected by the H5N1 flu virus. Of particular concern
is an elevated level of tumor necrosis factor alpha (TNFα), a protein that is associated with tissue destruction at
sites of infection and increased production of other cytokines. Flu virus-induced increases in the level of cytokines
are also associated with flu symptoms including fever, chills, vomiting and headache. Tissue damage associated
with pathogenic flu virus infection can ultimately result in death. The inflammatory cascade triggered by H5N1
has been called a 'cytokine storm' by some, because of what seems to be a positive feedback process of damage to
the body resulting from immune system stimulation. H5N1 type flu virus induces higher levels of cytokines than
the more common flu virus types such as H1N1.

PREVENTION
Vaccines
A new vaccine is formulated annually with the types and strains of influenza predicted to be the major problems
for that year (predictions are based on worldwide monitoring of influenza). The vaccine is multivalent and the
current one is to two strains of influenza A and one of influenza B. The vaccine given to adults at present is an
inactivated preparation of egg-grown virus. It is contraindicated for those with allergies to eggs. It has a short
lived protective effect and so is usually given in the fall (figure 11) so that protection is high in December/January
- the usual peak months for flu in the northern hemisphere. It needs to be given every year since, besides the short
lived nature of the protection, the most effective strains for the vaccine will change due to drift or shift. Only
certain formulations of the vaccine are approved for young children. Previously, a subunit vaccine was
recommended.

In 2003, a live, attenuated (much less pathogenic than wild-type virus) vaccine (marketed as FluMist) was
approved for use in the United States. It is only approved for healthy individuals (those not at risk for
complications from influenza infection) from five to forty nine years of age. It is given nasally and should provide
mucosal, humoral and cell-mediated immunity. In this vaccine, the vaccine virus is a cold-adapted strain which
can grow in the upper respiratory tract where it is cooler, but grows poorly in the lower respiratory tract. It is
attenuated due to multiple changes in the various genome segments. Reassortment is used to generate viruses
which have six gene segments from the attenuated virus and the HA and NA coding segments from the virus
which is likely to be a problem in the up-coming influenza season. A reassortant is generated for each strain
expected to be a problem. Since this is a live vaccine, given intranasally as a spray, it generates an IgA response
and an IgM/G response. FluMist vaccine virus is also grown on eggs and so is contraindicated for people with an
egg allergy. Since this is a live viral vaccine, it is also contraindicated for children and young adolescents on any
therapy containing aspirin due to the potential risk of Reye's syndrome.

The CDC recommends: “Physicians should administer influenza vaccine to any person who wishes to reduce the
likelihood of becoming ill with influenza (the vaccine can be administered to children as young as 6 months).
Persons who provide essential community services should be considered for vaccination to minimize disruption of
essential activities during influenza outbreaks. Students or other persons in institutional settings (e.g., those who
reside in dormitories) should be encouraged to receive vaccine to minimize the disruption of routine activities
during epidemics.”

Chemotherapy

Rimantadine and amantadine block virus entry across the endosome and also interfere with virus release (see anti-
viral chemotherapy section). They are good prophylactic agents for influenza A, but there are some problems in
taking them on a long term basis. They may be given as protective agents during an outbreak, especially to those
at severe risk and key personnel. They may also be given at the time of vaccination for a few weeks, until the
humoral response has time to develop. (There is some evidence that these drugs can help prevent more serious
complications if given early in infection.)

Two neuraminidase inhibitors have recently been approved by the FDA (zanamivir [Relenza] and oseltamivir).
They are active against influenza A and influenza B. These drugs can reduce the duration of uncomplicated
influenza (by approximately 1 day). Oseltamivir is approved for prophylaxis as well as treatment. At the moment,
Zanamivir is only approved for treatment but trials indicate it is probably as effective as oseltamivir in
prophylaxis.

As yet there are no clear data on the ability of any of these drugs to reduce serious complications when used to
treat influenza (as contrasted with when they are used prophylactically).

Treatment and prevention for humans

The best treatments are rest, liquids and anti-febrile agents (not aspirin in the young or adolescent, since Reye's
disease is a potential problem). Be aware of and treat complications appropriately.
There is no highly
effective treatment for H5N1 flu, but oseltamivir (commercially marketed by Roche as Tamiflu), can
sometimes inhibit the influenza virus from spreading inside the user's body. This drug has become a
focus for some governments and organizations trying to be seen as making preparations for a possible
H5N1 pandemic. On April 20, 2006, Roche AG announced that a stockpile of three million treatment
courses of Tamiflu is waiting at the disposal of the World Health Organization to be used in case of a flu
pandemic; separately Roche donated two million courses to the WHO for use in developing nations that
may be affected by such a pandemic but lack the ability to purchase large quantities of the drug.

There are several H5N1 vaccines for several of the avian H5N1 varieties, but the continual mutation of H5N1
renders them of limited use to date: while vaccines can sometimes provide cross-protection against related flu
strains, the best protection would be from a vaccine specifically produced for any future pandemic flu virus strain.
Dr. Daniel Lucey, co-director of the Biohazardous Threats and Emerging Diseases graduate program at
Georgetown University, has made this point: "There is no H5N1 pandemic so there can be no pandemic
vaccine". However, "pre-pandemic vaccines" have been created, are being refined and tested, and do have some
promise both in furthering research and in preparedness for the next pandemic. Vaccine manufacturing companies are
being encouraged to increase capacity so that if a pandemic vaccine is needed, facilities will be available for rapid
production of large amounts of a vaccine specific to a new pandemic strain.
Animal and lab studies suggest that Relenza (zanamivir), which is in the same class of drugs as Tamiflu, may also
be effective against H5N1. In a study performed on mice in 2000, "zanamivir was shown to be efficacious in
treating avian influenza viruses H9N2, H6N1, and H5N1 transmissible to mammals" (Leneva 2001). However,
another paper, de Jong 2005, suggested that zanamivir might not provide protection in humans from the current
avian strain of H5N1 if "systemic involvement of influenza infection is suspected - as has recently been suggested
by some reports on avian H5N1 influenza in humans." While no one knows whether zanamivir will be useful on a
yet-to-exist pandemic strain of H5N1, it might be useful to stockpile zanamivir as well as oseltamivir in the event
of an H5N1 influenza pandemic. Neither oseltamivir nor zanamivir can currently be manufactured in quantities
that would be meaningful once efficient human transmission starts.

Phylogenetic analysis
Phylogenetic analysis tools are applied to reconstruct evolutionary trees at the molecular level.

Phylogenetic Trees: Presenting Evolutionary Relationships

Systematics describes the pattern of relationships among taxa and is intended to help us understand the history of all life. But
history is not something we can see—it has happened once and leaves only clues as to the actual events. Scientists use these clues
to build hypotheses, or models, of life's history. In phylogenetic studies, the most convenient way of visually presenting
evolutionary relationships among a group of organisms is through illustrations called phylogenetic trees.

• Node: represents a taxonomic unit. This can be either an existing species or an ancestor.
• Branch: defines the relationship between the taxa in terms of descent and ancestry.
• Topology: the branching patterns of the tree.
• Branch length: represents the number of changes that have occurred in the branch.
• Root: the common ancestor of all taxa.
• Distance scale: scale that represents the number of differences between organisms or sequences.
• Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all of their
descendants.
• Operational Taxonomic Unit (OTU): taxonomic level of sampling selected by the user to be used in a study, such as
individuals, populations, species, genera, or bacterial strains.

A phylogenetic tree is composed of nodes, each representing a taxonomic unit (species, populations, individuals), and branches,
which define the relationship between the taxonomic units in terms of descent and ancestry. Only one branch can connect any two
adjacent nodes. The branching pattern of the tree is called the topology, and the branch length usually represents the number of
changes that have occurred in the branch. This is called a scaled branch. Scaled trees are often calibrated to represent the passage
of time. Such trees have a theoretical basis in the particular gene or genes under analysis. Branches can also be unscaled, which
means that the branch length is not proportional to the number of changes that has occurred, although the actual number may be
indicated numerically somewhere on the branch. Phylogenetic trees may also be either rooted or unrooted. In rooted trees, there
is a particular node, called the root, representing a common ancestor, from which a unique path leads to any other node. An
unrooted tree only specifies the relationship among species,
without identifying a common ancestor, or evolutionary path.

Figure 1. Possible ways of drawing a tree.


Phylogenetic trees, a convenient way of representing evolutionary relationships among a group of organisms, can be drawn in
various ways. Branches on phylogenetic trees may be scaled (top panel), representing the amount of evolutionary change, time, or
both when there is a molecular clock, or they may be unscaled (middle panel) and have no direct correspondence with either time
or amount of evolutionary change. Phylogenetic trees may be rooted (top and middle panels) or unrooted (bottom panels). In the
case of unrooted trees, branching relationships between taxa are specified by the way they are connected to each other, but the
position of the common ancestor is not. For example, on an unrooted tree with four species, there are five branches (four external,
one internal) on which the tree can be rooted. Rooting on each of the five branches has different implications for evolutionary
relationships.
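
The number of branches on which an unrooted tree can be rooted follows from the number of taxa. For a fully resolved (bifurcating) unrooted tree with n taxa there are n external branches and n - 3 internal branches, 2n - 3 in total. The following is a minimal Python sketch of that count; the function name is illustrative and does not come from any phylogenetics package.

```python
def unrooted_branch_counts(n_taxa):
    """Return (external, internal, total) branch counts for a fully
    resolved (bifurcating) unrooted tree with n_taxa leaves."""
    if n_taxa < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    external = n_taxa        # one branch leading to each taxon
    internal = n_taxa - 3    # branches connecting two internal nodes
    return external, internal, external + internal


for n in (4, 5, 6):
    external, internal, total = unrooted_branch_counts(n)
    print(n, "taxa:", external, "external +", internal, "internal =",
          total, "possible root positions")
```

Each branch is a candidate root position, so a tree of four taxa can be rooted in five ways and a tree of five taxa in seven.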

Methods of Phylogenetic Analysis

Two major groups of analyses exist to examine phylogenetic relationships: phenetic methods and cladistic methods. It is
important to note that phenetics and cladistics have had an uneasy relationship over the last 40 years or so. Most of today's
evolutionary biologists favor cladistics, although a strictly cladistic approach may result in counterintuitive results.

Phenetic Method of Analysis

Phenetics, also known as numerical taxonomy, involves the use of various measures of overall similarity for the ranking of
species. There is no restriction on the number or type of characters (data) that can be used, although all data must be first
converted to a numerical value, without any character "weighting". Each organism is then compared with every other for all
characters measured, and the number of similarities (or differences) is calculated. The organisms are then clustered in such a way
that the most similar are grouped close together and the more different ones are linked more distantly. The taxonomic clusters,
called phenograms, that result from such an analysis do not necessarily reflect genetic similarity or evolutionary relatedness. The
lack of evolutionary significance in phenetics has meant that this system has had little impact on animal classification, and as a
consequence, interest in and use of phenetics has been declining in recent years.
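
The comparison step of a phenetic analysis can be sketched in a few lines of Python. The character matrix below is hypothetical (four organisms scored for five unweighted presence/absence characters), and the simple matching coefficient is only one of several similarity measures that could be used.

```python
from itertools import combinations

# Hypothetical character matrix: 1 = character present, 0 = absent.
characters = {
    "A": [1, 1, 0, 1, 0],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 1],
    "D": [0, 1, 1, 1, 0],
}


def similarity(x, y):
    """Simple matching coefficient: fraction of characters with equal states."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def similarity_table(data):
    """Compare every organism with every other one over all characters."""
    return {(p, q): similarity(data[p], data[q])
            for p, q in combinations(sorted(data), 2)}


table = similarity_table(characters)
closest = max(table, key=table.get)   # the most similar pair is clustered first
print(table)
print("first cluster:", closest)
```

A clustering method would then merge the most similar pair and repeat the comparison on the reduced set.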

Cladistic Method of Analysis

An alternative approach to diagramming relationships between taxa is called cladistics. The basic assumption behind cladistics is
that members of a group share a common evolutionary history. Thus, they are more closely related to one another than they are to
other groups of organisms. Related groups of organisms are recognized because they share a set of unique features (apomorphies)
that were not present in distant ancestors but which are shared by most or all of the organisms within the group. These shared
derived characteristics are called synapomorphies. Therefore, in contrast to phenetics, cladistics groupings do not depend on
whether organisms share physical traits but depend on their evolutionary relationships. Indeed, in cladistic analyses two organisms
may share numerous characteristics but still be considered members of different groups.

Cladistic analysis entails a number of assumptions. For example, species are assumed to arise primarily by bifurcation, or
separation, of the ancestral lineage; species are often considered to become extinct upon hybridization (crossbreeding); and
hybridization is assumed to be rare or absent. In addition, cladistic groupings must possess the following characteristics: all
species in a grouping must share a common ancestor and all species derived from a common ancestor must be included in the
taxon. The application of these requirements results in the following terms being used to describe the different ways in which
groupings can be made:

• A monophyletic grouping is one in which all species share a common ancestor, and all species derived from that
common ancestor are included. This is the only form of grouping accepted as valid by cladists.
• A paraphyletic grouping is one in which all species share a common ancestor, but not all species derived from that
common ancestor are included.

• A polyphyletic grouping is one in which species that do not share an immediate common ancestor are lumped together,
while excluding other members that would link them.
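
The monophyly criterion can be checked mechanically once a rooted tree is available. The sketch below assumes a tree written as nested Python tuples with hypothetical taxon labels; a group is reported as monophyletic only if it is exactly the set of leaves under some node.

```python
def clades(tree):
    """Return (leaves of this subtree, leaf sets of every clade inside it).
    A tree is a nested tuple; a leaf is a string."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def is_monophyletic(tree, group):
    """True if the group is an ancestor's complete set of descendants."""
    return frozenset(group) in clades(tree)[1]


# Hypothetical rooted tree: ((A,B),C) is sister to (D,E).
tree = ((("A", "B"), "C"), ("D", "E"))
print(is_monophyletic(tree, {"A", "B", "C"}))  # complete clade
print(is_monophyletic(tree, {"A", "C"}))       # leaves out descendant B
print(is_monophyletic(tree, {"C", "D"}))       # no exclusive common ancestor
```

The check does not distinguish paraphyletic from polyphyletic groups; it only separates valid cladistic groupings from the rest.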

The Origins of Molecular Phylogenetics

Macromolecular data, meaning gene (DNA) and protein sequences, are accumulating at an increasing rate because of recent
advances in molecular biology. For the evolutionary biologist, the rapid accumulation of sequence data from whole genomes has
been a major advance, because the very nature of DNA allows it to be used as a "document" of evolutionary history. Comparisons
of the DNA sequences of various genes between different organisms can tell a scientist a lot about the relationships of organisms
that cannot otherwise be inferred from morphology, or an organism's outer form and inner structure. Because genomes evolve by
the gradual accumulation of mutations, the amount of nucleotide sequence difference between a pair of genomes from different
organisms should indicate how recently those two genomes shared a common ancestor. Two genomes that diverged in the recent
past should have fewer differences than two genomes whose common ancestor is more ancient. Therefore, by comparing different
genomes with each other, it should be possible to derive evolutionary relationships between them, the major objective of
molecular phylogenetics.
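
The simplest measure of "nucleotide sequence difference" is the proportion of aligned sites at which two sequences differ (often called the p-distance). The sketch below is an illustration under stated assumptions: the sequences are already aligned, gaps are written as "-", and sites with a gap in either sequence are skipped. The short example sequences are for illustration only.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of compared (ungapped) aligned sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue            # ignore sites with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


print(p_distance("AGCAAAAGCA", "AGCGAAAGCA"))   # one difference in ten sites
print(p_distance("--CAAAAGCA", "AGCGAAAGCA"))   # gapped sites are skipped
```

Two sequences that diverged recently give a value close to zero, while older divergences give larger values, until multiple substitutions at the same site start to hide part of the change.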

Molecular phylogenetics attempts to determine the rates and patterns of change occurring in DNA and proteins and to reconstruct
the evolutionary history of genes and organisms. Two general approaches may be taken to obtain this information. In the first
approach, scientists use DNA to study the evolution of an organism. In the second approach, different organisms are used to study
the evolution of DNA. Whatever the approach, the general goal is to infer process from pattern: the processes of organismal
evolution deduced from patterns of DNA variation and processes of molecular evolution inferred from the patterns of variations in
the DNA itself.

Molecular Phylogenetic Analysis: Fundamental Elements

As we just discussed, macromolecules, especially gene and protein sequences, have surpassed morphological and other organismal
characters as the most popular forms of data for phylogenetic analyses. Therefore, this next section will concentrate only on
molecular data.

It is important to point out that a single, all-purpose recipe does not exist for phylogenetic analysis of molecular data. Although
numerous algorithms, procedures, and computer programs have been developed, their reliability and practicality are, in all cases,
dependent upon the size and structure of the dataset under analysis. The merits and shortfalls of these various methods are subject
to much scientific debate, because the danger of generating incorrect results is greater in computational molecular phylogenetics
than in many other fields of science. Occasionally, the limiting factor in such analyses is not so much the computational method
used, but the users' understanding of what the method is actually doing with the data. Therefore, the goal of this section is to
demonstrate to the reader that practical analysis should be thought of both as a search for a correct model (analysis) as well as a
search for the correct tree (outcome).

Phylogenetic tree-building methods presume particular evolutionary models. For any given set of data, these models may be
violated because of various occurrences, such as the transfer of genetic material between organisms. Therefore, when interpreting
a given analysis, a person should always consider the model used and entertain possible explanations for the results obtained. For
example, models used in molecular phylogenetic analysis methods make "default" assumptions, including:

• The sequence is correct and originates from the specified source.
• The sequences are homologous—all descended in some way from a shared ancestral sequence.
• Each position in a sequence alignment is homologous with every other in that alignment.
• Each of the multiple sequences included in a common analysis has a common phylogenetic history with the other
sequences.
• The sampling of taxa is adequate to resolve the problem under study.
• Sequence variation among the samples is representative of the broader group.

• The sequence variability in the sample contains phylogenetic signal adequate to resolve the problem under study.

A straightforward phylogenetic analysis consists of four steps:

1. Alignment—building the data model and extracting a dataset.

2. Determining the substitution model—consider sequence variation.

3. Tree building.

4. Tree evaluation.

Introduction to Homology modelling
One method that can be applied to generate a reasonable model of a protein's structure is homology modelling. This procedure is also
termed comparative modelling or knowledge-based modelling.

Why homology modelling is useful

Homology models are useful for getting a rough idea of where the alpha carbon of each residue sits in the folded protein. They
can guide hypotheses about structure-function relationships. Homology models are unreliable in predicting the
conformation of insertions or deletions. Homology models are unlikely to be useful in modelling ligand docking
for drug design unless the sequence identity with the template is > 70%, and even then they are less reliable than an
empirical crystallographic or NMR structure.

Aim of Comparative Modelling


The aim of comparative modelling, or homology protein structure modelling, is to build a 3D model for a protein of
unknown structure (the target) based on one or more related proteins of known structure.

Introduction to Docking

Docking studies are molecular modeling studies aiming at finding a proper fit between a ligand and its binding
site.

There are two classes of protein docking:

1) Protein-protein docking
2) Protein receptor-ligand docking

Protein-Protein Docking interactions


Protein-protein interactions occur between two proteins that are similar in size. The interface between the two
molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are
usually more rigid; the interfaces of these interactions do not have the ability to alter their conformation in order
to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus the interfaces are said
to be rigid.

Fig: Protein-Protein docking.


Protein Receptor–Ligand docking

Protein receptor-ligand motifs fit together tightly, and are often referred to as a lock and key mechanism. There is
both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein
receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig: Protein Receptor-Ligand docking.


Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand with a flexible receptor often maximizes the interface area between the
molecules. They move with respect to one another in a direction perpendicular to the interface. This
allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the
docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, energy loss in the system
will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow
for docking of a larger ligand than would be possible with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while
maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be
maintained and the ligand must be slightly smaller in size than the receptor interface. No docking is
completely rigid, though; there is intrinsic movement which allows for small conformational adaptation for ligand
binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational,
three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any
energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between
the two.
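
The six rigid-body degrees of freedom (three rotational, three translational) can be made concrete with a small coordinate transform. The sketch below is an illustration only, not the search procedure of any docking program: the function names, the order in which the rotations are applied, and the toy coordinates are all assumptions.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (R = Rz * Ry * Rx), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, angles, shift):
    """Move rigid coordinates: rotate by three angles, then translate."""
    rot = rotation_matrix(*angles)
    moved = []
    for point in coords:
        rotated = [sum(rot[i][k] * point[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + shift[i] for i in range(3)))
    return moved


# Hypothetical two-atom "ligand": a quarter turn about z, then a shift.
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
posed = apply_pose(ligand, (0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0))
print(posed)
print(math.dist(posed[0], posed[1]))   # internal geometry is unchanged
```

A rigid docking search explores these six parameters while the internal geometry of each molecule stays fixed; flexible docking adds internal degrees of freedom on top of them.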

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The
findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new
avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design
new lead compounds and hence can aid the drug discovery process.

Receptor
A structure on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other
cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as
a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand
The molecule which binds to a protein molecule (e.g., a receptor). As a ligand binds through the interaction of many
weak, noncovalent bonds formed to the binding site of a protein, the tight binding of a ligand depends upon a
precise fit to the surface-exposed amino acid residues on the protein.

Active Site
The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains
the residues that directly participate in the making and breaking of bonds. These residues are called the catalytic
groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the
transition state. The active site is the region of the enzyme that most directly lowers the ΔG‡ (the free energy of
activation) of the reaction, which results in the rate enhancement characteristic of enzyme action.
Amino acids in protein active sites:

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly
depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within
functional regions on proteins. These were worked out by considering how often particular amino acids were in
contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino
acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below
do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative
values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
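
One simple way to use these preference values is to average them over the residues of a candidate site. The sketch below copies the values from the table above; the averaging rule, the function names, and the example residue lists are assumptions made for illustration, not a published scoring method.

```python
# Preference values copied from the table above (three-letter codes).
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def mean_preference(residues):
    """Average preference of a list of residues (three-letter codes)."""
    if not residues:
        raise ValueError("need at least one residue")
    return sum(PREFERENCE[r] for r in residues) / len(residues)


def ranked_residues():
    """Amino acids ordered from most to least preferred in functional sites."""
    return sorted(PREFERENCE, key=PREFERENCE.get, reverse=True)


# Hypothetical residue sets, chosen only to contrast the two extremes.
print(mean_preference(["His", "Cys", "Ser"]))   # positive: contact-prone residues
print(mean_preference(["Leu", "Pro", "Trp"]))   # negative: rarely contact ligands
print(ranked_residues()[:3])
```

A positive average only indicates that the residues are of types often seen in contact with bound non-protein atoms; it does not by itself identify an active site.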

Neuraminidase

Fig: Neuraminidase ribbon diagram

Neuraminidase is an antigenic glycoprotein enzyme (EC 3.2.1.18) found on the surface of the influenza virus.

Subtypes

Nine neuraminidase subtypes are known; many occur only in various species of duck and chicken. Subtypes N1
and N2 have been positively linked to epidemics in man, and strains with N3 or N7 subtypes have been identified
in a number of isolated deaths.

Structure

The neuraminidase enzyme exists as a mushroom-shaped projection on the surface of the influenza virus. It has a
head consisting of four co-planar and roughly spherical subunits, and a hydrophobic region that is embedded
within the interior of the virus's membrane. It is composed of a single polypeptide chain that is oriented in the
opposite direction to the hemagglutinin antigen. The polypeptide is a single chain that begins with six
conserved polar amino acids, followed by hydrophilic, variable amino acids.

Function

Neuraminidase has functions that aid in the efficiency of virus release from cells. Neuraminidase cleaves terminal
sialic acid residues from carbohydrate moieties on the surfaces of infected cells. This promotes the release of
progeny viruses from infected cells. Neuraminidase also cleaves sialic acid residues from viral proteins,
preventing aggregation of viruses. Administration of chemical inhibitors of neuraminidase is a treatment that
limits the severity and spread of viral infections.

Neuraminidase is also a virulence factor for the bacterium Bacteroides fragilis.

Ideally, influenza virus neuraminidase (NA) should act on the same type of virus receptor that the virus hemagglutinin
(HA) binds to. This is not always so. It is not quite clear how the virus manages to function if there is no close
match between the specificities of NA and HA.

Neuraminidase inhibitors

Neuraminidase inhibitors such as zanamivir and oseltamivir are used for combating the virus.

Neuraminidase inhibitors are a class of antiviral drugs whose mode of action relies on blocking the function of
the viral neuraminidase protein, thus preventing the virus from budding from the host cell.

Oseltamivir, zanamivir, and peramivir belong to this class.

Unlike the M2 inhibitors, which work only against influenza A, neuraminidase inhibitors act against both
influenza A and B.

Chapter 2

MATERIALS AND METHODS

Materials and methods

Influenza virus belongs to the family Orthomyxoviridae. It is a special kind of virus whose sequence is
available in segments (8 segments in total) rather than as a single genome molecule. The influenza A virus genome is contained on 8 single
(non-paired) RNA strands that code for 10 proteins. The segmented nature of the genome allows for the exchange of
entire genes between different viral strains when they infect the same cell.

For our analysis we take three different types of sequences. First, we take gene sequences of five
different strains (H5N1, H2N2, H1N1, H9N2, and H3N2), available as separate segments, collected from NCBI
(www.ncbi.nlm.nih.gov); we get 41 such gene sequences. We also take genome sequences of these
strains (H5N1, H2N2, H9N2, H1N1, and H3N2), and protein sequences alongside the gene and genome sequences.
We collect the gene and protein sequences from the influenza virus resources available on the NCBI website
(www.ncbi.nlm.nih.gov). We got around 40 such nucleotide sequences and around 60 such protein sequences.
We take these three types of sequences because each type is informative in its own way.

After collecting these sequences from their repositories, we proceed with our further analysis.

Phylogenetic analysis:

There are four steps for phylogenetic analysis:

• Sequence alignment
• Determining the substitution model
• Tree building
• Tree evaluation

Multiple Sequence Alignment

The first step following data retrieval is the execution of a multiple sequence alignment, obtained via
CLUSTALW (progressive alignment method). The purpose of this step is to place the most closely related
sequences in the data set together prior to initiating tree construction. PHYLIP then uses the patterns in the
multiple sequence alignment when building phylogenies.

Phylogenetic Method

Analyses in the present work are carried out according to the distance method. Four programs within PHYLIP are
employed here: SEQBOOT, DNADIST, NEIGHBOR, and CONSENSE.

[A] Once multiple alignment has been completed, the data set is transmitted to SEQBOOT. SEQBOOT generates
multiple resampled versions of the alignment (bootstrap replicates) by sampling alignment columns with replacement.
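
The idea behind bootstrap resampling of an alignment can be sketched as follows. This is a minimal illustration of the technique, not a reimplementation of SEQBOOT; the toy alignment, the function name, and the fixed random seed are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement.
    alignment maps sequence name -> aligned sequence (all the same length)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns)
            for name in names}


# Hypothetical toy alignment of three sequences, five columns each.
toy_alignment = {
    "seq1": "AGCAA",
    "seq2": "AGCGA",
    "seq3": "TGCAA",
}

rng = random.Random(0)   # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(3)]
for replicate in replicates:
    print(replicate)
```

Every replicate has the same sequences and the same length as the original alignment, but some columns appear more than once and others not at all. A tree is then built from each replicate.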

[B] DNADIST reads in the data from SEQBOOT and computes a distance score for each pair of nucleotide sequences. This step is
most critical, since no subsequent analysis can be made without a measure of sequence divergence or similarity.
DNADIST derives these scores from a nucleotide substitution model; a Dayhoff PAM matrix is used for this purpose only when
protein sequences are compared (with the PHYLIP program PROTDIST). A distance score
reflects the number of single-residue substitutions required in order to generate one sequence from a second
sequence.
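
For nucleotide sequences, one of the simplest corrections from the observed proportion of differing sites p to an evolutionary distance is the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), which is among the models DNADIST offers. The report does not state which model was used, so the choice here is an assumption made only because it is the simplest to show.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor distance (substitutions per site) from the observed
    proportion p of differing sites between two aligned sequences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.0, 0.1, 0.3, 0.5):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site that a direct count cannot see.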

[C] NEIGHBOR implements the neighbor-joining method (Saitou and Nei 1987) to determine the most
reasonable positioning of branches. At each step, the two sequences that are closest to each other, once their
average distances to all other sequences are taken into account, are joined as "neighbors"
and will share a node below them (or to their left) in the final tree.

Alternatively, if the user specifies a rooted tree, then NEIGHBOR implements another algorithm, the
unweighted pair group method with arithmetic mean (UPGMA). The UPGMA algorithm assumes a
molecular clock and generates rooted trees.
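
One step of neighbor-joining can be sketched as follows. The pair to join is the one minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other taxa. The distance values and taxon labels below are a hypothetical toy example, and the function is an illustration rather than the NEIGHBOR program itself.

```python
from itertools import combinations


def neighbor_joining_step(names, d):
    """Pick the pair to join in one neighbor-joining step.
    Returns (pair, Q value, branch lengths from each member to the new node)."""
    n = len(names)
    totals = {i: sum(d[i][k] for k in names if k != i) for i in names}
    best_pair, best_q = None, None
    for i, j in combinations(names, 2):
        q = (n - 2) * d[i][j] - totals[i] - totals[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (i, j), q
    i, j = best_pair
    len_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return best_pair, best_q, (len_i, len_j)


# Hypothetical pairwise distances between five taxa.
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
names = ["a", "b", "c", "d", "e"]
d = {x: {} for x in names}
for (x, y), value in pairs.items():
    d[x][y] = value
    d[y][x] = value

pair, q, lengths = neighbor_joining_step(names, d)
print(pair, q, lengths)
```

In this toy matrix the smallest raw distance is between d and e, yet the step joins a and b, because the criterion also weighs how far each taxon is from all the others. The full algorithm replaces the joined pair with a new node, updates the distances, and repeats until the tree is resolved.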

[D] The trees built from the resampled data sets are then passed to CONSENSE, which computes a consensus tree. Any phylogenetic
method renders the most likely tree, i.e., those relationships that are most reasonable given the sequence
alignments. As such, any single tree is only one of many possible trees that could have arisen over evolutionary
time. Resampling methods, therefore, are designed to find the best-supported tree among the many possible
evolutionary paths that could have generated a given set of sequences.
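
The counting behind a majority-rule consensus can be sketched in a few lines. Here each replicate tree is reduced to the set of groupings (clades) it contains; the taxon labels and the four replicate trees are hypothetical, and the code illustrates the principle rather than the CONSENSE program.

```python
from collections import Counter


def clade_support(replicate_trees):
    """Fraction of replicate trees in which each clade appears."""
    counts = Counter()
    for clade_set in replicate_trees:
        counts.update(set(clade_set))
    total = len(replicate_trees)
    return {clade: count / total for clade, count in counts.items()}


def majority_rule(replicate_trees, threshold=0.5):
    """Keep only clades found in more than the threshold fraction of trees."""
    support = clade_support(replicate_trees)
    return {clade: value for clade, value in support.items() if value > threshold}


AB, AC = frozenset("AB"), frozenset("AC")
ABC, ABD = frozenset("ABC"), frozenset("ABD")

# Hypothetical clade sets from four bootstrap replicate trees.
replicates = [{AB, ABC}, {AB, ABC}, {AB, ABD}, {AC, ABC}]

support = clade_support(replicates)
consensus = majority_rule(replicates)
for clade, value in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), value)
```

The support fractions, usually written as percentages, are the bootstrap values shown at the nodes of the final tree.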

[E] Lastly, we draw the tree using NJplot. NJplot is a tree drawing program able to draw any phylogenetic tree
expressed in the Newick phylogenetic tree format (e.g., the format used by the PHYLIP package). NJplot is
especially convenient for rooting the unrooted trees obtained from parsimony, distance, or maximum likelihood
tree-building methods. The trees were drawn as unrooted trees.

Family analysis
Next we carry out family analysis. In family analysis we use the GENSCAN tool for finding motifs,
exons, and introns in our genome sequences. To find the ORFs, we use getorf from EMBOSS.
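
What an ORF search does can be illustrated with a short scan over the three forward reading frames. This is a simplified sketch and not a reimplementation of getorf: it assumes ORFs start at ATG and end at the first in-frame stop codon, it ignores the reverse strand, and the example sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.
    Positions are 0-based, end is exclusive and includes the stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None                     # close it at the stop codon
    return orfs


# Hypothetical sequence containing one short ORF in the third frame.
print(find_orfs("CCATGGAAAGATAAGG"))
```

Real ORF finders add options such as minimum length in nucleotides, alternative genetic codes, and scanning of both strands.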

Modelling
Taking protein sequences of the H5N1 strain of influenza A virus from NCBI, we perform homology modelling of
these protein sequences using the SWISS-MODEL server.
As the first step, we run a BLAST search of these sequences against PDB to find an appropriate template for our
sequences. We get many hits. Among them we select the best template following some criteria.
Then we build the model through the SWISS-MODEL server.
Then we visualize the modelled structure in Swiss-PdbViewer. After that,
our next step is docking of the neuraminidase protein.

Docking
Finally, we take the neuraminidase protein of avian influenza virus; this protein is one of the reasons for its
pathogenicity. To perform docking we use the HEX software. It automatically searches for the active site
where our ligand fits best.
To perform the docking, we identify the ligand and receptor of our protein using several receptor and ligand
finding tools such as PDBSUM, SUMO, CSA, and JENA LIBRARY.

Chapter 3

RESULT AND DISCUSSION

Result and discussion

In the current phylogenetic analysis, the trees were constructed using the neighbor-joining method and were
represented as unrooted trees. The bootstrap values at the nodes, representing the robustness of the trees, were also
satisfactory. We determine the branch lengths and distances of the genes.

The general nature of the tree and the relative distances of different strains from common ancestors are analyzed.
Bootstrapping is used to evaluate the reliability of a phylogenetic tree. In a bootstrapped tree, you can see a value
at each node. These values indicate how strongly each node is supported. The scale bar shows
the number of substitutions per site.

Table 1:

[A] Genes showing higher branch length in H5N1

GENE NAME       BRANCH LENGTH   PRESENT IN STRAIN
PA              2.23            H5N1
NS1, NS2        1.56            H5N1

[B] Genes showing higher branch length in H9N2

GENE NAME       BRANCH LENGTH   PRESENT IN STRAIN
NP              1.767           H9N2
PB2             1.665           H9N2

[C] Genes showing higher branch length in H3N2

GENE NAME       BRANCH LENGTH   PRESENT IN STRAIN
HA              2.03            H3N2
M1, M2          2.45            H3N2

[D] Genes showing higher branch length in H1N1

GENE NAME       BRANCH LENGTH   PRESENT IN STRAIN
NA              2.206           H1N1
PB1, PB1-F2     1.858           H1N1

Our analysis of the gene sequences shows that the same genes (PA, HA, PB1, PB1-F2, NS, PB2, M1, M2, NP, and
NA) are present in all strains. This suggests that H5N1, H2N2, H9N2, H3N2, and H1N1 evolved from the same
common ancestor at the same rate.

The genes PA, NS1, and NS2 remain more conserved in H9N2, H2N2, H3N2, and H1N1 than in H5N1 [Table 1 (A)].

The genes NP and PB2 remain more conserved in H5N1, H2N2, H3N2, and H1N1 than in H9N2 [Table 1 (B)].

In contrast, for the genes HA, M1, and M2, the H3N2 strain appears to diverge more from the common ancestor than
H1N1, H2N2, H5N1, and H9N2 [Table 1 (C)].

The genes PB1 and PB1-F2 are more highly conserved in H3N2, H2N2, H9N2, and H5N1 than in H1N1 [Table 1 (D)].

Therefore, from this observation it might be concluded that in the course of evolution, the genes underwent
suitable modifications in strains H1N1, H9N2, H5N1, and H3N2 as compared to H2N2. This suggests that H2N2 has less
pandemic potential than the others, which are the main causes of pandemic bird flu nowadays.

So, from our current analysis, it can be said that overall these strains have diverged considerably from the common
ancestor in the course of evolution. This drift is more prominent where it supports a better survival strategy.

Outputs

ClustalW output

ClustalW Results

Number of sequences: 41
Alignment score: 1714686
Sequence format: Pearson
Sequence type: nt
ClustalW version: 1.83
Output file: clustalw-20060728-05490446.output
Alignment file: clustalw-20060728-05490446.aln
Guide tree file: clustalw-20060728-05490446.dnd
Your input file: clustalw-20060728-05490446.input

Alignment

41 2392

gi|7385295 ---------- ---------- ---------- -----AGCAA AAGCAGGTAC


gi|3214017 ---------- ---------- ---------- ---------A AAGCAGGTAC
gi|7391268 ---------- ---------- ---------- -----AGCAA AAGCAGGTAC
gi|8486136 ---------- ---------- ---------- -----AGCGA AAGCAGGTAC
gi|7391913 ---------- ---------- ---------- -----AGCAA AAGCAGGTAC
gi|3214015 ---------- ---------- ---------- ---------- ----------
gi|9316315 ---------- ---------- ---------- ---------- ----------
gi|7385295 ---------- ---------- ---------- ---------- ----------
gi|7392130 ---------- ---------- ---------- ---------- ----------
gi|7391914 ---------- ---------- ---------- ---------- ----------
gi|8486129 ---------- ---------- ---------- ---------- ----------
gi|7385294 AGCAAAAGCA GGTCAATTAT ATTCAATATG GAAAGAATAA AAGAACTAAG
gi|3214016 GCCAAAAGCA GGTCAATTAT ATTCAATATG GAAAGAATAA AAGAACTAAG
gi|7391882 AGCAAAAGCA GGTCAATTAT ATTCAATATG GAAAGAATAA AAGAACTACG

gi|7391905 AGCAAAAGCA GGTCAATTAT ATTCAGTATG GAAAGAATAA AAGAACTACG
gi|8486138 AGCGAAAGCA GGTCAATTAT ATTCAATATG GAAAGAATAA AAGAACTAAG
gi|7392156 ---------- ---------- ---------- ---------- ----------
gi|7391921 ---------- ---------- ---------- ---------- ----------
gi|8486131 ---------- ---------- ---------- ---------- ----------
gi|3214016 ---------- ---------- ---------- ---------- ----------
gi|7385294 ---------- ---------- ---------- ---------- ----------
gi|7385295 ---------- ---------- ---------- ---------- ----------
gi|3214142 ---------- ---------- ---------- ---------- ----------
gi|7391268 ---------- ---------- ---------- ---------- ----------
gi|7391915 ---------- ---------- ---------- ---------- ----------
gi|8486122 ---------- ---------- ---------- ---------- ----------
gi|7385295 ---------- ---------- ---------- ---------- ----------
gi|7391914 ---------- ---------- ---------- ---------- ----------
gi|8486125 ---------- ---------- ---------- ---------- ----------
gi|3214016 ---------- ---------- ---------- ---------- ----------
gi|7391920 ---------- ---------- ---------- ---------- ----------
gi|7392126 ---------- ---------- ---------- ---------- ----------
gi|8486127 ---------- ---------- ---------- ---------- ----------
gi|7392130 ---------- ---------- ---------- ---------- ----------
gi|7391913 ---------- ---------- ---------- ---------- ----------
gi|3214016 ---------- ---------- ---------- ---------- ----------
gi|7385294 ---------- ---------- ---------- ----AGCAAA AGCAGGCAAA
gi|3214016 ---------- ---------- ---------- -----GCAAA AGCAGGCAAA
gi|7391268 ---------- ---------- ---------- ----AGCAAA AGCAGGCAAA
gi|7391914 ---------- ---------- ---------- ----AGCAAA AGCAGGCAAA
gi|8486134 ---------- ---------- ---------- ----AGCGAA AGCAGGCAAA

TGATCCAAAA TGGAAGACTT TGTGCGACAA TGCTTCAATC CAATGATTGT
TGATCCAAAA TGGAAGACTT TGTGCGACAG TGCTTCAATC CAATGATTGT
TGATTCGAAA TGGAAGATTT TGTGCGACAA TGCTTCAATC CGATGATTGT
TGATCCAAAA TGGAAGATTT TGTGCGACAA TGCTTCAATC CGATGATTGT
TGATTCGAAA TGGAAGATTT TGTGCGACAA TGCTTCAACC CGATGATTGT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AGATCTAATG TCGCAGTCCC GCACTCGCGA GATACTAACA AAAACCACTG
AAATTTGATG TCGCAATCTC GCACTCGCGA GATACTGACA AAAACCACTG
GAATCTGATG TCGCAGTCTC GCACTCGCGA GATACTAACA AAAACCACAG
GAACCTGATG TCGCAGTCTC GCACTCGCGA GATACTGACA AAAACCACAG
AAATCTAATG TCGCAGTCTC GCACCCGCGA GATACTCACA AAAACCACCG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CCATTTGAAT GGATGTCAAT CCGACTTTAC TTTTCTTAAA AGTGCCAGCG
CCATTTGAAT GGATGTCAAT CCGACTTTAC TTTTCTTAAA AGTGCCAGCG
CCATTTGAAT GGATGTCAAT CCGACCTTAC TTTTCTTGAA AGTTCCAGCG
CCATTTGAAT GGATGTCAAT CCGACTCTAC TGTTCCTAAA GGTTCCAGCG
CCATTTGAAT GGATGTCAAT CCGACCTTAC TTTTCTTAAA AGTGCCAGCA

CGAGCTTGCG GAAAAGGCAA TGAAAGAATA TGGGGAAGAT CCGAAAATCG
CGAGCTTGCG GAAAAGACAA TGAAGGAATA TGGGGAAGAC CCGAAAATTG
CGAACTTGCG GAAAAGGCAA TGAAAGAGTA TGGAGAAGAT CTGAAAATCG
CGAGCTTGCG GAAAAAACAA TGAAAGAGTA TGGGGAGGAC CTGAAAATCG
CGAACTTGCA GAAAAAGCAA TGAAAGAGTA TGGAGAGGAT CTGAAAATTG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TGGATCATAT GGCCATAATC AAGAAATACA CATCAGGAAG ACAAGAGAAG
TGGATCATAT GGCCATAATT AAGAAGTACA CATCAGGAAG ACAGGAGAAG
TGGACCATAT GGCCATAATT AAGAAGTACA CATCAGGGAG ACAGGAAAAG
TGGACCATAT GGCCATAATT AAGAAGTACA CATCGGGGAG ACAGGAAAAG
TGGACCATAT GGCCATAATC AAGAAGTACA CATCAGGAAG ACAGGAGAAG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CAAAATGCTA TAAGTACCAC ATTCCCTTAT ACTGGAGATC CTCCATACAG
CAAAATGCAA TAAGTACCAC ATTCCCTTAT ACTGGAGATC CCCCATATAG
CAAAATGCCA TAAGTACTAC ATTCCCTTAT ACTGGAGATC CTCCATACAG
CAAAATGCCA TAAGCACCAC ATTCCCTTAT ACTGGAGATC CTCCATACAG
CAAAATGCTA TAAGCACAAC TTTCCCTTAT ACCGGAGACC CTCCTTACAG

AAACGAACAA ATTTGCCGCA ATATGCACGC ACTTAGAAGT CT---GTTTC
AAACAAATAA GTTCGCTGCA ATATGCACAC ACTTAGAAGT CT---GCTTC
AAACAAACAA ATTTGCAGCA ATATGCACTC ACTTGGAAGT AT---GCTTC
AAACAAACAA ATTTGCAGCA ATATGCACTC ACTTGGAAGT AT---GCTTC
AAACAAACAA ATTTGCAGCA ATATGCACCC ACTTGGAGGT AT---GTTTC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AACCCTGCTC TCAGAATGAA ATGGATGATG GCAATGAAAT ATCCAATCAC
AATCCCGCTC TTAGAATGAA ATGGATGATG GCGATGAAAT ACCCGATCAC
AACCCGTCAC TTAGGATGAA ATGGATGATG GCAATGAAAT ATCCAATTAC
AACCCGTCAC TTAGGATGAA ATGGATGATG GCAATGAAAT ACCCAATCAC
AACCCAGCAC TTAGGATGAA ATGGATGATG GCAATGAAAT ATCCAATTAC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CCATGGAACA GGAACAGGAT ACACCATGGA CACAGTCAAC AGAACACATC
CCATGGAACA GGAACAGGAT ACACCATGGA CACAGTCAAC AGAACACATC
CCATGGGACA GGAACAGGAT ACACCATGGA CACAGTCAAC AGAACACATC
CCATGGAACA GGAACAGGAT ACACCATGGA CACAGTCAAC AGAACACACC
CCATGGGACA GGAACAGGAT ACACCATGGA TACTGTCAAC AGGACACATC

ATGTATTCAG ATTTCCACTT TATTGATGAA CGGGGCGAAT CAACAATTAT
ATGTATTCAG ACTTCCATTT CATTGACGAA CGAGGCGAAT CAATAATTGT
ATGTATTCAG ATTTTCATTT CATCAATGAG CAAGGCGAGT CAATAATGGT
ATGTATTCAG ATTTCCACTT CATCAATGAG CAAGGCGAGT CAATAATCGT
ATGTATTCAG ATTTTCATTT CATCAATGAA CAAGGCGAAT CAATAGTGGT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AGCAGACAAG AGAATAATGG AGATGATTCC TGAAAGGAAT GAGCAAGGAC
AGCTGACAAA AGAATAATGG AGATGATCCC TGAAAGGAAT GAGCAAGGCC
AGCTGACAAG AGGATAACAG AAATGGTTCC TGAGAGAAAT GAGCAAGGAC
TGCTGACAAA AGGATAACAG AAATGGTTCC GGAGAGAAAT GAACAAGGAC
AGCAGACAAG AGGATAACGG AAATGATTCC TGAGAGAAAT GAGCAAGGAC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- -------GCA GGGGT----- --ATAATCTG TCAAAATGGA
---------- AGCAAAAGCA GGGGTTATA- CCATAGACAA CCAAAAGCAT
---------- AGCAAAAGCA GGGGAAA--- ATAAAAACAA CCAAAATGAA
---------- -GCAAAAGCA GGGGAAT--- TACTTAACTA GCAAAATGGA
---------- AGCAAAAGCA GGGGATAATT CTATTAACCA TGAAGACTAT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AATATTCAGA AAAGGGGAAA TGGACAACGA ACACAGAGAC TGGAGCACCC
AATATTCAGA AAAAGGGAGG TGGACAACAA ACACAGAGAC CGGAGCACCC
AATATTCAGA AAAGGGGAAG TGGACAACAA ACACGGAAAC TGGAGCGCCC
AATATTCAGA GAAGGGGAAG TGGACGACAA ATACAGAAAC TGGGGCACCC
AGTACTCAGA AAAGGCAAGA TGGACAACAA ACACCGAAAC TGGAGCACCG

AGAATCTGGC GATCCCAATG CATTATTGAA ACACCGGTTT GAAATAATCG
GGAATCTGGT GATCCAAATG CATTGTTGAA GCACAGGTTT GAAATAATTG
AGAGCTTGAT GATCCAAATG CACTTTTGAA GCACAGATTT GAAATAATAG
AGAACTTGGT GATCCTAATG CACTTTTGAA GCACAGATTT GAAATAATCG
AGAACTTGAT GATCCAAATG CACTGTTAAA GCACAGATTT GAAATAATCG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AAACGCTTTG GAGCAAGACA AATGATGCTG GGTCGGACAG AGTGATGGTG
AAACTCTTTG GAGCAAAACA AATGACGCTG GATCAGACAG GGTAATGGTA
AAACTCTATG GAGTAAAATG AGTGATGCCG GGTCAGATCG AGTAATGGTA
AAACTCTATG GAGTAAAATG AGTGATGCTG GATCAGATCG AGTGATGGTA
AAACTTTATG GAGTAAAATG AATGATGCCG GATCAGACCG AGTGATGGTA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GAAAAT---- AGTGCTTCTT CTTGCAATAG TCAGTCTT-- ---------G
AACAAT---- GGCCATCATT TATCTCATAC TCCTGTTCAC AG-----CAG
GGCAA----- ACCTACTGGT CCTGTTATGT GCACTTGC-- AG-----CTG
AACAAT---- ATCACTAATA ACTATACTAC TAGTAGTAAC AG-----CAA
CATTGCTTTG AGCTACATTC TATGTCTGGT TTTCGCTCAA AAACTTCCCG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CAACTCAATC CGATTGATGG ACCACTACCT GAGGATAATG AGCCGAGTGG
CAACTCAACC CTATTGATGG ACCATTACCT GAAGACAATG AGCCGAGCGG
CAACTTAACC CAATTGATGG ACCACTACCT GAGGACAATG AACCAAGTGG
CAACTCAACC CAATTGATGG ACCACTACCT GAGGATAATG AGCCAAGTGG
CAACTCAACC CGATTGATGG GCCACTGCCA GAAGACAATG AACCAAGTGG

AAGGGAGGGA CCGAACAATG GCCTGGACAG TGGTGAATAG TATCTGCAAC
AAGGAAGAGA CCGAGCAATG GCCTGGACAG TGGTGAATAG CATCTGCAAC
AGGGAAGAGA TCGCACAATG GCCTGGACAG TAGTAAACAG TATTTGCAAC
AGGGAAGAGA TCGCACAATG GCCTGGACAG TAGTAAACAG TATTTGCAAC
AGGGGAGAGA CAGAACAATG GCCTGGACAG TAGTAAACAG TATCTGCAAC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TCTCCCCTAG CTGTAACTTG GTGGAACAGG AATGGGCCGA CAACAAGTAC
TCACCTCTGG CTGTAACGTG GTGGAACAGA AATGGACCAA CAACAAGTAC
TCACCTTTGG CAGTGACATG GTGGAATAGA AATGGACCAA TGACAAGTAC
TCACCTTTGG CTGTAACATG GTGGAATAGA AATGGACCCG TGGCAAGTAC
TCACCTCTGG CTGTGACATG GTGGAATAGG AATGGACCAA TGACAAATAC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TCAAAAGTGA TCAGA----- ----TTTGCA TTGGTTACCA TGCAAACAAC
TGAGGGGGGA CCAGA----- ----TATGCA TTGGATACCA TGCCAATAAT
CAGATGCAGA CACAA----- ----TATGTA TAGGCTACCA TGCGAACAAT
GCAATGCAGA TAAAA----- ----TCTGCA TCGGCCACCA GTCAACAAAC
GAAATGACAA CAGCACGGCA ACGCTGTGCC TTGGGCACCA TGCAGT--AC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GTATGCACAA ACAGATTGTG TATTGGAAGC AATGGCTTTC CTTGAAGAAT
GTATGCACAA ACAGATTGTG TATTGGAAGC AATGGCTTTC CTTGAAGAAT
ATATGCACAA ACAGACTGCG TCCTGGAAGC AATGGCTTTC CTTGAGGAAT
ATATGCACAA ACAGACTGTG TCCTGGAGGC TATGGCCTTC CTTGAAGAAT
TTATGCCCAA ACAGATTGTG TATTGGAAGC AATGGCTTTC CTTGAGGAAT

ACCACAGGAG TTGAGAAGCC T-AAATTTCT CCCAGATTTG TATGACTACA
ACAACAGGAG TCGATAAACC C-AAATTTCT TCCGGATCTA TACGACTACA
ACCACAGGAG CTGAGAAACC G-AAGTTTCT GCCAGATTTG TATGATTACA
ACTACAGGGG CTGAGAAACC A-AAGTTTCT ACCAGATTTG TATGATTACA
ACTACTGGAG CAGAAAAACC A-AAGTTTCT ACCAGATTTG TATGATTACA
---------- ---------- ---------- ----AGCAAA AGCAGGGTAG
---------- ---------- ---------- ---------- AGCAGGGTTA
---------- ---------- ---------- ----AGCAAA AGCAGGGTAG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ----AGCAAA AGCAGGGTTA
---------- ---------- ---------- ----AGCAAA AGCAGGGTAG
AGTCCATTAT CCAAAGGTTT ACAAAACATA CTTTGAGAAG GTTGAAAGGT
AGTCCATTAT CCAAAGGTGT ATAAAACCTA CTTTGAAAAG GTTGAAAGAT
GGTTCATTAT CCAAAAATCT ACAAGACTTA TTTTGAGAAA GTCGAAAGGT
GGTCCATTAC CCAAAAGTAT ACAAGACTTA TTTTGACAAA GTCGAAAGGT
AGTTCATTAT CCAAAAATCT ACAAAACTTA TTTTGAAAGA GTCGAAAGGC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TCGACAGAG- -CAGGTTGAC ACAATAATGG AAAAGAACGT TACTGTTACA
TCCACAGAA- -AAGGTCGAC ACAATTCTAG AGCGGAATGT CACTGTGACT
TCAACCGAC- -ACTGTTGAC ACAGTGCTCG AGAAGAATGT GACAGTGACA
TCCACAGAA- -ACTGTGGAC ACGCTAACAG AAACCAATGT TCCTGTGACA
CAAACGGAAC GATAGTGAAA ACAATCACGA ATGACCAAAT TGAAGTCACT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CCCACCCAGG GATCTTTGAA AACTCGTGTC TTGAAACGAT GGAAGTTGTT
CCCACCCAGG ACTCTTTGAA AACTCATGTC TTGAAACGAT GGAAGTTGTC
CACACCCAGG AATCTTTGAA AATTCGTGTC TTGAAACGAT GGAAGTTATT
CCCACCCAGG TATCTTTGAG AACTCATGCC TTGAAACAAT GGAAGTCGTT
CCCATCCTGG TATTTTTGAA AACTCGTGTA TTGAAACGAT GGAGGTTGTT

AGGAGAACCG ATTTATTGAA ATTGGAGTGA CACGGAGGGA AGTTCACACA
AGGAAAACCG ATTCACTGAA ATTGGTGTGA CACGGAGGGA AGTTCACATA
AGGAGAATAG ATTCATCGAG ATTGGAGTGA CAAGGAGAGA AGTCCACATA
AGGAAAATAG ATTCATCGAA ATTGGAGTAA CAAGGAGAGA AGTTCACATA
AGGAGAATAG ATTCATCGAA ATTGGAGTGA CAAGAAGAGA AGTCCACATA
ATAATCACTC ACTGAGTGAC ATCAACATCA TGGCGTCCCA AGGCACCAAA
ATAATCACTC ACTGAGTGAC ATCAACATCA TGGCGTCGCA AGGCACCAAA
ATAATCACTC ACTGAGTGAC ATCAACATCA TGGCGTCTCA GGGCACCAAA
---------- ---------- ---------A TGGCGTCCCA AGGCACCAAA
ATAATCACTC ACCGAGTGAC ATCAAAATCA TGGCGTCCCA AGGCACCAAA
ATAATCACTC ACTGAGTGAC ATCAAAATCA TGGCGTCCCA AGGCACCAAA
TAAAACATGG AACCTTCGGT CCCGTTCATT TCCGAAACCA AGTTAAAATA
TAAAACACGG AACCTTTGGC CCTGTTCATT TCCGGAATCA AGTCAAAATA
TAAAACATGG AACCTTTGGC CCTGTCCATT TTAGAAACCA AGTCAAAATA
TAAAACATGG AACCTTTGGC CCTGTTCATT TTAGAAATCA AGTCAAGATA
TAAAGCATGG AACCTTTGGC CCTGTCCATT TTAGAAACCA AGTCAAAATA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CATGCCCAAG ACATACTGGA AAAGACACAC AATGGGAAGC TCTGCGATCT
CATGCCAAGG ACATCCTTGA GAAGACCCAT AACGGAAAGC TATGCAAACT
CACTCTGTTA ACCTGCTCGA AGACAGCCAC AACGGAAAAC TATGTAGATT
CATGCCAAAG AATTGCTCCA CACAGAGCAT AATGGAATGC TGTGTGCAAC
AATGCTACTG AACTGGTTCA GAGTTCCTCA ACAGGTGGAA TATGCGA--C
---------- ---------- -------AGC AAAAGCAGGA GATTAAAATG
---------- ---------- -------AGC GAAAGCAGGG GTTTAAAATG
---------- ---------- ---------- ---------- -------ATG
---------- ---------- -------AGC AAAAGCAGGA G-TAAAGATG
---------- ---------- ---------- ---------- -------ATG
CAGCAAACAA GAGTGGATAA GCTGACCCAA GGTCGCCAAA CCTATGACTG
CAGCAAACGA GAGTGGATAA GCTGACCCAA GGTCGCCAGA CTTATGACTG
CAACAAACAA GAGTGGACAA ACTGACCCAA GGTCGTCAGA CCTATGACTG
CAACAAACAA GGGTGGACAA ACTAACCCAA GGCCGCCAGA CTTATGATTG
CAGCAAACAC GAGTAGACAA GCTGACACAA GGCCGACAGA CCTATGACTG

TACTATCTAG AAAAAGCCAA -CAAGATAAA ATCTGAGAAG ACACACATTC
TATTACTTAG AAAAAGCTAA -CAAGATAAA ATCCGAGAAA ACACATATCC
TACTATCTTG AAAAGGCCAA -TAAAATTAA ATCTGAGAAT ACACACATCC
TACTATCTGG AAAAGGCCAA -TAAAATTAA ATCTGAGAAA ACACACATCC
TATTACCTTG AAAAGGCCAA -TAAAATTAA ATCTGAGAAC ACACACATTC
CGATCCTATG AACAGATGGA -AACTGGTGG AGAACGCCAG AATGCCACTG
CGATCCTATG AACAGATGGA -AACTGGTGG AGAACGCCAG AATGCCACTG
CGATCTTATG AACAGATGGA -AACTGGTGG AGAACGCCAG AATGCTACTG
CGGTCTTATG AACAGATGGA -AACTGATGG GGAACGCCAG AATGCAACTG
CGGTCTTATG AACAGATGGA -AACTGATGG GGATCGCCAG AATGCAACTG
CGGTCTTACG AACAGATGGA -GACTGATGG AGAACGCCAG AATGCCACTG
CGTCGCCGGG TGGATATAAA -CCCGGGCCA TGCAGATCTC AGTGCTAAAG
CGCCGCAGGG TTGACATGAA -CCCTGGCCA TGCAGATCTC AGCGCTAAAG
CGCCGAAGAG TTGACATAAA -CCCTGGTCA TGCAGACCTC AGTGCCAAGG
CGCAGAAGAG TAGACATAAA -CCCTGGTCA TGCAGACCTC AGTGCCAAAG
CGTCGGAGAG TTGACATAAA -TCCTGGTCA TGCAGATCTC AGTGCCAAGG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AAATGGAGTG AAGCCTCTCA -TTTTGAGAG ATTGTAGTGT AGCTGGATGG
AAACGGAATC CCTCCACTTG -AACTAGGGG ACTGTAGCAT TGCCGGATGG
AAAAGGAATA GCCCCACTAC -AATTGGGGA AATGTAACAT CGCCGGATGG
AAGCCTGGGA CATCCCCTCA -TTCTAGACA CATGCACTAT TGAAGGACTA
AGTCCTCATC AGATC-CTTG -ATGGAGAAA ACTGCACACT AATAGATGCT
AATCCAAATC AGAAGATAAT -AACCATTGG ATCAATCTGT ATGGTAGTTG
AATCCAAATC AGAAAATAAT -AACCATTGG ATCAATCTGT CTGGTAGTCG
AATCCAAATC AAAAGATAAT -AACAATTGG CTCTGTCTCT CTCACCATTG
AATCCAAATC AAAAGATAAT -AACGATTGG CTCTGTTTCT CTCACCATTT
AATCCAAATC AAAAGATAAT -AGCACTTGG CTCTGTTTCT ATAACTATTG
GACATTGAAA AGAAACCAGC CGGCTGCAAC CGCTTTGGCC AACACTATAG
GACATTGAAT AGAAACCAGC CGGCTGCAAC TGCTTTGGCC AACACCATAG
GACATTGAAC AGAAATCAGC CGGCTGCAAC TGCGCTAGCC AACACTATAG
GACATTAAAC AGAAATCAAC CGGCAGCAAC TGCATTAGCC AACACCATAG
GACTTTAAAT AGAAACCAGC CTGCTGCAAC AGCATTGGCC AACACAATAG

ACATATTCTC ATTCACTGGA GAGGAAATGG CCACCAAAGC GGACTACACC
ACATCTTTTC ATTCACTGGA GAAGAAATGG CCACTAAAGC TGACTACACC
ACATTTTCTC ATTCACTGGG GAAGAAATGG CCACAAAGGC CGACTACACT
ACATTTTCTC GTTCACTGGG GAAGAAATGG CCACAAAGGC CGACTACACT
ACATCTTCTC ATTCACTGGG GAGGAAATAG CCACAAAGGC AGACTACACT
AGATCAGGGC ATCTGTTGGA AGAATGGTT- ----GGTGG- AATTGGGAGG
AGATCAGGGC ATCTGTTGGA AGAATGGTT- ----GGTGG- AATTGGGAGG
AGATCAGAGC ATCTGTTGGA AGAATGGTT- ----GGTGG- AATTGGGAGG
AGATCAGAGC ATCCGTCGGG AAGATGATT- ----GATGG- AATTGGACGA
AGATTAGGGC ATCCGTCGGG AAGATGATT- ----GATGG- AATTGGGAGA
AAATCAGAGC ATCCGTCGGA AAAATGATT- ----GGTGG- AATTGGACGA
AAGCACAAGA TGTTATCATG GAGGTCGTTT TCCCAAATG- AAGTGGGAGC
AAGCACAAGA TGTCATCATG GAGGTCGTTT TCCCAAATG- AAGTTGGAGC
AGGCACAAGA CGTAATCATG GAAGTTGTTT TCCCCAATG- AAGTGGGGGC
AGGCACAAGA TGTAATTATG GAAGTTGTTT TTCCCAATG- AAGTGGGAGC
AGGCACAGGA TGTAATCATG GAAGTTGTTT TCCCTAACG- AAGTGGGAGC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CTCCTCGGAA ACCCTATGTG TGACGAATTC ATCAATGTGC CGGAATGGTC
CTCCTTGGAA ATCCAGAATG TGATAGGCTT CTAAGTGTGC CAGAATGGTC
CTCTTGGGAA ACCCAGAATG CGACCCACTG CTTCCAGTGA GATCATGGTC
GTCTATGGCA ACCCTTCTTG TGACCTGCTG TTGGGAGGAA GAGAATGGTC
CTATTGGGAG ACCCTCAGTG TGATGGCTTC C--AAAATAA GAAATGGGAC
GGATAATTAG CTTGATGTTA CAAATTGGGA ACATAATCTC AA-TATGGGT
GACTAATTAG CCTAATATTG CAAATAGGGA ATATAATCTC AA-TATGGAT
CAACAGTATG CTTCCTCATG CAGATTGCCA TCCTGGTAAC TACTGTGACA
CCACAATATG CTTCTTCATG CAAATTGCCA TCCTGATAAC CACTGTAACA
CGACAATATG TTTACTCATG CAGATTGCCA TCTTAGCAAC GACTATGACA
AGGTCTTCAG ATCGAATGGT CTAACAGCCA ATGAATCGGG AAGGCTAATA
AAGTATTCAG ATCGAACGGT CTAACAGCCA ATGAGTCAGG AAGGTTAATA
AGGTCTTCAG ATCGAATGGA CTGACAGCTA ATGAGTCGGG AAGGCTAATA
AAGTTTTTAG ATCGAATGGA CTAACAGCCA ATGAATCAGG AAGGCTAATA
AAGTGTTCAG ATCAAATGGC CTCACGGCCA ATGAGTCTGG AAGGCTCATA

CTTGATGAAG AAAGCAGGGC CCGAATCAAA ACCAGGCTGT TCACTATAAG
CTTGATGAAG AGAGCAGGGC AAGAATAAAA ACCAGACTAT TCACCATAAG
CTCGATGAGG AAAGCAGGGC TAGGATCAAA ACCAGACTAT TCACCATAAG
CTCGATGAAG AAAGCAGGGC TAGGATCAAA ACCAGGCTAT TCACCATAAG
CTCGACGAGG AAAGCAGGGC TAGGATTAAA ACCAGGCTAT TTACCATAAG
TTTTACGTAC AGATGTGCAC TGAACTCAAA CTCAGCGACC AAGAAGGAAG
TTTTACGTAC AGATGTGCAC TGAACTCAAA CTCAGCGACC AAGAAGGAAG
TTTTATATAC AGATGTGCAC TGAACTCAAA CTCAGCGACT ATGAAGGAAG
TTCTACATCC AAATGTGCAC CGAACTTAAA CTCAGTGATT ATGAGGGGCG
TTCTACATCC AAATGTGCAC TGAACTTAAA CTCAGTGATC ATGAAGGGCG
TTCTACATCC AAATGTGCAC AGAACTTAAA CTCAGTGATT ATGAGGGACG
TAGAATATTG ACATCAGAGT CGCAATTGAC AATAACAAAA GAGAAGAAAG
CAGGATATTG ACATCAGAAT CACAGCTGAC AATAACAAAG GAAAAGAGGG
CAGGATACTA ACGTCGGAAT CACAATTAAC AATAACCAAA GAGAAAAAAG
CAGGATACTA ACATCAGAAT CGCAATTAAC AATAACTAAA GAGAAAAAAG
CAGGATACTA ACATCGGAAT CGCAACTAAC GATAACCAAA GAGAAGAAAG
---------- ---------- ---------- ---------- ----------
---------- ---------- ----AGCAAA AGCAGGGTGA CAAAGACATA
---------- ---------- ----AGCAAA AGCAGGGTGA CAAAGACATA
---------- ---------- ----AGCAAA AGCAGGGTGA CAAAGACATA
---------- ---------- ---------- ------GTGA CAAAGACATA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TTACATAGTG GAGAAGGCCA GTCCAGCCAA TGACCTCTGT TACCCAGGGG
CTATATAATG GAGAAAGAAA ACCCGAGATA CAGTTTGTGT TACCCAGGCA
CTACATTGTA GAAACACCAA ACTCTGAGAA TGGAATATGT TATCCAGGAG
CTACATCGTC GAAAGATCAT CAGCTGTAAA TGGAACGTGT TACCCTGGGA
CTTTTTGTTG AACGCAGCAA AGCCT----A CAGCAACTGT TACCCTTATG
CAGTCATTCA ATTCAGACAG GGAATCAACA CCAAGCTGAA CCATG-CAAT
TAGCCATTCA ATTCAAACTG GAAGTCAAAA CCATACTGGA ATATG-CAAC
TTGCATTTTA AGCAACAT-- GAGTGCGACT CCCCCGCGAG CAACC-AAGT
TTGCATTTCA AGCAATAT-- GAATTCAACT CCCCCCCAAA CAACC-AAGT
CTACATTTCA ------AT-- GAATGTACCA ACCCATCGAA CAATC-AAGC
GATTTCCTCA AAGACGTGAT GGAATCAATG GATAAGGGAG AAATGGAAAT
GATTTCCTCA AGGACGTAAT GGAATCAATG GATAAGGAAG AAATGGAAAT
GATTTCCTCA AGGATGTGAT AGAATCAATG GATAAAGAGG AGATGGAAAT
GATTTCCTCA AGGATGTGAT GGAATCAATG GATAAAGAGG AAATGGAGAT
GACTTCCTTA AGGATGTAAT GGAGTCAATG AAAAAAGAAG AAATGGGGAT

GCAGG----- AAATGGCCAG TAGGGGTTTA TGGGATTCCT TTCGTC-AGT
ACAGG----- AAATGGCAAG CAGGGGTCTA TGGGATTCCT TTCGTC-AGT
ACAAG----- AAATGGCCAA CAGAGGCCTC TGGGATTCCT TTCGTC-AGT
ACAAG----- AAATGGCCAG CAGAGGCCTC TGGGATTCCT TTCGTC-AGT
ACAAG----- AAATGGCCAA CAGAGGCCTC TGGGATTCCT TTCGTC-AGT
GTTGA-TCCA GAACAGTATA ACAATAGAGA GAATGGTTCT CTCCGC-ATT
GTTGA-TCCA GAACAGTATA ACAATAGAGA GAATGGTTCT CTCCGC-ATT
GCTGA-TTCA GAACAGCATA ACAATAGAGA GAATGGTTCT CTCTGC-ATT
ACTGA-TCCA GAACAGCTTA ACAATAGAGA GAATGGTGCT CTCTGC-TTT
GTTGA-TCCA GAACAGCTTG ACAATAGAGA AAATGGTGCT CTCTGC-TTT
GTTGA-TCCA AAACAGCTTA ACAATAGAGA GAATGGTGCT CTCTGC-TTT
AAGAGCTCCA GGATTGTAAA ATTGCTCCTT TAATGGTGGC ATACAT-GTT
AGGAACTCAA GAATTGTAAT ATTGCTCCTT TAATGGTGGC ATATAT-GTT
AAGAACTCCA AGATTGCAAA ATTTCTCCTT TGATGGTTGC ATACAT-GTT
AAGAACTCCG AGATTGCAAA ATTTCTCCCT TGATGGTTGC ATACAT-GTT
AAGAACTCCA GGATTGCAAA ATTTCTCCTT TGATGGTTGC ATACAT-GTT
ATGGATTCTA ACACTGTGTC AAGTTTTCAG GTAGATTGCT TCCTTT-GGC
ATGGATTCCA ACACTGTGTC AAGTTTCCAG GTAGATTGCT TTCTTT-GGC
ATGGATCCAA ACACTGTGTC AAGCTTTCAG GTAGATTGCT TTCTTT-GGC
ATGGATTCCA ACACTGTGTC AAGCTTTCAG GTAGACTGCT TTCTTT-GGC
ATGGATTCCA ACACGATAAC CTCGTTTCAG GTAGATTGTT ATCTAT-GGC
---------- ---------- ---------A GCAAAAGCAG GTAGAT-ATT
---------- ---------- --GGGGAATT CCAAAAGCAG GTAGAT-ATT
---------- ---------- ---------A GCAAAAGCAG GTAGAT-ATT
---------- ---------- ---------A GCAAAAGCAG GTAGAT-ATT
---------- ---------- ---------A GCGAAAGCAG GTAGAT-ATT
ATTTC---AA CGACTATGAA GAACTGAAAC ACCTATTGAG CAGAAC-AAA
GCTTC---AA TGACTATGAA GAATTGAAAC ATCTCCTCAG CAGCGT-GAA
ATTTC---AT CGACTATGAG GAGCTGAGGG AGCAATTGAG CTCAGT-GTC
ATGTA---GA AAACCTAGAG GAACTCAGGA CACTTTTTAG TTCCGC-TAG
ATGTG---CC GGATTATGCC TCCCTTAGGT CACTAGTTGC CTCATC-CGG
CAAAGCATTA TTACTTATGA AAACAACACC TGGGTAAATC A------AAC
CAAAACATCA TTACCTATAA AAATAGCACC TGGGTAAAGG A------CAC
AATGCCGTGT GAACCAATAA TAATAGAAAG -GAACATAAC A------GAG
GATGCTGTGT GAACCAACAA TAATAGAAAG -AAACATAAC A------GAG
AGTGCCATGT GAACCAATCA TAATAGAAAG -GAACATAAC A------GAG
AATAACACAT TTCCAGAGAA AGAGAAGAGT GAGGGACAAC ATGACCAAGA
AACAACACAT TTCCAGAGAA AGAGAAGAGT GAGGGACAAC ATGACCAAGA
AACAACACAC TTCCAAAGAA AAAGAAGAGT AAGAGACAAC ATGACCAAGA
AACAACACAC TTTCAAAGAA AAAGGAGAGT AAGAGACAAC ATGACCAAGA
CACAACTCAT TTTCAGAGAA AGAGACGGGT GAGAGACAAT ATGACTAAGA

CCGAGAGAGG CGAAGAGAC- -AGTTGAAGA AAG-ATTTGA AATCACAGGG
CCGAGAGAGG CGAAGAGAC- -AATTGAAGA AAG-ATTTGA AATCACAGGG
CCGAAAGAGG CGAAGAAAC- -AATTGAAGA AAG-ATTTGA AATCACAGGG
CCGAGAGAGG AGAAGAGAC- -AATTGAAGA AAG-GTTTGA AATCACAGGA
CCGAAAGAGG CGAAGAAAC- -AATTGAAGA AAA-ATTTGA AATCTCAGGA
TGATGAAAGG AGGAACAGG- -TACCTAGAG G-------AA CATCCCAGTG
TGATGAAAGG AGGAACAGG- -TACCTAGAG G-------AA CATCCCAGTG
TGATGAAAGG AGGAACAAA- -TACCTGGAA G-------AA CATCCCAGTG
TGACGAGAGA AGGAATAAA- -TATCTGGAA G-------AA CATCCCAGCG
TGATGAAAGA AGGAATAAA- -TACCTGGAA G-------AA CACCCCAGCG
TGACGAAAGG AGAAATAAA- -TACCTGGAA G-------AA CATCCCAGTG
GGAAAGAGAA CTGGTCCGC- -AAAACCAGA TTT-C--TAC CGGTAGCAGG
GGAAAGAGAA TTGGTTCGC- -AAGACCAGA TTC-C--TAC CCGTGGCTGG
AGAGAGAGAA CTTGTCCGA- -AAAACGAGA TTT-C--TCC CAGTTGCTGG
AGAGAGAGAA CTTGTCCGA- -AAAACAAGA TTT-C--TCC CAGTTGCTGG
GGAGAGAGAA CTGGTCCGC- -AAAACGAGA TTC-C--TCC CAGTGGCTGG
ATGTCCGAAA ACAAGTTGT- -AGACCAAGA ACT-AGGTGA TGCCCCATTC
ATATCCGGAA ACAAGTTGT- -AGACCAAGA ACT-GAGTGA TGCCCCATTC
ATGTCCGCAA ACGAGTTGC- -AGACCAAGA ACT-AGGTGA TGCCCCATTC
ATGTCCGCAA ACGATTTGC- -AGACCAAGA ACT-GGGTGA TGCCCCATTC
ACATAAGAAA GCTACTCAG- -TATGAGAGA CAT-GTGTGA TGCCCCCTTT
GAAAAATGAG TCTTCTAACC GAGGTCGAAA CGT-ACGTTC TCTCTATCGT
GAAAGATGAG TCTTCTAACC GAGGTCGAAA CGT-ACGTTC TCTCTATCAT
GAAAGATGAG CCTTCTAACC GAGGTCGAAA CGT-ACGTTC TCTCTATCGT
GAAAGATGAG CCTTCTAACC GAGGTCGAAA CGT-ATGTTC TCTCTATCGT
GAAAGATGAG TCTTCTAACC GAGGTCGAAA CGT-ACGTTC TCTCTATCAT
CCATTTTGAG AAAATTCAGA TCATCCCCAA A---AGTTCT TGGTCCAATC
ACATTTTGAG AAAGTTAAGA TTTTGCCCAA A---GATAGA TGGAC---AC
ATCATTCGAA AGATTCGAAA TATTTCCCAA AGAAAGCTCA TGGCCCAACC
TTCCTACCAA AGAATCCAAA TCTTCCCAGA C---ACAACC TGGA------
CACACTGGAG TTTAACAATG AAAGCTTCAA T--------- TGGAC---TG
ATATGTCAAC ATCAGCAATA CCAATTTTCT TACTG--AAA AAGCTGTGGC
A--------- ---------- ---------- ---------- --------AC
ATA-GTGTAT TTGAATAACA CCACCATAGA GAAAG--AGA TCTGCCCCGA
ATA-GTGTAT CTGACCAACA CCACCATAGA GAAGG--AAA TGTGCCCCAA
ATA-GTGCAT TTGAATAATA CTACCATAGA GAAGG--AAA GTTGTCCTAA
AAATGGTCAC ACAAAGAACA ATAGGGAAGA AAAAACAAAG GCTGAACAAA
AAATGGTCAC ACAAAGAACA ATAGGGAAGA AGAAGCAAAA GCTGACAAAA
AAATGGTCAC ACAACGAACA ATAGGAAAGA AGAAGCAAAG ATTGAACAAG
AAATGGTCAC ACAAAGAACA ATAGGGAAGA AAAAACAAAG AGTGAATAAG
AAATGATAAC ACAGAGAACA ATAGGTAAAA GGAAACAGAG ATTGAACAAA

ACTATGTGCA GGCTTGCCGA CCAAAGTCTC CCACCTAATT TCTCCAGCCT
ACCATGCGTA GGCTTGCCGA CCAAAGTCTC CCACCTAACT TCTCCAGCCT
ACAATGCGCA GGCTTGCCGA CCAAAGTCTC CCGCCGAACT TCTCCTGCCT
ACAATGCGCA AGCTTGCCGA CCAAAGTCTC CCGCCGAACT TCTCCAGCCT
ACTATGCGTA GGCTTGCCGA CCAAAGTCTC CCACCGAAAT TCTCCTGCCT
CGGGGAAGGA CCCGAAGAAG ACCGGAGGTC CAATCTACCG AAGGAGAGAC
CGGGGAAGGA CCCGAAGAAG ACCGGAGGTC CAATCTACCG AAGGAGAGAC
CGGGGAAGGA CCCAAAGAAA ACTGGAGGTC CAATCTACCG AAGAAGAGAC
CGGGGAAGGA TCCTAAGAAA ACTGGAGGAC CCATATACAA GAGAGTAGAT
CGGGGAAAGA TCCCAAGAAA ACTGGGGGGC CCATATACAG GAGAGTAGAT
CGGGGAAGGA TCCTAAGAAA ACTGGAGGAC CTATATACAG AAGAGTAAAC
CGGAACAAGC AGTGTGTACA TTGAGGTATT GCATTTGACT CAAGGGACCT
CGGGACAAGC AGCGTATATA TAGAAGTATT GCATTTGACT CAAGGAACTT
TGGAACAAGC AGTGTGTACA TTGAAGTGTT ACACTTGACT CAAGGAACAT
CGGAACAAGC AGTATATACA TTGAAGTCTT ACATTTGACT CAAGGAACGT
TGGAACAAGC AGTGTGTACA TTGAAGTGTT GCATTTGACT CAAGGAACAT
CTTGATCGGC TTCGCCGAGA TCAGAAGTCC CTAAGGGGAA GAGGCAGCAC
CTTGATCGGC TTCGCCGAGA TCAGAGGTCC CTAAGGGGAA GAGGCAATAC
CTTGATCGGC TTCGCCGAGA TCAGAAATCC CTAAGAGGAA GGGGCAGCAC
CTTGACCGGC TTCGCCGAGA TCAGAAGTCC CTAAGAGGAA GAGGCAGCAC
GATGACAGGC TCCGAAGAGA CCAAAAGGCA TTAAAGGGAA GAGGCAGCAC
CCCGTCAGGC CCCCTCAAAG CCGAGATCGC GCAGAGACTT GAGGATGTCT
CCCATCAGGC CCCCTCAAAG CCGAGATCGC GCAGAGACTT GAGGATGTTT
CCCGTCAGGC CCCCTCAAAG CCGAGATCGC ACAGAGACTT GAAGATGTCT
TCCATCAGGC CCCCTCAAAG CCGAGATCGC GCAGAGACTT GAAGATGTCT
CCCGTCAGGC CCCCTCAAAG CCGAGATCGC ACAGAGACTT GAAGATGTCT
ATGATGCCTC ATCAGGGGTG AGCTCAGCAT GTCCATACCA TGGGAGGTCC
AGCATACAAC AACTGGAGGT TCATGGGCCT GCGCGGTGTC AGGTAAACCA
ACAACACAAC CAAAGGAGTA ACGGCAGCAT GCTCCCATGC GGGGAAAAGC
ATGTGACTTA CACTGGAACA AGCAGAGCAT GTTC------ ------AGGT
GAGTCACTCA AAATGGAACA AGCTCTGCTT GCAAAAGGAG ATCTAATAAC
TTCAGTAA-- -CATTAGCGG GCAATTCA-- -TCTCTTTGC CCCAT----T
TTCAGTGA-- -TATTAACCG GCAATTCA-- -TCTCTTTGT CCCAT----C
AGTAGTGG-- -AATACAGAA ATTGGTCAAA GCCGCAATGT CAAAT----T
ACTAGCAG-- -AATACAGAA ATTGGTCAAA GCCGCAATGT GACAT----T
AGTAGCAG-- -AATACAAGA ATTGGTCAAA ACCGCAATGT CAAAT----T
AGGAGCTACC TAATAAGAGC ACTGACACTG AACACAATGA CAAAAGACGC
AAGAGCTACC TAATAAGAGC ACTGACACTG AACACAATGA CAAAAGATGC
AGAAGCTATC TGATAAGAGC ACTGACATTG AACACAATGA CTAAAGATGC
AGAGGCTATC TAATAAGAGC TTTGACATTG AACACGATGA CCAAAGATGC
AGGAGTTATC TAATTAGAGC ATTGACCCTG AACACAATGA CCAAAGATGC

TGAAAAATTT AGAGC-CTAT GTGGATGGAT TCGAACCGAA CGGCTGCATT
TGAAAACTTT AGAGC-CTAT GTGGATGGAT TCAAACCGAA CGGCTGCATT
TGAGAATTTT AGAGC-CTAT GTGGATGGAT TCGAACCGAA CGGCTACATT
TGAAAATTTT AGAGC-CTAT GTGGATGGAT TCGAACCGAA CGGCTACATT
TGAGAATTTT AGAGC-CTAT GTGGATGGAT TCGAACCGAA CGGCTGCATT
G--GGAAATG GGTGAGAGAG CTGATTCTGT ATGACAAAGA GGAGA----T
G--GGAAATG GGTGAGAGAG CTGATTCTGT ATGACAAAGA GGAGA----T
G--GAAAATG GGTGAGAGAG CTGATTCTGT ATGACAAAGA GGAGA----T
G--GAAAGTG GATGAGGGAA CTCGTCCTTT ATGACAAAGA AGAAA----T
G--GAAAATG GATGAGGGAA CTCGTCCTTT ATGACAAAGA AGAGA----T
G--GAAAGTG GATGAGAGAA CTCATCCTTT ATGACAAAGA AGAAA----T
GTTGGGAACA GATGTACACT CCCGGCGGAG AAGTAAGAAA TGATGATGTT
GCTGGGAGCA GATGTACACA CCAGGAGGGG AGGTAAGAAA TGATGATGTT
GTTGGGAACA GATGTACACC CCAGGTGGAG AAGTGAGGAA TGATGATGTT
GTTGGGAACA AATGTACACT CCAGGTGGAG AAGTGAGGAA TGACGATGTT
GCTGGGAACA GATGTATACT CCAGGAGGGG AAGTGAAGAA TGATGATGTT
TCTCGATCTA GACAT-CGAA GCAGCCACCC GTGTTGGAAA GCAGATAGTA
TCTCGGTCTA GACAT-CAAA GCAGCCACCC ATGTTGGAAA GCAAATTGTA
TCTTGGTCTG GACAT-CGAG ACAGCCACAC GTGCTGGAAA GCAGATAGTG
TCTTGGTCTG GACAT-CAGA ACTGCCACTC GTGAAGGAAA GCATATAGTG
ACTTGGACTC GATTT-AAGA GTGGCTACAA TGGAGGGGAA AAAGATCGTT
TTGCAGGAAA GAACACCGAT CTCGA-GGCT CTCATGGAAT GGCTAAAGAC
TTGCAGGGAA GAACACAGAT CTTGA-GGCT CTCATGGAAT GGCTAAAGAC
TTGCTGGGAA GAACACAGAT CTTGA-GGCT CTCATGGAAT GGCTAAAGAC
TTGCTGGGAA AAACACAGAT CTTGA-GGCT CTCATGGAAT GGCTAAAGAC
TTGCAGGGAA GAACACCGAT CTTGA-GGTT CTCATGGAAT GGCTAAAGAC
TCCTTTTTCA GAAATGTGGT ATGGCTTAT- CAAAAAGAAC AGTGCATACC
TCATTCTTCA GGAACATGGT CTGGCTGACA CGTAAAGGAT CAAAT-TATC
AGTTTTTACA GAAATTTGCT ATGGCTGAC- GGAGAAGGAG GGCTCATACC
TCATTCTACA GGAGTATGAG ATGGCTGAC- TCAAAAGAGC GGTTTTTACC
AGTTTCTTTA GTAGATTGAA TTGGTTGACC CACTTAAAAT T-CAAATACC
AGCGGATGGG CTGTACACAG TAAGGACAAC GGTATAAGAA TCGGTTCCAA
CGTGGGTGGG CTATATACAG CAAAGACAAT AGCATAAGAA TTGGTTCCAA
ACAGGATTTG CACCTTTTTC TAAGGACAAT TCAATCCGGC TTTCTGCTGG
ACAGGATTTG CACCTTTTTC TAAGGACAAT TCGATTAGGC TTTCCGCTGG
ACAGGGTTCG CCCCTTTCTC CAAGGACAAC TCAATTAGGC TTTCTGCAGG
AGAAAGAGGC AAATTGAAGA GGCGGGCAAT TGCAACACCC GGGATGCAAA
TGAAAGGGGA AAATTGAAAA GACGAGCGAT TGCAACACCC GGAATGCAAA
AGAGAGAGGT AAATTAAAAA GAAGAGCAAT TGCAACACCC GGTATGCAGA
AGAGAGAGGT AAATTAAAAA GAAGGGCTAT TGCAACACCC GGGATGCAAA
TGAGAGAGGG AAGCTAAAAC GGAGAGCAAT TGCAACCCCA GGGATGCAAA

GAGGGCAAGC TTTCTCAAAT GTCGAAAGAA GTAAACGCCA GAATTGAGCC
GAGGGCAAGC TTTCTCAAAT GTCGAAAGAA GTGAACGCCA GAATTGAGCC
GAGGGCAAGC TTTCTCAAAT GTCCAAAGAA GTAAATGCAA AAATTGAACC
GAGGGCAAGC TGTCTCAAAT GTCCAAAGAA GTAAATGCTA GAATTGAACC
GAGGGCAAGC TTTCTCAAAT GTCCAAAGAA GTGAATGCCA AAATTGAACC
AAGGAGAATT TGGCG--TCA AGCGAACAAT GGAGAAGACG CAACTGCTGG
AAGGAGAATT TGGCG--TCA AGCGAACAAT GGAGAAGACG CAACTGCTGG
CAGGAGAATT TGGCG--TCA AGCGAACAAT GGAGAAGATG CAACTGCTGG
AAGGCGAATC TGGCG--CCA AGCCAATAAT GGTGATGATG CAACAGCTGG
AAGGCGAATC TGGCG--CCA AGCCAACAAT GGTGAGGATG CGACAGCTGG
AAGGCGAATC TGGCG--CCA AGCTAATAAT GGTGACGATG CAACGGCTGG
GACCAGAGTT TGATCATCGC TGCCAGAAAC ATTGTTAGGA GAGCAACAGT
GACCAAAGTT TAATCATTGC TGCTAGGAAC ATTGTCAGGA GAGCAACAGT
GATCAAAGTC TAATTATTGC AGCCAGGAAC ATAGTGAGAA GAGCAGCAGT
GACCAAAGCC TAATTATTGC GGCCAGGAAC ATAGTAAGAA GAGCTGCAGT
GATCAAAGCT TGATTATTGC TGCTAGGAAC ATAGTGAGAA GAGCTGCAGT
GAGAGGATTC TGAAGGAAGA ATCCGATGAG GCACTTAAAA TGACCATGGC
GAAAAGATTC TGAAAGAAGA ATCTGATGAG GCACTTAAAA TGACCATGGT
GAGCGGATTC TGAAAGAAGA ATCCGATGAG GCACTTAAAA TGACCATGGC
GAGCGGATTC TGGAGGAAGA ATCTGACGAG GCACTTAAAA TGACTATCGC
GAGGACATCC TGAAGAGTGA GACAAATGAA AACCTCAAAA TAGCCATTGC
AAGACCAATC CTGTC---AC CTCTGACTAA AGGGATTTTA GGATT--TGT

AAGACCAATC CTGTC---AC CTCTGACTAA GGGGATTTTA GGGTT--TGT
AAGACCAATC CTGTC---AC CTCTGACTAA GGGGATTTTG GGATT--TGT
AAGACCAATT CTGTC---AC CTCTGACTAA GGGGATTTTG GGGTT--TGT
AAGACCAATC CTGTC---AC CTCTGACTAA GGGGATTTTA GGATT--TGT
CAACAATAAA GAGGAGCTAC AATAATACCA ACCAAGAAGA TCTTTTAGTA
CGGTTGCCAA AGGATCGTAC AACAATACAA GCGGAGAACA AATGCTAATA
CAAAGCTGAA AAATTCTTAT GTGAACAAGA AAGGGAAAGA AGTCCTTGTA
CTGTTCAAGA CGCCCAATAC ACAAATAACA GGGGAAAGAG CATTCTTTTC
CAGCATTGAA CGTGACTATG CCAAACAATG AAAAATTTGA CAAACTGTAC
GGGGGATGTG TTTGTTATAA GA-GAGCCGT TCATCTCATG CTCCCACTTG
AGGAGACGTT TTTGTCATAA GA-GAGCCCT TTATTTCATG TTCTCACTTG
TGGGGACATT TGGGTGACGA GA-GAACCTT ATGTGTCATG CGATCCTGGC
TGGGGACATC TGGGTGACAA GA-GAACCTT ATGTGTCATG CGACCCTGAC
CGGGGATATT TGGGTGACAA GA-GAACCTT ATGTATCGTG CGGTCTTGGT
TCAGAGGATT CGTGTACTTT GTCGAAACAC TAGCGAGGAG TATCTGTGAG
TCAGAGGATT CGTGCACTTT GTCGAAGCAC TAGCAAGGAG CATCTGTGAA
TCAGAGGGTT CGTGCACTTT GTCGAAACAC TAGCGAGAAA TATTTGTGAG
TTAGAGGGTT CGTGTACTTC GTTGAAACTT TAGCTAGAAG CATTTGCGAA
TAAGGGGGTT TGTATACTTT GTTGAGACAC TGGCAAGGAG TATATGTGAG

ATTTCTGAAG ACAACA---C CACGCCCTCT TAGATTACCT GATGGG---C
ATTTCTGAAG ACAACA---C CACGTCCCCT CAGATTGCCT GATGGA---C
TTTTCTGAAA ACAACA---C CAAGACCAAT TAGACTTCCG GATGGG---C
TTTTTTGAAA ACAACA---C CACGACCACT TAGACTTCCG AATGGG---C
TTTTCTGAAG ACAACA---C CAAGACCAAT CAAACTTCCT AATGGA---C
TCTCACTCAT ATGATG---A TCTGGCATTC CAACCTAAAT GATGCC--AC
TCTCACTCAT ATGATG---A TCTGGCATTC CAACCTAAAT GATGCC--AC
TCTCACTCAC ATGATG---A TCTGGCATTC CAATCTAAAT GATGCC--AC
GCTGACTCAC ATGATG---A TCTGGCATTC CAATTTGAAT GATACA--AC
TCTAACTCAC ATAATG---A TCTGGCATTC CAATTTGAAT GATGCA--AC
TCTGACTCAC ATGATG---A TCTGGCATTC CAATTTGAAT GATGCA--AC
ATCAGCGGAC CCACTG---G CATCACTCTT GGAGATGTGT CACAGCACAC
ATCAGCAGAC CCATTG---G CTTCACTCCT GGAAATGTGC CATAGCACAC
ATCAGCAGAT CCACTA---G CATCTTTATT GGAGATGTGC CACAGCACAC
ATCAGCAGAT CCACTA---G CATCTTTATT GGAGATGTGC CACAGCACAC
ATCAGCAGAC CCACTA---G CATCTTTATT GGAGATGTGC CACAGCACAC
CTCCGCACCT GCTTCG---C GATACCTAAC TGACATG-AC TATTGAGGAA
CTCCACACCT GCTTCG---C GATACATAAC TGACATG-AC TATTGAGGAA
CTCTGTACCT GCGTCG---C GTTACCTAAC CGACATG-AC TCTTGAGGAA
TTCAGTGCCT GCTTCA---C GCTACCTAAC TGAAATG-AC TCTTGAGGAA
TTCCAGTCCT GCTCCT---C GGTATATCAC CGATATG-AG CATAGAGGAG
GTTCACGCTC ACCGTG---C CCAGTGAGCG AGGACTGCAG CGTAGA----
GTTCACGCTC ACCGTG---C CCAGTGAGCG AGGACTGCAG CGTAGA----
ATTCACGCTC ACCGTG---C CAAGTGAGCG AGGACTGCAG CGTAGA----
GTTCACGCTC ACCGTG---C CCAGTGAGCG AGGACTGCAG CGTAGA----
GTTCACGCTC ACCGTG---C CCAGTGAGCG AGGACTGCAG CGTAGA----
CTGTGGGGGA TTCACC---A TCCTAATGAT GCGGCAGAGC AGACAAAGCT
ATTTGGGGAG TGCACC---A TCCTAATGAT GAGGCAGAAC AAAGAGCATT
CTGTGGGGTA TTCATC---A CCCGTCTAAC AGTAAGGATC AACAGAATAT
GTGTGGGGCA TACATC---A CCCACCCACC TATACCGAGC AAACAAATTT
ATTTGGGGGG TTCACC---A CCCGGGTACG GACAATGACC AAATCAGCCT
GAATGCAGAA CTTTCT---T TTTGACTCAG GGAGCCTTGC TGAATGACAA
GAATGCAGGA CCTTTT---T TCTGACCCAA GGTGCCTTAC TGAATGACAG
AAGTGTTATC AATTTG---C ACTCGGGCAG GGGACCACAC TAGACAACAA
AAGTGTTACC AATTTG---C CCTTGGACAG GGAACAACAC TAAACAACGT
AAATGTTACC AATTTG---C ACTTGGGCAG GGAACCACTT TGAACAACAA
AAACTTGAGC AATCTGGACT CCCCGTCGGA GGGAATGAAA AGAAGGCTAA
AAACTTGAGC AATCTGGACT CCCCGTTGGA GGGAATGAGA AGAAGGCTAA
AAACTTGAAC AGTCTGGGCT TCCGGTTGGA GGTAATGAAA AGAAGGCTAA
AAGCTTGAAC AGTCTGGACT TCCGGTTGGG GGTAATGAAA AGAAGGCCAA
AAACTTGAAC AATCAGGGTT GCCAGTTGGA GGCAATGAGA AGAAAGCAAA

CTCCCTGCTC TCAGCGGTCG AAGTT--TTT GCTGATGGAT GCCCTTA---
CTCCCTGCTC CCAGCGGTCG AAATT--CTT GCTGATGGAT GCTCTGA---
CTCCTTGTTT TCAGCGGTCC AAATT--CCT GCTGATGGAT GCTTTAA---
CTCCCTGTTC TCAGCGGTCC AAATT--CCT GCTGATGGAT GCCTTAA---
CTCCTTGTTA TCAGCGGTCC AAATT--CCT CCTGATGGAT GCTTTGA---
ATACCAGAGA ACAAGAGCCC TCGTG--CGG ACTGGAATGG ACCCCAG--A
ATACCAGAGA ACAAGAGCCC TCGTG--CGG ACTGGAATGG ACCCCAG--A
ATACCAGAGA ACAAGAGCTC TCGTG--CGT ACTGGGATGG ACCCTAG--A
ATACCAGAGG ACAAGAGCTC TTGTT--CGC ACCGGAATGG ATCCCAG--G
ATACCAGAGG ACAAGAGCTC TTGTT--CGA ACTGGAATGG ATCCCAG--A
TTATCAGAGG ACAAGGGCTC TTGTT--CGC ACCGGAATGG ATCCCAG--G
AAATTGGGGG AATAAGGATG GTGGA--CAT CCTTAGGCAA AACCCAACTG
AAATTGGCGG AGTAAGAATG GTAGA--CAT CCTTAAACAA AACCCAACAG
AGATTGGCGG GACAAGGATG GTGGA--CAT TCTTAGGCAG AACCCAACGG
AAATTGGCGG GACAAGGATG GTGGA--CAT TCTTAGACAG AACCCGACTG
AGATTGGTGG AATTAGGATG GTAGA--CAT CCTTAAGCAG AACCCAACAG
TTGTCAAGGG ACT--GGTTC ATGCT--AAT GCCCAAGCAG AAAGTGGAAG
TTGTCAAGAA ACT--GGTTC ATGCT--AAT GCCCAAGCAG AAAGTGGAAG
ATGTCAAGGG AAT--GGTCC ATGCT--CAT ACCCAAGCAG AAAGTGGCAG
ATGTCAAGGG ATT--GGTTA ATGCT--CAT TCCCAAGCAG AAAGTGACAG
ATGAGCCGAG AAT--GGTAC ATGCT--GAT GCCTAGGCAG AAAATAACTG
CGCTTTGTCC AGAATGCCTT AAATG--GAA ATGGAGATCC AAACAATATG
CGATTTGTCC AAAATGCCCT AAATG--GGA ATGGAGACCC AAACAACATG

CGCTTTGTCC AAAATGCCCT CAATG--GGA ATGGGGATCC AAATAACATG
CGCTTTGTCC AAAATGCCCT CAATG--GGA ATGGAGATCC AAATAACATG
CGCTTTGTCC AAAATGCCCT TAATG--GGA ACGGGGATCC AAATAACATG
CTATCAAAAC CCAACCACTT ACATTTCCGT TGGAACATCA ACACTGAACC
GTACCAGAAT GTGGGAACCT ATGTTTCCGT AGCCACATCA ACATTGTACA
CTATCAGAAT GAAAATGCTT ATGTCTCTGT AGTGACTTCA AATTATAACA
GTACATAAGA AACGACACAA CAACAAGCGT GACAACAGAA GATTTGAATA
ATATGCTCAA GCATCAGGAA GAATCACAGT CTCTACCAAA AGAAGCCAAC
GCACTCCAAT GGGACCGTCA AAGACAGAAG CC-CTCACAG AACATTGA--
GCATTCAAAT GGGACTGTTA AGGACAGAAG CC-CTTATAG GGCCTTAA--
ACATTCAAAT GACACAATAC ATGATAGAAT CC-CTCATCG AACCCTAT--
GCATTCAAAT GACACAGTAC ATGATAGGAC CC-CTTATCG GACCCTAT--
ACACTCAAAT GGCACAATAC ATGATAGGAG TC-CCCATAG AACCCTTT--
ATTGGCAAAT GTCGTGAGGA AGATGATGAC TAACTCACAA GATACAGAGC
ATTGGCAAAT GTTGTGAGAA AGATGATGAC TAACTCACAA GACACAGAGC
ACTAGCAAAT GTTGTTAGAA AAATGATGAC TAATTCACAA GACACAGAGC
ACTGGCAAAT GTTGTGAGAA AAATGATGAC TAATTCACAA GACACTGAGC
GTTGGCAAAT GTTGTAAGGA AGATGATGAC CAATTCTCAG GACACCGAAC

AATTAAGCAT CGAAGACCCG AGTCATGAGG GGGAGGGGAT -ACCGCTATA
AATTAAGCAT TGAGGACCCG AGCCATGAGG GGGAGGGGAT -ACCGCTATA
AATTAAGCAT TGAGGACCCA AGTCACGAAG GGGAGGGAAT -ACCACTATA
AATTAAGCAT TGAGGACCCA AGTCATGAAG GAGAGGGAAT -ACCGCTATA
AATTGAGCAT TGAAGACCCA AGTCATGAAG GAGAAGGGAT -TCCATTATA
ATGTGCTCTC TGATGCAAGG ATCAACCCTC CCGAGGAGAT CTGGAGCTGC
ATGTGCTCTC TGATGCAAGG ATCAACCCTC CCGAGGAGAT CTGGAGCTGC
ATGTGCTCTC TGATGCAAGG ATCAACTCTC CCGAGGAGAT CTGGAGCTGC
ATGTGCTCTT TGATGCAGGG TTCGACTCTC CCTAGGAGGT CTGGAGCTGC
ATGTGCTCTC TGATGCAGGG CTCGACTCTC CCTAGAAGGT CCGGAGCTGC
ATGTGCTCTC TGATGCAAGG TTCAACTCTC CCTAGGAGGT CTGGAGCCGC
AGGAGCAAGC TGTGGATATA TGCAAAGCAG CAATGGGTTT GAGGATCAGT
AAGAGCAAGC TGTAGATATA TGCAAGGCAG CAATGGGTTT GAAAATCAGC
AAGAACAAGC TGTGGATATA TGCAAGGCTG CAATGGGACT GAGAATCAGC
AAGAACAAGC TGTGGATATA TGCAAGGCTG CAATGGGATT GAGAATCAGC
AAGAGCAAGC CGTGGGTATA TGCAAGGCTG CAATGGGACT GAGAATTAGC
GCCCTCTTTG CATCAGAATA GACCAGGCAA TCATGGATAA GAACATCATG
GACCTCTTTG CATCAGAATG GACCAGGCAA TCATGGAGAA AAACATCATG
GCCCTCTTTG TATCAGAATG GACCAGGCGA TCATGGATAA AAACATCATA
GGCCCCTTTG CATTAGAATG GACCAGGCAG TAATGGGTAA AACCATCATA
GAGGCCTTAT GGTGAAAATG GACCAAGCCA TAATGGATAA AAGAATTATC
GATAGGGCAG TTAAGCTATA CAAGAAGCTG AAAAGAGAAA TAACATTCCA
GACAGGGCAG TTAAACTATA CAAGAAGCTG AAGAGGGAAA TGACATTCCA
GACAGAGCAG TTAAACTGTA TAGAAAGCTT AAGAGGGAGA TAACATTCCA
GACAAAGCAG TTAAACTGTA TAGGAAACTT AAGAGGGAGA TAACGTTCCA
GACAAAGCAG TTAAACTGTA TAGGAAGCTC AAGAGGGAGA TAACATTCCA
AGAGATTGGT TCCAGAAATA GCTACTAGAC CCAAAGTAAA CGGGCAAAGT
AAAGGTCAAT CCCAGAAATA GCAGCAAGGC CTAAAGTGAA TGGACTAGGA
GGAGATTTAC CCCGGAAATA GCAGAAAGAC CCAAAGTAAG AGATCAAGCT
GGACCTTCAA ACCAGTGATA GGGCCAAGGC CCCTTGTCAA TGGTCTGCAG
AAACCGTAAT CCCGAGTATC GGATCTAGAC CCAGGATAAG GGATGTCCCC
TGAGTTGTCC TGTGGGTGA- GGCTCCCTCC CCATATAACT CAAGGTTTGA
TGAGCTGCCC TGTCGGTGA- AGCTCCGTCC CCGTACAATT CAAGATTTGA
TAATGAATGA GTTGGGTG-- --TTCCATTT CATTTAGGAA CCAGGCAAGT
TGATGAATGA ATTAGGTG-- --TTCCATTT CATCTGGGGA CCAAGCAAGT
TAATGAACGA GTTGGGTG-- --TTCCATTT CATTTGGGAA CCAAACAAGT
TCTCTTTTAC AATTACTG-- GAGACAACAC CAAATGGAAT GAGAATCAGA
TCTCCTTTAC AGTTACCG-- GAGACAACAC CAAATGGAAT GAGAATCAGA
TCTCTTTCAC AATTACTG-- GAGACAACAC CAAATGGAAT GAGAATCAAA
TTTCTTTCAC AATCACTG-- GGGACAACAC TAAGTGGAAT GAAAATCAAA
TTTCTTTGAC CATCACTG-- GAGATAACAC CAAATGGAAC GAAAATCAGA

TGATGCAATC AAATGCATGA AAACATTTTT CGGCTGGAAA GAGCCCAACA
TGATGCGATA AAATGCATGA AAACATTCTT CGGCTGGAGA GAGCCCAACA
TGATGCGATC AAGTGCATGA GAACATTCTT TGGATGGAAA GAACCCTATA
TGATGCAATC AAATGCATGA GAACATTCTT TGGATGGAAG GAACCCAATG
TGATGCGATC AAGTGCATAA AAACATTCTT TGGATGGAAA GAACCTTATA
TGGTGCAGCA A--TAAAGGG AGTCGGGACA ATGGTAATGG AACTA--AT-
TGGTGCAGCA A--TAAAGGG AGTCGGGACA ATGGTAATGG AACTA--AT-
TGGTGCGGCA G--TAAAGGG AGTCGGAACG ATGGTGATGG AACTA--AT-
AGGCGCTGCA G--TCAAAGG AGTTGGGACA ATGGTGATGG AGTTG--AT-
AGGTGCTGCA G--TCAAAGG AATCGGGACA ATGGTGATGG AACTG--AT-
AGGTGCTGCA G--TCAAAGG AGTTGGAACA ATGGTGATGG AATTG--GT-
TCATCCTTTA GCTTTGGAGG CTTCACTTTC AAAAGAACAA ATGGATCAT-
TCATCCTTCA GCTTTGGAGG GTTCACTTTC AAAAGAACAA AGGGGTCTT-
TCATCCTTCA GTTTTGGCGG GTTCACATTT AAGAGAACAA GCGGGTCAT-
TCATCCTTCA GCTTTGGTGG GTTTACATTT AAAAGAACAA GCGGGTCAT-
TCATCCTTCA GTTTTGGTGG ATTCACATTT AAGAGAACAA GCGGATCAT-
TTGAAAGCGA ATTTCAGTGT GATTTTTGAC CGGCTAGAGA CCCTAATAT-
TTGAAAGCGA ATTTCAGTGT GATTTTTGAC CGACTAGAGA CCATAGTAT-
CTGAAAGCGA ACTTCAGTGT GATTTTTGAC CGGCTGGAGA CTCTAATAT-
TTGAAAGCAA ACTTTAGTGT GATTTTTAAT CGACTTGAAG CTCTGATAC-
CTTAAAGCAA ATTTCTCAGT TCTATTTGAT CAACTAGAGA CATTAGTCT-
TGGGGCTAAG GAGGTCGCAC -TCAGCTACT CAACCGGTGC ACTTGCCAGT
TGGAGCAAAG GAAGTTGCAC -TCAGTTACT CAACTGGTGC GCTTGCCAGT
TGGGGCCAAA GAAGTAGCGC -TCAGTTATT CTGCTGGTGC ACTTGCCAGT

TGGGGCCAAA GAAATAGCTC -TCAGTTATT CTGCTGGTGC ACTTGCCAGT
TGGGGCCAAA GAAATCTCAC -TCAGTTATT CTGCTGGTGC ACTTGCCAGT
GGAAGAATGG AGTTCTTCTG GACAATTTTA AAGCCGAATG ATGCCATCAA
CGTAGAATGG AATTCTCTTG GACCCTCTTG GATATGTGGG ACACCATAAA
GGGAGGATGA ACTATTACTG GACCTTGCTA AAACCCGGAG ACACAATAAT
GGAAGAATTG ATTATTATTG GTCGGTACTA AAACCAGGCC AAACATTGCG
AGCAGAATAA GCATCTATTG GACAATAGTA AAACCGGGAG ACATACTTTT
GTCTGTTGCT TGGTCGGCAA GTGCTTGCCA TGATGGCACC AGTTGGTTGA
ATCGGTTGCT TGGTCAGCAA GTGCATGTCA TGATGGCATG GGCTGGCTAA
GTGTGTAGCA TGGTCCAGCT CAAGTTGTCA CGATGGAAAA GCATGGTTGC
GTGCATAGCA TGGTCCAGCT CAAGTTGTCA CGATGGAAAA GCATGGCTGC
GTGCATAGCA TGGTCCAGCT CAAGCTGCCA TGATGGGAAG GCATGGTTAC
ACCCTCGGAT GTTTCTAGCA ATGATAACAT ACATCACAAG GAACCAACCT
ATCCTCGAAT ATTTCTAGCA ATGATAACAT ACATCACAAG GAACCAACCT
ATCCTCGAGT GTTTCTGGCG ATGATAACAT ACATCACAAG AAATCAACCT
ACCCTCGAAT GTTTTTGGCG ATGATTACAT ATATCACAAA AAATCAACCT
ATCCTCGGAT GTTTTTGGCC ATGATCACAT ATATGACCAG AAATCAGCCC

TTGTAAAACC ACATGAAA-- -AAGGCATAA ACCCCAATTA CCTCCTGGCT
TCATCAAGCC ACACGAGA-- -AGGGCATAA ATCCCAATTA TCTTCTGGCT
TTGTTAAACC ACACGAAA-- -AGGGAATAA ATCCAAATTA TCTGCTGTCA
TTGTTAAACC ACACGAAA-- -AGGGAATAA ATCCAAATTA TCTTCTGTCA
TAGTCAAACC ACACGAAA-- -AGGGAATAA ATTCAAATTA CCTGCTGTCA
TCGGATGATA AAGCGAGG-- -CATTAATGA CCGGAACTTC TGGAGAGGCG
TCGGATGATA AAGCGAGG-- -CATTAATGA CCGGAACTTC TGGAGAGGCG
TCGGATGATA AAGCGAGG-- -GATTAACGA TCGGAATTTC TGGAGAGGTG
CAGGATGATC AAACGTGG-- -GATCAATGA TCGGAACTTC TGGAGAGGTG
CAGAATGGTC AAACGGGG-- -GATCAACGA TCGAAATTTC TGGAGAGGTG
CAGGATGATC AAACGTGG-- -GATCAATGA TCGGAACTTC TGGAGGGGTG
CCGTCAAGAA GGAAGAGG-- -AAGTGCTTA CAGGCAACCT CCAAACATTG
CTGTCAAAAG AGAGGAAG-- -AAGTGCTTA CAGGCAACCT CCAAACATTG
CAATCAAGAG AGAGGAAG-- -AAGTGCTTA CGGGCAATCT CCAAACATTG
CAGTCAAAAA AGAGGAAG-- -AAGTGCTTA CAGGCAATCT CCAAACATTG
CAGTCAAGAG AGAGGAAG-- -AGGTGCTTA CGGGCAATCT TCAAACATTG
TACTAAGGGC TTTCACCG-- -AAGAGGGAG CAATTGTTGG CGAAATTTCA
TACTAAGGGC TTTCACCG-- -AAGAGGGAG CAATTGTTGG CGAAATCTCA
TGCTAAGGGC TTTCACCG-- -AAGAGGGAG CAATTGTTGG CGAAATTTCA
TACTTAGAGC GTTTACAG-- -ATGAAGGAG CAATAGTGGG CGAAATCTCA
CTCTGAGGGC ATTCACAG-- -AAAGTGGTG CTATTGTGGC TGAAATATTT
TGTATGGGTC TCATATAC-- -AACAGGATG GGAACGGTGA CC-ACAGAAG
TGCATGGGTC TCATATAC-- -AACCGGATG GGAACAGTGA CC-ACAGAAG
TGCATGGGCC TCATATAC-- -AACAGGATG GGGGCTGTGA CC-ACTGAAG
TGCATGGGCC TCATATAC-- -AATAGGATG GGGGCTGTAA CC-ACTGAAG
TGTATGGGCC TCATATAC-- -AACAGGATG GGGGCTGTGA CC-ACTGAAG
TTTCGAGAGT AATGGAAATT TCATTGCTCC AGAATATGCA TACAAAATTG
TTTTGAGAGC ACTGGTAATC TAGTTGCACC AGAGTATGGG TTCAAAATAT
ATTTGAGGCA AATGGAAATC TAATAGCACC AAGGTATGCT TTCGCACTGA
AGTACGATCC AATGGGAATC TAATTGCTCC ATGGTATGGA CACGTTCTTT
GATTAACAGC ACAGGGAATC TAATTGCTCC TCGGGGTTAC TTCAAAATAC
CAATTGGAAT TTCTGGCCCA GACAATGGGG CTGTGGCTGT ATTGAAATAC
CAATCGGAAT TTCAGGTCCA GATAATGGAG CAGTGGCTGT ATTAAAATAC
ATGTTTGTGT CACTGGGGAT GATAAAAATG CAACTGCTAG CTTCATTTAT
ATGTTTGTGT AACGGGGGAT GATAAAAATG CAACTGCTAG CTTCATTTAC
ATGTTTGTGT CACTGGGGAT GATAGAAATG CGACTGCTAG CATCATTTAT
GAATGGTTTA GAAATGTCTT AAGCATTGCT CCTATAATGT TCTCAAACAA
GAATGGTTTA GAAATGTCTT GAGCATTGCC CCTATAATGT TCTCAAATAA
GAATGGTTTA GAAACGTCCT GAGCATTGCA CCCATAATGT TCTCAAATAA
GAGTGGTTCA GAAACATCCT GAGCATCGCA CCAATAATGT TCTCAAACAA
GAATGGTTCA GAAATGTTCT AAGTATTGCT CCAATAATGT TCTCAAACAA

TGGAAGCAGG TGCTGGCAGA GCTCCAAGAT ATTGAAAACG AGGAGAAAAT
TGGAAGCAGG TGCTGGCAGA ACTCCAGGAT ATTGAAAATG AGGATAAAAT
TGGAAGCAAG TACTGGCGGA ACTGCAGGAC ATTGAGAATG AGGAGAAGAT
TGGAAGCAAG TACTGGCAGA ACTGCAGGAC ATTGAGAATG AGGAGAAAAT
TGGAAGCAAG TATTGTCAGA ATTGCAGGAC ATTGAAAATG AGGAGAAGAT
ATAATGGACG AAGAACAAGG ATTGCATATG AGAGAATGTG C---AACATC
ATAATGGACG AAGAACAAGG ATTGCATATG AGAGAATGTG C---AACATC
AAAATGGGCG AAGAACAAGA ATTGCATATG AGAGAATGTG C---AACATC
AGAATGGACG GAAAACAAGG AGTGCTTACG AGAGAATGTG C---AACATT
AGAATGGGCG GAAAACAAGA AGTGCTTATG AGAGAATGTG C---AACATT
AGAATGGACG AAAAACAAGA ATTGCTTATG AAAGAATGTG C---AACATT
AAAATAAAAG TACATGAGGG G-TATGAAGA ATTCACAATG G---TTGGGC
AAGATAAAAG TACATGAAGG A-TATGAGGA ATTCACAATG G---TTGGAC
AAAATAAGGG TGCATGAGGG G-TACGAGGA ATTCACAATG G---TGGGGA
AAGATAAGAG TACATGAGGG G-TATGAGGA GTTCACAATG G---TGGGGA
AAGATAAGAG TGCATGAGGG A-TATGAAGA GTTCACAATG G---TTGGGA
CCATTGCCTT CTCTTCCAGG A-CAT--ACT ATTGAGGATG T---CAAAAA
CCATTGCCTT CTTTTCCAGG A-CAT--ACT ATTGAGGATG T---CAAAAA
CCATTGCCTT CTCTTCCAGG A-CAT--ACT GCTGAGGATG T---CAAAAA
CCATTACCTT CCCTTCCAGG A-CAT--ACT GACGAGGATG T---CAAAAA
CCCATTCCCT CCGTACCAGG A-CAT--TTT ACAGAGGATG T---CAAAAA
TGGCTTTTGG CCTAGTGTGT GCCACTTGTG AGC-AGATTG C-----AGAT
TGGCTCTTGG CCTAGTATGT GCCACTTGTG AAC-AGATTG C-----TGAT
TGGCCTTTGC CGTGGTATGT GCAACCTGTG AAC-AGATTG C-----TGAC
TGGCATTTGG CCTGGTATGT GCAACATGTG AAC-AGATTG C-----TGAC

TGGCATTTGG CCTGGTATGT GCAACCTGTG AAC-AGATTG C-----TGAC
TCAAGAAAGG GGACTCAGCA ATTATGAAAA GTGAATTGGA ATATGGTAAC
CGAAAAGAGG TAGTTCAGGG ATCATGAAGA CAGAAGGAAC ACTTGAGAAC
GTAGAGGCTT TGGGTCCGGC ATCATCACCT CAAACGCATC AATGCATGAG
CAGGAGGGAG CCATGGAAGA ATCCTGAAGA CTGATTTAAA AGGTGGTAAT
GAAGTGGGAA AAGCTCA--- ATAATGAGAT CAGATGCACC CATTGGCAAA
AACGGCATAA TAACAGACAC TATCAAGAGT TG-GAGGAAC AACAT---AC
AACGGCATAA TAACTGAAAC CATAAAAAGT TG-GAGGAAG AAAAT---AT
GACGGGAGGC TTATGGACAG TATTGGTTCA TG-GTCTCAA AATAT---CC
AATGGGAGGC TTGTAGATAG TATTGTTTCA TG-GTCCAAA AAAAT---CC
GATGGGATGC TTACCGACAG TATTGGTTCA TG-GTCTAAG AACAT---CC
GATGGCAAGA TTAGGGAAAG GATACATGTT CGAAAGTAAG AGCATGAAGC
AATGGCGAGG TTAGGAAAAG GATACATGTT CGAGAGTAAG AGCATGAAGC
AATGGCTAGA CTAGGGAAAG GTTACATGTT CGAAAGCAAG AGCATGAAGC
AATGGCAAGA CTAGGAAAAG GATACATGTT CGAGAGTAAG AGAATGAAGC
AATGGCGAGA CTGGGAAAAG GGTATATGTT TGAGAGCAAG AGTATGAAAC

TCCAAAGACA AAGAACAT-- GAGGAAAACA AGCCAATTGA AGTGGGCACT
CCCAAAAACA AAGAACAT-- GAAGAAAACA AGCCAATTAA TGTGGGCACT
TCCAAGAACT AAAAACAT-- GAAGAAAACG AGTCAGCTAA AGTGGGCACT
TCCAAAGACT AAAAATAT-- GAAAAAAACA AGTCAGCTAA AGTGGGCACT
CCCAAGGACT AAAAACAT-- GAAGAAAACG AGTCAACTAA AGTGGGCTCT
CTCAAAGGGA AATTTCAA-- ACAGCAGCAC AAAGAGCAAT GATGGATCAG
CTCAAAGGGA AATTTCAA-- ACAGCAGCAC AAAGAGCAAT GATGGATCAG
CTCAAAGGGA AATTCCAA-- ACAGCAGCAC AAAGAGCAAT GATGGATCAG
CTCAAAGGAA AATTTCAA-- ACAGCTGCAC AAAGAGCAAT GATGGATCAA
CTTAAAGGAA AATTTCAA-- ACAGCTGCAC AAAGAGCAAT GGTGGATCAA
CTCAAAGGGA AATTTCAA-- ACTGCTGCAC AAAAAGCAAT GATGGATCAA
GGAGAGCAAC AGCTATCC-- TGAGGAAAGC AACTAG-AAG GCTGATTCAG
GAAGAGCAAC AGCCATTC-- TAAGAAAAGC AACCAG-AAG GATGATCCAA
AAAGGGCAAC AGCTATAC-- TCAGAAAAGC AACCAG-GAG ATTGGTTCAG
AAAGAGCAAC AGCTATAC-- TCAGAAAAGC AACCAG-AAG ATTGGTTCAG
GAAGAGCAAC AGCCATAC-- TCAGAAAAGC AACCAG-GAG ATTGATTCAG
TGCAATTGGG GTCCTCAT-- CGGAGGACTT GAATGG-AAT GATAACACAG
TGCAATTGGG GTCCTCAT-- CGGAGGACTT GAATGG-AAT GATAACACAG
TGCAGTTGGA GTCCTCAT-- CGGAGGACTT GAATGG-AAT GATAACACAG
TGCAATTGGG GTCCTCAT-- CGGAGGACTT GAATGG-AAT GATAACACAG
TGCAATTGGA ATCCTCAT-- CGGTGGACTT GAATGG-AAT GATAACTCAA
TCACAGCATC GGTCTCAC-- AGACAGAT-- -GGCAACTAC CACCAACCCA
GCCCAACATC GGTCCCAC-- AGGCAGAT-- -GGCGACTAC CACCAACCCA
TCCCAGCATA GGTCTCAC-- AGGCAAAT-- -GGTGACAAC AACCAATCCA
TCCCAGCACA GGTCTCAT-- AGGCAAAT-- -GGTGGCAAC AACCAATCCA
TCCCAGCATC GGTCTCAT-- AGGCAAAT-- -GGTGACAAC AACCAACCCA
TGCAACACCA AGTGTCAA-- ACTCCAATGG GGGCGATAAA CTCTAGTATG
TGTGAAACCA AATGCCAA-- ACTCCTTTGG GAGCAATAAA TACAACACTA
TGTAACACGA AGTGTCAA-- ACACCCCTGG GAGCTATAAA CAGCAGTCTC
TGTGTAGTGC AATGTCAG-- ACTGAAAAAG GTGGCTTAAA CAGTACATTG
TGCAATTCTG AATGCATC-- ACTCCAAATG GAAGCATTCC CAATGACAAA
TGAGAACTCA AGAGTCTG-- AATGTGCATG TGTAAATGGC TCTTGCTTTA
TGAGGACACA AGAGTCTG-- AATGTGCCTG TGTAAATGGT TCATGTTTTA
TCAGGACCCA GGAGTCGG-- AATGCGTTTG TATCAATGGG ACTTGCACAG
TCAGGACCCA GGAGTCAG-- AATGCGTTTG TATCAATGGA ACTTGTACAG
TCAGAACTCA GGAGTCAG-- AATGCGTTTG CATCAATGGA ACTTGTACAG
TACGGACACA AATACCAGCA GAAATGCTTG CAAGCATTG- ACTTGAAATA
TACGGACACA AATACCAGCA GAAATGCTTG CAAACATTG- ACTTGAAATA
TCCGAACACA AATACCAGCA GAAATGCTAG CAAGTATTG- ACCTGAAATA
TCCGAACACA AATACCCGCA GAAATGCTAG CAAGCATTG- ACCTGAAGTA
TTAGAACTCA AATACCTGCA GAAATGCTAG CAAGCATTG- ATTTGAAATA

TGGTGAGAAT ATGGCACCAG AGAAAGTAGA CTT--TGAGG ATTGCAAAGA
CGGGGAGAAT ATGGCACCGG AAAAATTGGA CTT--TGAGG ACTGCAAAGA
TGGTGAGAAC ATGGCACCAG AGAAGGTAGA CTT--TGACA ACTGTAGAGA
TGGTGAGAAC ATGGCACCAG AAAAGGTAGA CTT--TGACG ACTGTAAAGA
TGGTGAAAAC ATGGCACCAG AGAAAGTAGA CTT--TGACA ACTGCAGAGA
GTGCGAGAAA GCAGAAATCC TGGGAATGCT GA---AATTG AAGATCTCAT
GTGCGAGAAA GCAGAAATCC TGGGAATGCT GA---AATTG AAGATCTCAT
GTACGGGAAA GCAGAAATCC TGGGAATGCT GA---GATTG AAGATCTCAT
GTGAGAGAAA GCCGGAACCC AGGAAATGCT GA---GATCG AAGATCTAAT
GTGAGAGAAA GTCGGAACCC AGGAAATGCT GA---GATCG AAGATCTCAT
GTGAGAGAGA GCCGGGACCC AGGGAATGCT GA---GTTCG AAGATCTCAC
TTGATAGTAA GTGGAAG-AG ATGAACAATC AAT--CGCTG AAGCGATCAT
CTGATAGTCA GCGGAAG-GG ACGAGCAATC AAT--TGCTG AGGCAATTAT
CTGATAGTGA GTGGAAG-AG ACGAACAGTC AAT--AGCCG AAGCAATAAT
CTCATAGTGA GTGGAAG-AG ACGAACAGTC AAT--AGCCG AAGCAATAAT
CTGATAGTGA GTGGGAG-AG ACGAACAGTC GAT--TGCCG AAGCAATAAT
TTCG-AGTCT CTAAAACTCT ACAGAGATTC GCT--TGGAG AAGCAGTAAT
TTCG-AGTCT CTAAAAATCT ACAGAGATTC GCT--TGGAG AAGCAGTAAT
TTCG-AGTCT CTGAAACTCT ACAGAGATTC GCT--TGGAG AAGCAGTAAT
TTCG-AGTCT CTGAAACTCT ACAGAGATTC ACT--TGGAG AAGCAGTGAT
TTCG-AGCGT CTGAAAATAT ACAGAGATTC GCT--TGGGG AATCCATGAT
CTAATCAGGC ATGAGAACAG AATGGTGCTG GCCAGCACTA CAGCTAAGGC
CTAATCAGGC ATGAGAACAG AATGGTACTA GCCAGCACTA CGGCTAAGGC
CTAATAAGAC ATGAGAACAG AATGGTTCTG GCCAGCACTA CAGCTAAGGC
TTAATAAAAC ATGAGAACAG AATGGTTTTG GCCAGCACTA CAGCTAAGGC
CTAATCAGAC ATGAGAACAG AATGGTTTTA GCCAGCACTA CAGCTAAGGC

CCATTCCACA ACATACACCC CCTCACCATC GGG--GAATG CCCCAAATAT
CCTTTTCACA ATGTCCACCC ACTGACAATA GGT--GAATG CCCCAAATAT
CCTTTCCAGA ATATACACCC AGTCACAATA GGA--GAGTG CCCAAAATAC
CCATTCCACA ATATCAGTAA ATATGCATTT GGA--ACCTG CCCCAAATAT
CCATTTCAAA ATGTAAACAG GATCACATAT GGG--GCCTG TCCCAGATAT
CTGTAATGAC TGACGGACCA AGTAATGGGC AGGCCTCATA T-AAGATCTT
CTATAATGAC TGATGGCCCG AGTGATGGGC TGGCCTCGTA C-AAAATTTT
TAGTAATGAC TGATGGAA-G TGCTTCAGGA AGAGCCGATA CTAGAATACT
TAGTAATGAC TGATGGGA-G TGCTTCAGGA AAAGCTGATA CTAAAATACT
TAGTAATGAC TGATGGAA-G TGCATCAGGA AGGGCTGATA CTAAAATACT
CTTCAACGAA TCAACGAG-- ----AAAGAA AATCGAGAAA ATAAGACCTC
CTTCAACGAA TCGACGAG-- ----AAAGAA AATTGAGAAA ATAAGACCTC
CTTTAATGAA TCAACCAG-- ----AAAGAA AATTGAGAAA ATAAGGCCTC
TTTCAATGAA TCAACAAG-- ----GAAGAA AATTGAGAAA ATAAGGCCTC
TTTCAATGAT TCAACAAG-- ----AAAGAA GATTGAAAAA ATCCGACCGC

TGTTAGCGAT CTAAGGCAGT ATGACAGTGA TGAACCAAAG CCTAGATCAC
TATTGGCGAT CTGAAACAGT ATCAAAGTGA TGAGCCAGAG CTCAGATCGA
CATAAGCGAT TTGAAGCAAT ATGATAGTGA CGAACCTGAA TTAAGGTCAC
TGTAGGTGAT TTGAAGCAAT ATGATAGTGA TGAACCAGAA TTGAGGTCGC
CATAAGCGAT TTGAAGCAAT ATGATAGTGA CGAACCTGAA TTAAGGTCAC
CTTTCTGGCA CGGTCTGCAC ----TCATCC TGAG--AGGA TC--CGTAGC
CTTTCTGGCA CGGTCTGCAC ----TCATCC TGAG--AGGA TC--CGTAGC
ATTTCTGGCA CGGTCTGCAC ----TCATCC TGAG--AGGA TC--AGTGGC
CTTTCTGGCA CGGTCTGCAC ----TCATAT TGAG--AGGG TC--AGTTGC
ATTTTTGGCA AGATCTGCAT ----TGATAT TGAG--AGGG TC--AGTTGC
TTTTCTAGCA CGGTCTGCAC ----TCATAT TGAG--AGGG TC--GGTTGC
TGTAGCAATG GTGTTCTCAC AGGAGGATTG CATGATAAAG GC--AGTCCG
TGTGGCAATG GTGTTCTCAC AAGAAGATTG CATGGTAAAG GC--AGTCCG
TGTAGCCATG GTGTTTTCAC AAGAAGATTG CATGATAAAA GC--AGTTAG
CGTGGCCATG GTGTTTTCAC AAGAGGATTG CATGATAAAA GC--AGTTAG
TGTGGCCATG GTATTTTCAC AAGAGGATTG TATGATAAAA GC--AGTTAG
-GAGAATGGG AGACCTCCAC ----TCACTC CAAAACAGAA AC--GGAAAA
-GAGAATGGG GGACCTCCAC ----TTACTC CAAAACAGAA AC--GGAAAA
-GAGAATGGG AGACCTCCAC ----TCACTC CAAAACAGAA AC--GAGAAA
-GAGAATGGG AGATCTCCAC ----TCCCTC CAAAACAGAA AC--GGAAAG
-GAGAATGGG GGACCTTCAC ----TCCCTC CAAAACAGAA AC--GCTACA
TATGGAGCAG ATGGCTGGAT CGAGTGAGCA GGCAGCGGAA GCCATGGAGG
CATGGAGCAG ATGGCTGGAT CAAGTGAGCA GGCAGCAGAA GCCATGGAAG
TATGGAGCAA ATGGCTGGAT CGAGTGAGCA AGCAGCAGAG GCCATGGAGG
TATGGAGCAA ATGGCTGGAT CAAGTGAGCA GGCAGCGGAG GCCATGGAAA
TATGGAGCAA ATGGCTGGAT CGAGTGAGCA AGCAGCAGAG GCCATGGAGG
GTGAAATCAA ACAGATTAGT CCTTGCGACT GGACTCAGAA ATACCCCTCA
GTAAAATCGG AGAAATTGGT CTTAGCAACA GGACTAAGGA ATGTTCCCCA
GTCAGGAGTG CCAAATTGAG GATGGTTACA GGACTAAGGA ACATTCCGTC
GTAAGAGTTA ATAGTCTCAA ACTGGCAGTC GGTCTGAGGA ACGTGCCTGC
GTTAAGCAAA ACACTCTGAA ATTGGCAACA GGGATGCGAA ATGTACCAGA
CAAAATGGAA AAAGGGAAAG TAGTTAAATC AGTCGAATTG AA--TGCCCC
CAAGATCGAA AAGGGGAAGG TTACTAAATC AATAGAGTTG AA--TGCACC
ATTCATTGAA GAGGGGAAAA TTGTCCATAT TAGCCCATTG TC--AGGAAG
ATTCATTGAG GAGGGGAAAA TCATTCATAC TAGCACATTG TC--AGGAAG
ATTCATTAGA GAAGGGAAAA TTGTCCACAT TGGTCCACTG TC--AGGAAG
TACTAATAGA TGGCACAGCC TCATTGAGTC CTGGAATGAT GA--TGGGCA
TACTAATAGA GGGCACAGCC TCATTGAGTC CAGGGATGAT GA--TGGGCA
TCCTAATAGA TGGCACAGTC TCATTGAGTC CTGGAATGAT GA--TGGGCA
TTCTAATAGA TGGCACAGCA TCATTGAGCC CTGGGATGAT GA--TGGGCA
TCTTAATAGA GGGGACTGCA TCATTGAGCC CTGGAATGAT GA--TGGGCA

TAGCAAGCTG GATCCA---- ---G-AGTGA ATTCAACAAG GCATGCGAAT
TAGCAAGCTG GATCCA---- ---G-AGTGA GTTCAACAAG GCATGTGAAT
TTTCAAGCTG GATCCA---- ---G-AATGA GTTCAACAAG GCATGCGAGC
TTGCAAGTTG GATTCA---- ---G-AATGA GTTCAACAAG GCATGCGAAC
TTTCAAGCTG GATACA---- ---G-AATGA GTTCAACAAG GCCTGCGAGC
CCATAAGTCC TGCTTG---- ---CCTGCTT GTGTGTA--- -----CGGGC
CCATAAGTCC TGCTTG---- ---CCTGCTT GTGTGTA--- -----CGGGC
CCACAAGTCC TGCTTG---- ---CCTGCTT GTGTGTA--- -----CGGGC
TCACAAATCT TGTCTG---- ---CCCGCCT GTGTGTA--- -----TGGAC
TCACAAATCT TGCCTA---- ---CCTGCCT GTGCGTA--- -----TGGAC
TCACAAGTCC TGCCTG---- ---CCTGCCT GTGTGTA--- -----TGGAC
AGGCGATCTG AATTTC---- ---GTGAACA GAGCAAACCA AAGATTGAAC
AGGTGATTTG AATTTC---- ---GTAAACA GAGCAAATCA ACGACTGAAT
AGGTGACCTG AATTTC---- ---GTTAATA GGGCAAATCA GCGATTGAAT
AGGTGACCTG AATTTC---- ---GTCAACA GAGCAAATCA ACGGTTGAAC
AGGTGATCTG AATTTC---- ---GTCAATA GGGCGAATCA GCGACTGAAT
TGGCGAGAAC AATTAG---- ---GTCAAAA GTTCGAAGAG ATAAGATGGC
TGGCGAGAAC AGCTAG---- ---GTCAAAA GTTTGAAGAG ATAAGATGGC
TGGCGGGAAC AATTAG---- ---GTCAGAA GTTTGAAGAA ATAAGATGGT
TGGAGAGAAC AATTGA---- ---GCCAGAA GTTTGAAGAG ATAAGATGGT
TGGCGAAACG AGTTGA---- ---GTCAGAA GTTTGAAGAG ATCAGATGGC
TTGCTAGTCA GGCTAG---- ---GCAGATG GTGCAGGCAA ---TGAGGAC
TCGCAAGTCA GGCTAG---- ---GCAAATG GTGCAGGCTA ---TGAGGAC
TTGCTAGTCA GGCCAG---- ---GCAAATG GTGCAGGCAA ---TGAGAGC
TTGCTAGTCA GGCCAG---- ---GCAAATG GTGCAGGCAA ---TGAGAGC
TTGCTAGTCA GGCTAG---- ---GCAAATG GTGCAAGCGA ---TGAGAAC
GAGAGAGAGA AGAAGAAAAA AGAGAGGACT ATTTGGAGCT ATAGCAGGTT

GATTGAATCA AG-------- ----AGGATT GTTTGGGGCA ATAGCTGGTT
CATTCAATCC AG-------- ----AGGTCT ATTTGGAGCC ATTGCCGGTT
TAGATCAAGT AG-------- ----AGGACT ATTTGGAGCC ATAGCTGGAT
GAAACAAACT AG-------- ----AGGCAT ATTTGGCGCA ATCGCGGGTT
TAATTATCAC TATGAGGAGT GCTCCTGTTA TCCTGATGCT G---GCGAAA
TAATTCTCAC TATGAGGAAT GTTCCTGTTA CCCTGATACC G---GCAAAG
TGCTCAGCAT GTAGAGGAGT GTTCCTGTTA TCCTCGATAT C---CTGACG
TGCTCAGCAT GTCGAGGAGT GCTCCTGCTA TCCTCGATAT C---CTGGTG
TGCTCAGCAT GTGGAGGAAT GCTCCTGTTA CCCCCGGTAT C---CAGAAG
TGTTCAATAT GCTGAGTACA GTCTTAGGAG TTTCAATCCT GAATCTTGGG
TGTTTAATAT GCTAAGTACG GTCTTAGGAG TCTCAATCTT AAATCTTGGG
TGTTCAACAT GCTAAGTACA GTCTTAGGAG TCTCAATCCT GAATCTCGGG
TGTTCAACAT GCTAAGTACG GTTTTAGGAG TCTCGGTACT GAATCTTGGG
TGTTCAATAT GTTAAGCACT GTATTAGGCG TCTCCATCCT GAATCTTGGA

TGACAGATT- CAAGTTGGAT TGAACTTGAT GAAATAGGGG AAGACGTTGC
TGACCGATT- CGAGCTGGAT AGAACTCGAT GAGATAGGGG AAGATGTTGC
TGACCGATT- CAATCTGGAT AGAGCTCGAT GAGATTGGAG AAGACGTGGC
TGACAGATT- CAAGCTGGAT AGAGCTTGAT GAGATTGGAG AAGATGTGGC
TAACTGATT- CAATCTGGAT AGAGCTCGAT GAAATTGGAG AGGACGTAGC
TCGCTGTG-- GCCAGTGGAT ATGATTTTGA GAGGGAAGGG TACTCTCTGG
TCGCTGTG-- GCCAGTGGAT ATGATTTTGA GAGGGAAGGG TACTCTCTGG
TTGCCGTG-- GCCAGTGGAT ATGACTTTGA GAGAGAAGGG TACTCTCTGG
CTGCCATA-- GCCAGTGGGT ACAACTTCGA AAAAGAGGGA TACTCTCTAG
CTGCAGTA-- TCCAGTGGGT ACGACTTCGA AAAAGAGGGA TATTCCTTGG
CTGCCGTA-- GCCAGTGGGT ACGACTTTGA AAGAGAGGGA TACTCTCTAG
CCCATGCATC AACTCCTGAG GCACTTCCAA AAAGATGCAA AAGTGCTGTT
CCCATGCACC AACTCCTGAG ACACTTTCAA AAGGATGCAA AGGTGCTGTT
CCCATGCATC AACTTTTAAG ACATTTTCAG AAAGATGCAA AAGTGCTCTT
CCCATGCATC AGCTTTTAAG GCATTTTCAG AAAGATGCGA AAGTGCTTTT
CCTATGCATC AACTTTTAAG ACATTTTCAG AAGGATGCGA AAGTGCTTTT
TGATTGAA-- GAAGTGAGAC ACAGATTGAA GATAACAGAG AATAGTTTTG
TGATTGAA-- GAAGTGAGAC ACAGACTAAA AACAACTGAA AATAGCTTTG
TGATTGAA-- GAAGTGAGAC ACAAACTGAA GGTAACAGAG AATAGTTTTG
TAATTGAA-- GAAATGCGAC ATAGGTTAAG AATTACAGAG AATAGCTTTG
TCATTGCT-- GAATGTAGAA ATATACTGAC AAAGACTGAA AATAGCTTTG
AATTGGGACT CATCCTAGCT CCAGTGCCGG TCTGAAAGAT AATCTTCTTG
AATTGGGACT CACCCTAGTT CCAGTGCAGG TCTAAAAGAT GATCTTATTG
CATTGGGACT CCTCCTAGCT CCAGTGCTGG TCTAAAAGAT GATCTTCTTG
CGTTGGGACT CATCCTAGCT CCAGTACTGG TCTAAGAGAT GATCTTCTTG
CATTGGGACT CATCCTAGCT CCAGTGCTGG TCTGAAAAAT GATCTTCTTG
TTATAGAGGG AGGATGGCAG GGAATGGTAG ATGGTTGGTA TGGGTACCAC
TTATAGAAGG AGGATGGCAA GGAATGGTTG ATGGTTGGTA TGGATACCAT
TTATTGAAGG GGGATGGACT GGAATGATAG ATGGATGGTA CGGTTATCAT
TCATAGAAGG AGGTTGGCCA GGACTAGTCG CTGGCTGGTA TGGTTTCCAG
TCATAGAAAA TGGTTGGGAG GGAATGGTAG ACGGTTGGTA CGGTTTCAGG
TCACATGTGT GTGCAGGGAT AAT----TGG CATGGCTCAA ATCGGCCATG
TGATGTGTGT GTGCAGAGAC AAT----TGG CATGGTTCGA ACCGGCCATG
TCAGATGTAT CTGCAGAGAC AAC----TGG AAAGGCTCTA ATAGGCCCGT
TCAGATGTGT CTGCAGAGAC AAC----TGG AAAGGCTCCA ATAGGCCCAT
TTAGATGTGT TTGCAGAGAC AAT----TGG AAGGGCTCCA ATAGACCCGT
CAGAAGAGGT ACACCAAAAC CACATACTGG TGGGACGGAC TCCAATCCTC
CAGAAGAGGT ACACCAAAAC CACATACTGG TGGGATGGGC TCCAATCCTC
CAAAAGAAAT ACACCAAAAC NACATACTGG TGGGACGGAC TCCAATCCTC
CAAAAGAAAT ACACCAAGAC AACATACTGG TGGGATGGGC TCCAATCCTC
CAAAAGAGAT ACACCAAGAC TACTTACTGG TGGGATGGTC TTCAATCCTC

TCCAATTGAG CACATTGCAA --GTATGAGA AGGAACTATT TCACAGCGGA
CCCAATTGAG CACATTGCAA --GCATGAGA AGGAACTACT TCACAGCGGA
TCCAATTGAA CACATTGCAA --GCATGAGA AGGAATTACT TCACAGCAGA
TCCAATTGAA CACATTGCAA --GCATGAGA AGGAATTATT TCACATCAGA
CCCAATTGAG TACATTGCAA --GCATGAGG AGGAATTATT TCACAGCAGA
TTGGGATAGA TCCTTTCCGT CTGCTTCAGA ACAGTCAGGT CTTCAG-TCT
TTGGGATAGA TCCTTTCCGT CTGCTTCAGA ACAGTCAGGT CTTCAG-TCT
TCGGGATTGA TCCTTTCCGT CTGCTGCAAA ACAGCCAGGT CTTTAG-TCT
TGGGAATAGA CCCTTTCAAA CTGCTTCAAA ACAGCCAAGT ATACAG-CCT
TGGGAATAGA CCCTTTCAAA CTACTTCAAA ATAGCCAAAT ATACAG-CCT
TCGGAATAGA CCCTTTCAGA CTGCTTCAAA ACAGCCAAGT GTACAG-CCT
TCAGAACTGG GGAATTGAAC --CTATTGAC AATGTCATGG GGATGATCGG
TCAAAACTGG GGAATTGAAC --CCATCGAC AATGTCATGG GTATGATTGG
TCAAAATTGG GGAATTGAAC --ATATCGAC AATGTAATGG GAATGATTGG
TCAAAATTGG GGAATTGAAC --ACATCGAC AGTGTGATGG GAATGGTTGG
TCAAAATTGG GGAGTTGAAC --CTATCGAC AATGTGATGG GAATGATTGG
AGCAAATAAC ATTTATGCAA --GCCTTACA GCTACTATTT GAAG---TGG
AACAAATAAC ATTCATGCAA --GCATTACA ACTGCTGTTT GAAG---TGG
AGCAAATAAC ATTTATGCAA --GCCTTACA TCTATTGCTT GAAG---TGG
AGCAAATAAC CTTTATGCAA --GCCTTACA ACTATTGCTT GAAG---TGG
AACAGATAAC ATTTTTGCAA --GCATTGCA ACTCTTACTT GAAG---TTG
AAAATTTGCA GGCCTACCAA --------AA ACGAATGGGA GTGCAAATGC
AAAATTTGCA GGCTTACCAG --------AA ACGGATGGGA GTGCAAATGC
AAAATTTGCA GGCCTATCAG --------AA ACGAATGGGG GTGCAGATGC
AAAATTTGCA GACCTATCAG --------AA ACGAATGGGG GTGCAGATGC
AAAATTTGCA GGCCTATCAG --------AA ACGAATGGGG GTGCAGATGC
CATAGCAATG AGCAGGGGAG TGGATACGCT GCAGACAAAG AATCCACTCA
CACAGCAATG ACCAGGGATC AGGGTATGCA GCAGACAAAG AATCCACTCA

CATCAGAATG AACAGGGATC AGGCTATGCA GCGGATCAAA AAAGCACACA
CATTCAAATG ATCAAGGGGT TGGTATGGCT GCAGATAGGG ATTCAACTCA
CATCAAAATT CTGAGGGAAC AGGACAAGCA GCAGATCTCA AAAGCACTCA
GGTATCTTTC AA--TCAAAA --TTTGGAGT ATCAAATA-G GATATATATG
GGTGTCTTTC GA--TCAAAA --CCTGGATT ATCAAATA-G GATACATCTG
CATAGACATA AATATGGAAG --ATTATAGC ATTGATTCCA GTTATGTGTG
CGTAGATATA AACATAAAGG --ATTATAGC ATTGTTTCCA GTTATGTGTG
GCTATATATA AATGTGGCAG --ATTATAGT GTTGATTCTA GTTATGTGTG
TGATGATTTC GCTCTCATAG --TGAATGCA CCAAATCATG AGGGAATAGA
TGATGATTTC GCTCTCATAG --TGAATGCA CCAAATCATG AGGGAATACA
TGATGACTTC GCTCTCATAG --TGAATGCA CCAAATCATG AGGGAATACA
CGACGATTTT GCCCTCATAG --TGAATGCA CCAAATCATG AGGGAATACA
TGACGATTTT GCTCTGATTG --TGAATGCA CCCAATCATG AAGGGATTCA

AGTATCCCAT TGCAGGGC-- ----TACTGA ATACATAATG AAGGGAGTGT
AGTGTCTCAT TGCAGGGC-- ----CACTGA GTACATAATG AAGGGGGTTT
GGTGTCCCAT TGCAGAGC-- ----CACAGA ATATATAATG AAGGGGGTAT
GGTGTCTCAC TGCAGAGC-- ----CACAGA ATACATAATG AAGGGGGTGT
GGTGTCCCAT TGTAGAGC-- ----CACTGA GTACATAATG AAGGGGGTAT
TATTAGACCA AATGAGAATC C-AGCACATA AAAGTCAATT GGTATGGATG
TATTAGACCA AATGAGAATC C-AGCACATA AAAGTCAATT GGTATGGATG
AATTAGACCA AATGAGAATC C-AGCACATA AAAGTCAATT GGTGTGGATG
AATCAGACCG AACGAGAATC C-AGCACACA AGAGTCAGCT GGTGTGGATG
AATCAGACCT AACGAGAATC C-AGCACACA AGAGTCAGCT GGTGTGGATG
AATCAGACCA AATGAGAATC C-AGCACACA AGAGTCAACT GGTGTGGATG
AATATTACCT GACATGACTC CAAGCGCAGA GATGTCACTG AGAGGAGTGA
AATATTGCCT GACATGACCC CCAGCACGGA AATGTCACTA AGAGGAGTGA
AGTATTACCA GACATGACTC CAAGCACAGA GATGTCAATG AGAGGGATAA
AGTATTACCA GATATGACTC CAAGCACAGA GATGTCAATG AGAGGAATAA
GATATTGCCC GACATGACTC CAAGCATCGA GATGTCAATG AGAGGAGTGA
AACAAGAGAT AAGAACTTTC TCGTTTCAGC TTATTTAA-- ----------
AACAGGAGAT AAGAACTTTC TCATTTCAGC TTATTTAATG ATAAAAAACA
AGCAAGAGAT AAGAACTTTC TCATTTCAGC TTATTTAATA ATAAAAAACA
AGCAAGAGAT AAGAACTTTC TCGTTTCAGC TTATTTAATG ATAAAAAACA
AGAGTGAGAT AAGGACCTTC TCTTTTCAGC TTATTTAATA CTAAAAAACA
AGCGATTCAA GTGATCCTCT TGTTGTTGCC GCAAGTATCA TTGGGATACT
AGAGATTCAA GTGATCCTCT CGTTGTTGCA GCAAGTATCA TTGGGATATT
AACGATTCAA GTGACCCCCT TGTTGTTGCT GCGAGTATCA TTGGGATCTT
AACGATTCAA GTGACCCGCT TGTTGTTGCC GCGAGTATCA TTGGGATCTT
AACGGTTCAA GTGATCCTCT CGCTATTGCC GCAAATATCA TTGGGATCTT
AAAGGCAATA GATGGAGTCA CCAATAAGGT CAACTCGATC ATTGACAAAA
AAAGGCATTT AATGGAATCA CCAACAAGGT AAATTCTGTG ATTGAAAAGA
AAATGCCATT AACGGGATTA CAAACAAGGT GAACTCTGTT ATCGAGAAAA
AAAGGCAATT GATAAAATAA CATCCAAGGT GAATAATATA GTCGACAAGA
AGCAGCAATC AACCAAATCA ATGGGAAGCT GAATAGGTTG ATCGGGAAAA
CAGTGGAGTT TTCGG----- -AGACAATCC ACGCCCCAAT GATGGAACAG
CAGTGGGGTT TTCGG----- -TGACAACCC GCGTCCCAAA GATGGAACAG
CTCAGGGCTT GTTGG----- -CGACACACC CAGAAACGAC GACAGATCTA
CTCAGGGCTT GTTGG----- -AGACACACC CAGAAAAAAC GACAGCTCCA
CTCAGGACTT GTTGG----- -CGACACACC AAGAAATGAC GATAGCTCCA
AGCAGGGGTG GATAGGTT-- -CTATAGGAC TTGCAAACTA GTTGGAATCA
AGCAGGAGTG GATAGATT-- -CTATAGGAC TTGCAAGCTA GTTGGAATCA
AGCAGGGGTG AATAGATT-- -CTACAGAAC CTGCAAGCTA GTCGGAATCA
AGCAGGAGTG GATAGATT-- -CTACAGGAC CTGCAAGTTA GTGGGAATCA
AGCCGGAGTC GACAGGTT-- -TTATCGAAC CTGTAAGCTA CATGGAATCA
ACATAAACAC AGCTTTGTTG AATGCATCCT GTGCAGCCAT GGATGACTTC
ACATAAATAC AGCTTTGCTC AATGCATCTT GTGCAGCCAT GGATGACTTC
ACATTAATAC TGCCTTGCTT AATGCATCCT GTGCAGCAAT GGACGATTTC
ACATCAATAC TGCCTTACTT AATGCATCTT GTGCAGCAAT GGATGATTTC
ACATTAATAC TGCCCTGCTC AATGCATCCT GTGCAGCAAT GGACGATTTT
GCAT--GCCA TTCTGCAGCA TTTGAGGACC TGAGAGTCTC AAGTTTCATT
GCAT--GCCA TTCTGCAGCA TTTGAGGACC TGAGAGTCTC AAGTTTCATT
GCAT--GCCA TTCTGCAGCA TTTGAAGATC TGAGAGTCTC AAGCTTCATC
GCAT--GCAA TTCTGCTGCA TTTGAAGATC TAAGAGTATT AAGCTTCATC
GCAT--GCCA TTCTGCTGCA TTTGAAGATT TAAGATTGTT AAGCTTCATC
GCAT--GCCA TTCTGCCGCA TTTGAAGATC TAAGAGTATT GAGCTTCATC
GAGTTAGTAA GATGGGAGTA GATGAATATT CCAGCACGGA GAGAGTGGTG
GAGTTAGCAA AATGGGGGTG GATGAATATT CTAGCACTGA AAGGGTGGTC
GAGTCAGCAA AATGGGCGTG GATGAATACT CCAGCACAGA GAGGGTAGTG
GAGTCAGCAA AATGGGTGTG GATGAATACT CCAGTACAGA GAGGGTGGTG
GAATCAGCAA AATGGGTGTA GATGAGTACT CCAGCACGGA GAGGGTAGTG
---------- ---------- ---------- ---------- ----------
CCCTTGTTTC TACT------ ---------- ---------- ----------
CCCTTGTTTC TACT------ ---------- ---------- ----------
CCCTTGTTTC TACT------ ---------- ---------- ----------
C--------- ---------- ---------- ---------- ----------
GCACTTGATA TTGTGGATTC TTGATCGTCT TTTCTTCAAA TGCATTTATC
GCACTTGATA TTGTGGATTC TTGATCGTCT TTTCTTCAAA TGCATTTATC
GCACTTTATA TTGTGGATTC TTGATCGTCT TTTTTTCAAA TGCATTTATC
GCACTTGATA TTGTGGATTC TTGATCGTCT TTTTTTCAAA TGCGTCTATC
GCACTTGATA TTGTGGATTC TTGATCGTCT TTTTTTCAAA TGCATTTACC
TGAACACTCA GTTTGAGGCC GTTGGAAGGG AATTTAATAA CTTGGAAAGG
TGAACACCCA ATTTGAAGCT GTTGGGAAAG AATTCAGTAA CTTAGAGAAA
TGAACATTCA ATTCACAGCT GTGGGTAAAG AATTCAACAA ATTAGAAAAA
TGAACAAGCA ATATGAAATA ATTGATCATG AATTCAGTGA GGTTGAAACT
CAAACGAGAA ATTCCATCAG ATTGAAAAAG AATTCTCAGA AGTAGAAGGG
GCAGTTGTGG TCCGGTGTCC CCTAAC---- -----GGGGC ATATGGAGTA
GCAGCTGTGG TCCAGTGTAT GTTGAT---- -----GGAGC AAACGGAGTA
GCAATAGTAA TTGCAGGAAT CCTAACAATG AGAGAGGGAA TCCAGGAGTG
GCAGTAGCCA TTGCTTGGAT CCTAACAATG AAGAAGGTGG TCATGGAGTG
GCAGCAGTAA CTGCAGGGAT CCTAATAACG AGAGAGGGGG CCCAGGAGTG
ATATGACCAA GAAGAAGTCT TACATAAATC GGACAGGAAC ATGTGAATTC
ACATGAGCAA AAAGAAGTCT TACATAAATC GGACAGGAAC ATTTGAGTTC
ATATGAGCAA AAAGAAGTCC TACATAAATA GGACAGGGAC ATTTGAATTC
ACATGAGCAA AAAGAAGTCC TATATAAATA AAACAGGGAC ATTTGAATTC
ATATGAGCAA GAAAAAGTCT TACATAAACA GAACAGGTAC ATTTGAATTC
CAACTGATCC CAATGATAAG CAAATGCAGA ACCAAAGAAG GAAGACGGAA
CAACTGATTC CAATGATAAG CAAATGCAGA ACAAAAGAAG GAAGAAGGAA
CAACTAATTC CCATGATAAG CAAGTGTAGA ACTAAAGAGG GAAGGCGAAA
CAATTAATTC CAATGATAAG CAAGTGTAGA ACTAAGGAGG GAAGGCGAAA
CAACTAATTC CCATGATAAG CAAGTGCAGA ACTAAAGAGG GAAGGCGAAA
AGAGGAACAA GAGTGATCCC AAGAGGACAA CTATCCACTA GAGGAGTTCA
AGAGGAACAA GAGTGATCCC AAGAGGACAA CTATCCACTA GAGGAGTTCA
AGAGGGACAA GAGTGGCCCC AAGGGGACAA CTATCTACTA GAGGAGTTCA
AGAGGGACCA AAGTATCCCC AAGGGGGAAA CTTTCCACTA GAGGAGTACA
AGAGGGACAA AAGTATCTCC GCGGGGGAAA CTGTCAACTA GAGGAGTACA
AAAGGGACGA AGGTGGTCCC AAGAGGGAAG CTTTCCACTA GAGGAGTTCA
GTGAGTATTG ACCGTTTCTT GAGGGTCCGA GATCAGCAGG GGAACGTACT
GTGAGCATTG ACCGTTTCTT AAGGGTCCGA GATCAGCGAG GAAATGTACT
GTAAGCATTG ACCGGTTTTT GAGAGTTCGA GACCAACGAG GAAATGTACT
GTTAGCATTG ATCGGTTTTT GAGAGTTCGA GACCAACGCG GGAATGTATT
GTGAGCATTG ACCGGTTCTT GAGAGTCCGG GACCAACGAG GAAATGTACT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GTCGCCTTAA ATACGGTTTG AAAAGAGGGC CTTCTACGGA AGGGGTACCT
GTCGCTTTAA ATACGGTTTG AAAAGAGGGC CTTCTACGGA AGGAGTGCCT
GCTTCTTTAA ACACGGTCTG AAAAGAGGGC CTTCTACGGA AGGAGTACCT
GACTCTTCAA ACACGGCCTT AAAAGAGGCC CTTCTACGGA AGGAGTACCT
GTCGCTTTAA ATACGGACTG AAAGGAGGGC CTTCTACGGA AGGAGTGCCA
AGGATAGAGA ATTTAAACAA GCAGATGGAA GACGGATTCC TAGATGTCTG
AGACTGGAGA ACTTGAACAA AAAGATGGAA GACGGGTTTC TAGATGTGTG
AGGATGGAAA ATTTAAATAA AAAAGTTGAT GATGGATTTC TGGACATTTG
AGACTCAATA TGATCAATAA TAAGATTGAT GACCAAATAC AAGACGTATG
AGAATTCAGG ACCTCGAGAA ATATGTTGAG GACACTAAAA TAGATCTCTG
AAAGGGTTTT CATTTAAATA CGGCAATG-- ----GTGTTT GGATCGGGAG
AAGGGATTTT CATATAGGTA TGGTAATG-- ----GTGTTT GGATAGGAAG
AAAGGCTGGG CCTTTGACAA TGGAGATG-- ----ACGTGT GGATGGGAAG
AAAGGCTGGG CCTTTGATGA TGGAAATG-- ----ACGTGT GGATGGGAAG
AAAGGGTGGG CCTTTGACAA TGGAAATG-- ----ATGTTT GGATGGGACG
ACAAGCTTCT TCTACCGCTA TGGGTTCGTA GCCAACTTCA GTATGGAGCT
ACAAGCTTTT TCTACCGCTA TGGGTTTGTA GCCAACTTCA GCATGGAGCT
ACAAGCTTTT TCTATCGCTA TGGATTTGTA GCCAATTTTA GCATGGAGCT
ACAAGCTTTT TTTATCGATA TGGATTTGTG GCTAATTTTA GCATGGAGCT
ACAAGTTTTT TCTATCGTTA TGGGTTTGTT GCCAATTTCA GCATGGAGCT
AACTAACCTG TATGGATTCC TTATAAAAGG AAGATCC--C ATTTGAGAAA
GACAAACCTG TATGGGTTCA TTATAAAAGG AAGGTCC--C ATTTGAGAAA
GACCAATTTA TATGGTTTCA TCATAAAAGG AAGATCT--C ACTTAAGGAA
GACCAACTTG TATGGTTTCA TCATAAAAGG AAGATCC--C ACTTAAGGAA
AACCAATTTA TATGGATTCA TCATAAAGGG AAGATCT--C ATTTAAGGAA
GATTGCTTCA AATGAGAACG TGGAAGCAAT GGATTCCAGC ACTCTTGAAC
GATTGCTTCA AATGAGAACG TGGAAGCAAT GGATTCCAGC ACTCTTGAAC
AATTGCTTCA AATGAGAACA TGGAAACAAT GGACTCCAGC ACTCTTGAAC
AATTGCTTCA AATGAAAACA TGGATACTAT GGAATCAAGT ACTCTTGAAC
AATTGCTTCA AATGAGAACA TGGATAATAT GGGATCGAGC ACTCTTGAAC
AATTGCTTCC AATGAAAATA TGGAGACTAT GGAATCAAGT ACACTTGAAC
CTTATCTCCT GAAGAGGTTA GTGAAACACA GGGAACAGAG AAGTTGACAA
CCTATCCCCT GAAGAAGTTA GTGAAACACA GGGAATGGAA AAGTTGACGA
ACTATCTCCT GAGGAGGTCA GTGAAACACA GGGGACAGAG AAACTGACAA
ATTGTCTCCT GAGGAGGTCA GTGAAACACA GGGAACTGAA AGATTGACAA
ACTGTCTCCC GAGGAGGTCA GTGAAACACA GGGAACAGAG AAACTGACAA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GAGTCTATGA GGGAAGAGTA TCGGCAGGAA CAGCAGAGTG CTGTGGATGT
GAGTCTATGA GGGAAGAGTA TCGGCAGGAA CAGCAGAATG CTGTGGATGT
GAGTCTATGA GGGAAGAATA TCGAAAGGAA CAGCAGAGTG CTGTGGATGC
GAGTCTATGA GGGAAGAATA TCGAAAGGAA CAGCAGAATG CTGTGGATGC
AAGTCTATGA GGGAAGAATA TCGAAAGGAA CAGCAGAGTG CTGTGGATGC
GACTTATAAT GCTGAACTTC TGGTTCTCAT GGAAAAT--G AGAGAACTCT
GACATACAAT GCAGAGCTTC TAGTTCTGAT GGAAAAT--G AGAGGACACT
GACATATAAT GCAGAATTGT TAGTTCTACT GGAAAAT--G AAAGGACTCT
GGCATATAAT GCAGAATTGC TAGTACTACT TGAAAAT--C AAAAAACACT
GTCATACAAC GCGGAGCTTC TTGTGGCCCT GGAGAAC--C AACATACAAT
AACCAAAAGC ACTAATTCCA GGAGCGGCTT TGAAATGATT TGGGATCCAA
GACCAAAAGT CACAGTTCCA GACATGGGTT TGAGATGATT TGGGATCCTA
AACGATCAGC AAGGATTTAC GCTCAGGTTA TGAAACTTTC AAAGTCATTG
AACGATCAGC GAGAAGTTAC GCTCAGGATA TGAAACCTTC AAAGTCATTG
AACAATCAAG AAAGATTCGC GCTCTGGTTA TGAGACTTTC AGGGTCGTTG
GCCCAGCTTT GGAGTGTCTG GGATTAATGA ATCGGCTGAC ATGAGCATTG
GCCCAGCTTT GGAGTTTCCG GAATTAATGA ATCGGCTGAC ATGAGCATTG
GCCCAGCTTT GGAGTGTCTG GAATTAATGA ATCGGCTGAT ATGAGCATTG
TCCCAGTTTT GGAGTGTCTG GAATAAACGA GTCAGCTGAT ATGAGTATTG
TCCCAGTTTT GGTGTGTCTG GGAGCAACGA GTCAGCGGAC ATGAGTATTG
TGACACCGAT GTGGTAAACT TTGTGAGTAT GGAATTCTCT CTTACTGATC
TGATACTGAC GTGGTGAACT TTGTGAGTAT GGAATTCTCC CTTACTGACC
TGACACCGAC GTGGTAAACT TTGTGAGCAT GGAGTTTTCT CTCACTGACC
TGACACCGAC GTGGTAAACT TTGTGAGCAT GGAGTTTTCT CTCACTGACC
TGACACAGAT GTGGTAAACT TTGTGAGCAT GGAGTTTTCT CTCACTGACC
TGAGAAGCAG ATATTGGGCT ATAAGGACCA GGAGTGGAGG AAACACCAAT
TGAGAAGCAG ATATTGGGCT ATAAGGACCA GGAGTGGAGG AAACACCAAT
TGAGAAGCAG ATATTGGGCT ATAAGGACCA GGAGTGGAGG AAACACCAAC
TAAGAAGCAG GTACTGGGCC ATAAGGACCA GAAGTGGAGG AAACACTAAT
TGAGAAGCGG GTACTGGGCC ATAAGGACCA GGAGTGGAGG AAACACTAAT
TGAGAAGCAG GTACTGGGCC ATAAGGACCA GAAGTGGAGG AAACACCAAT
TAACATATTC ATCCTCAATG ATGTGGGAAA TCAACGGTCC TGAGTCAGTG
TAACTTATTC ATCGTCTATG ATGTGGGAGA TTAACGGGCC AGAATCAGTG
TAACTTACTC ATCGTCAATG ATGTGGGAGA TTAATGGCCC TGAGTCAGTG
TAACATATTC ATCGTCGATG ATGTGGGAGA TTAACGGTCC TGAGTCGGTT
TAACTTACTC ATCGTCAATG ATGTGGGAGA TTAATGGTCC TGAATCAGTG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TGACGATGGT CATTTTGTCA ACATAGAGCT GGAGTAAAAA ACTACCTTGT
TGACGATGGT CATTTTGTCA ACATAGAGCT GGAGTAAAAG ATCTTCCT--
TGACGATAGT CATTTTGTCA GCATAGAGCT GGAGTAAAAA ACTACCTTGT
TGACGACAGT CATTTTGTCA GCATAGAGTT GGAGTAAAAA ACTACCTTGT
TGACGATGGT CATTTTGTCA GCATAGAGCT GGAGTAAAAA ACTACCTTGT
AGACTTTCAT GACTCAAATG TCAAGAACCT TTA-TGACAA GGTCCGACTA
TGACTTTCAT GATTCTAATG TCAAGAATCT GTA-TGATAA AGTCAGAATG
GGATTTCCAT GACTCAAATG TGAAGAATCT GTA-TGAGAA AGTAAAAAGC
CGATGAGCAT GATGCGAACG TGAACAATCT ATA-TAACAA GGTGAAGAGG
TGATCTAACT GACTCAGAAA TGAACAAACT GTT-TGAAAG AACAAAGAAG
ATGGGTGGAC TGGAACGGAC AGTA-GCTTC TCGGTGA--- --AACAAGAT
ATGGATGGAC AGAGACTGAT AGTA-AGTTC TCTGTGA--- --GGCAAGAT
GTGGTTGGTC CACACCTAAT TCCA-AATCG CAGATCAA-- TAGACAGGTC
AAGGCTGGTC CAAACCTAAT TCCA-AATTG CAGATAAA-- TAGGCAAGTC
GTGGTTGGAC TACGGCTAAT TCCA-AGTCA CAAATAAA-- TAGGCAAGTC
GTGTTACAGT GATAAAGAAC AATATGATGG ACAACGACCT TGGACCAGCA
GAGTTACAGT GATAAAGAAT AATATGATAA ACAACGACCT TGGACCAGCA
GGGTAACAGT GATAAAGAAC AATATGATAA ATAATGACCT TGGGCCAGCA
GAGTAACAGT GATAAAGAAC AACATGATAA ACAATGACCT TGGGCCAGCA
GAGTTACTGT CATCAAAAAC AATATGATAA ACAATGATCT TGGTCCAGCA
CGAGGCTGGA GCCACACAGA TGGGAAAAGT ACTGCGTTCT TCGGATAGGA
CAAGGCTGGA GCCACACAAA TGGGAAAAGT ACTGTGTTCT TGAAGTAGGG
CGAGACTTGA GCCACACAAA TGGGAGAAGT ACTGTGTCCT TGAGATAGGA
CAAGACTTGA ACCACACAAA TGGGAGAAGT ACTGTGTTCT TGAGATAGGA
CGAGACTTGA GCCACATAAA TGGGAGAAAT ACTGTGTCCT TGAGATAGGA
CAACAGAGAG CATCTGCAGG ACAAATCAGT GTACAGCCCA CTTTCTCAGT
CAACAGAGAG CATCTGCAGG ACAAATCAGT GTACAGCCCA CTTTCTCAGT
CAGCAGAGAG CATCTGCAGG ACAAATCAGT GTGCAGCCTA CTTTCTCGGT
CAACAGAGGG CCTCTGCAGG TCAAATCAGT GTACAACCTG CATTTTCTGT
CAACAGAGGG CCTCCGCAGG CCAAACCAGT GTGCAACCTA CGTTTTCTGT
CAACAGAGGG CATCTGCGGG CCAAATCAGC ATACAACCTA CGTTCTCAGT
CTTGTTAACA CTTATCAATG GATCATCAGG AATTGGGAGA CTGTAAAGAT
CTAGTTAACA CATATCAATG GATCATTAGG AATTGGGAGA CTGTAAAGAT
TTGGTCAATA CCTATCAGTG GATCATCAGA AACTGGGAAA CTGTTAAAAT
TTGGTCAATA CCTATCAATG GATCATCAGA AATTGGGAAG CTGTCAAAAT
TTGGTCAATA CCTATCAATG GATCATCAGA AACTGGGAAA CTGTTAAAAT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TTCTACT--- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TTCTACT--- ---------- ---------- ---------- ----------
TTCTACT--- ---------- ---------- ---------- ----------
TTCTACT--- ---------- ---------- ---------- ----------
CAGCTTAGGG ATAATGCAAA GGAGCTGGGT AATGGTTGTT TCGAGTTCTA
CAGCTGAGAG ACAACGTCAA AGAACTAGGA AATGGATGTT TTGAATTTTA
CAATTAAAGA ATAATGCCAA AGAAATCGGA AATGGATGTT TTGAGTTCTA
GCACTGGGCT CCAATGCTAT GGAAGATGGG AAAGGCTGTT TCGAGCTATA
CAACTGAGGG AAAATGCTGA GGATATGGGC AATGGTTGTT TCAAAATATA
ATCGTAGCAA TAACTG---- ---ATTGGTC AGGATATAGC GGGAGTTTTG
GTTGTGGCAA TGACTG---- ---ATTGGTC AGGGTATAGC GGGAGTTTCG
ATAGTTGACA GCAATA---- ---ATTGGTC AGGTTACTCT GGTATTTTCT
ATAGTTGACA GAGGTA---- ---ATAGGTC CGGTTATTCT GGTATTTTCT
ATAGTTGACA GTGATA---- ---ACTGGTC TGGGTATTCT GGTATATTCT
ACAGCTCAGA TGGCTCTTCA GCTATTCATT AAGGACTACA GATACCCATA
ACAGCCCAGA TGGCTCTTCA GCTGTTCATT AAAGACTACA GATACACCTA
ACAGCCCAAA TGGCTCTTCA ACTATTCATC AAAGACTACA GATACACGTA
ACAGCCCAGA TGGCTCTCCA ATTGTTCATC AAAGACTACA GATATACATA
ACAGCTCAAA TGGCCCTTCA GTTGTTCATC AAAGATTACA GGTACACGTA
GACATGCTCT TACGGACTGA AATAGGCCAA GTGTCAAGGC CCATGTTTCT
GAAATGCTCT TGCGGACTGC AATAGGCCAG GTGTCAAGGC CCATGTTCCT
GATATGCTAC TAAGAAGTGC CATAGGCCAG ATGTCAAGGC CTATGTTCTT
GATATGCTTC TAAGAAGTGC CATAGGCCAG GTTTCAAGGC CCATGTTCTT
GATATGTTAC TAAGAAGTGC CATAGGCCAA ATTTCAAGGC CTATGTTCTT
ACAGAGAAAT CTTCCCTT-C GAAAGACCGA CCATTATGGC TGCGTTTAAG
ACAGAGAAAT CTTCCCTT-C GAAAGACCGA CCATTATGGC TGCGTTTAAG
ACAGAGAAAT CTTCCCTT-C GAAAGAGCGA CCATTATGGC GGCATTCACA
GCAAAGAAAC CTCCCATT-T GACAAACCAA CCATCATGGC AGCATTCACT
ACAAAGAAAC CTCCCATT-T GAAAAGTCAA CCATCATGGC AGCATTCACT
ACAGAGAAAT CTCCCTTT-T GACAGAACAA CCGTTATGGC AGCATTCACT
TCAATGGTCT CAAGATCC-C ACAATGCTGT ACAATAAGAT GGAGTTTGA-
CCAATGGTCC CAAGAACC-C ACCATGCTAT ACAATAAGAT GGAGTTTGA-
TCAATGGTCT CAGAATCC-T ACAATGCTAT ACAATAAAAT GGAATTTGA-
TCAATGGTCT CAGAATCC-T GCAATGTTGT ACAACAAAAT GGAATTTGA-
TCAGTGGTCC CAGAACCC-T ACAATGCTAT ACAATAAAAT GGAATTTGA-
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TCACAAATGT GATAATGAAT GTATGGAAAG TGTAAAAAAC GGAACGTATG
TCACAAATGT GACAATGAAT GCATGGATAG TGTGAAAAAC GGGACATATG
CCACAAGTGT GACAATGAAT GCATGGAAAG TGTAAGAAAT GGGACTTATG
CCATAAATGT GATGATCAGT GCATGGAAAC AATTCGGAAC GGGACCTATA
CCACAAATGT GACAATGCCT GCATAGGGTC AATCAGAAAT GGAACTTATG
TCCAGCATCC AGAACTGACA GGATTAGATT GCATAAGACC TTGTTTCTGG
TTCAACATCC TGAGCTAACA GGGCTAGACT GTATAAGGCC GTGCTTCTGG
CT-------- ---GTTGA-G GGCAAAAGAT GCATCAATAG GTGCTTTTAT
CT-------- ---GTTGA-A GGCAAAAGCT GCATCAATCG GTGCTTTTAT
CT-------- ---GTTGA-A GGAAAAACCT GCATCAACAG GTGTTTTTAT
CCGATGCCAC AGGGGGGA-T ACACAAATCC AAACGAGGAG ATCATTCGAG
CCGATGCCAC AGAGGTGA-T ACACAAATTC AAACTAGAAG ATCATTTGAA
CCGGTGCCAC AGAGGGGA-C ACACAAATTC AGACAAGGAG ATCATTCGAG
TAGGTGCCAT AGAGGAGA-C ACACAAATTC AGACGAGAAG ATCATTCGAG
CCGATGCCAT AGAGGTGA-C ACACAAATAC AAACCCGAAG ATCATTTGAA
TTATGTGAGA ACCAATGGAA CCTCCAAGAT CAAGATGAAA TGGGGCATGG
GTATGTGAGA ACTAACGGAA CCTCCAAAAT TAAGATGAAA TGGGGGATGG
GTATGTGAGG ACAAATGGAA CATCAAAGAT TAAAATGAAA TGGGGAATGG
GTATGTGAGG ACAAATGGAA CCTCAAAAAT TAAAATGAAA TGGGGAATGG
GTATGTGAGG ACAAACGGAA CATCAAAGGT CAAAATGAAA TGGGGAATGG
GGGAATACCG AGGGCAGAAC ATCTGACATG AGGACTGAAA TCATAAGGAT
GGGAATACCG AGGGCAGAAC ATCTGACATG AGGACTGAAA TCATAAGGAT
GGGAATACAG AGGGCAGAAC ATCTGACATG AGGACTGAAA TCATAAGGAT
GGGAATACAG AGGGAAGAAC ATCAGACATG AGGGCAGAAA TCATAAGGAT
GGAAATACGG AGGGAAGGAC TTCAGACATG AGGGCAGAAA TCATAAGAAT
GGGAATACAG AGGGGAGAAC ATCTGACATG AGGACCGAAA TCATAAGGAT
ATCGTTCCAA TCCTTGGTGC CAAAGGCTGC CAGAAGCCAA TATAGTGGAT
ACCATTTCAA TCTTTAGTAC CAAAGGCTGC CAGAAGCCAA TATAGTGGAT
GCCATTTCAG TCTTTAGTTC CTAAGGCCAT TAGAGGCCAA TACAGTGGAT
ACCATTTCAA TCTTTAGTCC CCAAGGCCAT TAGAAGCCAA TACAGTGGGT
ACCATTTCAG TCTTTAGTAC CTAAGGCCAT TAGAGGCCAA TACAGTGGGT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
ACTACCCGCA GTATTCAGAA GAAGCAAGAC TAAACAGAGA GGAAATAAGT
ATTATCCCAA GTATGAAGAA GAATCTAAAC TAAATAGAAA TGAAATCAAA
ATTATCCCAA ATATTCAGAA GAGTCAAAGT TGAACAGGGA AAAGGTAGAT
ATAGGAGAAA GTATAGAGAG GAATCAAGAC TAGAAAGGCA GAAAATAGAG
ACCATGATGT ATACAGAGAT GAAGCATTAA ACAACCGGTT CCAGATCAAA
GTTGAGCTAA TCAGAGGGCG GCCCAAAGAG AGCACAATTT GGACTAGTGG
GTTGAATTAA TCAGGGGACG ACCTAAAGAA AAAACAATCT GGACTAGTGC
GTGGAGTTGA TAAGGGGAAG GCAACAGGAG ACTAGAGTAT GGTGGACCTC
GTGGAGTTGA TAAGGGGAAG AAAAGAGGAA ACTGAAGTCT TGTGGACCTC
GTGGAGTTGA TAAGAGGGAG ACCACAGGAG ACCAGAGTAT GGTGGACTTC
CTGAAGAAGC TGTGGGAGCA GACCCGCTCA AAGGCAGGAC TGTTGGTTTC
TTGAAGAAGC TGTGGGAGCA GACCCGCTCA AAGGCAGGAC TGTTGGTTTC
CTAAAGAAGC TGTGGGAGCA AACCCGCTCA AAGGCAGGAC TTTTGGTGTC
CTAAAGAAGC TGTGGGATCA AACCCAATCA AGGGCAGGAC TATTGGTATC
ATAAAGAAAC TGTGGGAGCA AACCCGTTCC AAAGCTGGAC TGCTGGTCTC
AAATGAGGCG ATGCCCTTTT CAATCCCTTC AACAGATTGA GAGCATGATT
AAATGAGACG CTGCCTTCTT CAATCTCTTC AACAGATTGA GAGCATGATC
AGATGAGGCC TTGCCTCCTT CAGTCACTAC AACAAATCGA GAGTATGGTT
AGATGAGGCG TTGTCTCCTC CAGTCACTTC AACAAATTGA GAGTATGATT
AGATGAGACG TTGCCTCCTT CAGTCACTCC AGCAGATCGA GAGCATGATT
GATGGAAAGT GCCAGACCAG AAGATGTGTC TTTCCAGGGG CGGGGAGTCT
GATGGAAAGT GCCAGACCAG AAGATGTGTC TTTCCAGGGG CGGGGAGTCT
GATGGAAAGC TCCAGACCAG AAGATGTGTC TTTCCAGGGG CGGGGAGTCT
GATGGAAGGT GCAAAACCAG AAGAAATGTC CTTCCAGGGG CGGGGAGTCT
GATGGAAGGT GCAAAACCAG AAGAAGTGTC ATTCCGGGGG AGGGGAGTTT
GATGGAAAGT GCAAGACCAG AAGATGTGTC TTTCCAGGGG CGGGGAGTCT
TTGTGAGAAC ACTATTCCAA CAGATGCGTG ATGTTTTGGG ---GACATTT
TTGTGAGAAC GCTATTCCAG CAGATGCGTG ATGTTTTGGG ---AACGTTC
TTGTTAGGAC TCTATTCCAA CAAATGAGGG ATGTACTTGG ---GACATTT
TTGTCAGAAC TCTATTCCAA CAAATGAGAG ACGTACTTGG ---GACATTT
TTGTGAGAAC TCTGTTCCAA CAAATGAGGG ATGTGCTTGG ---GACATTT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GGAGTAAAAT TGGAATCAAT GGGAACTTAC CAAATACTGT CAATTTATTC
GGGGTAAAAT TGAGCAGCAT GGGGGTTTAT CAAATCCTTG CCATTTATGC
GGAGTGAAAT TGGAATCAAT GGGGATCTAT CAGATTCTGG CGATCTACTC
GGGGTTAAGC TGGAATCTGA GGGAACTTAC AAAATCCTCA CCATTTATTC
GGTGTTGAGT TGAAGTCAGG ATACAAAGAT TGGATCCTAT GGATTTCCTT
GAGCAGCATA TCTTTTTGTG GTGTAAATAG TGACAC-TGT GGGTTGGTCT
GAGCAGCATT TCTTTTTGTG GCGTGAATAG TGATAC-TGT AGATTGGTCT
AAACAGTATT GTTGTGTTTT GTGGCACTTC AGGTACTTAT GGAACAGGCT
AAACAGTATT GTTGTGTTTT GTGGCACCTC AGGTACATAT GGAACAGGCT
AAATAGCATC ATTGTATTTT GTGGAACTTC AGGTACCTAT GGAACAGGCT
AGATGGAGGA CCAAACCCAT ACAATATCCG GAATCTCCAC ATTCCGGAGG
AGATGGAGGG CCGAATTTAT ACAACATCCG GAATCTTCAC ATTCCAGAAG
GGATGGAGGA TCAAACTTAT ACAATATCCG GAATCTCCAC ATTCCAGAAG
AGATGGGGGA CCAAACTTAT ACAATATCCG GAACCTTCAC ATCCCTGAAG
CGACGGAGGC CCAAATTTAT ACAACATTAG AAATCTCCAC ATTCCTGAAG
GAGGCCGAGT CTTCTGTCAA AGAAAAAGAC ATGACTAAAG AATTCTTTGA
GAGGCTGAGT CTTCTATCAA AGAGAAAGAC ATGACCAAAG AATTCTTTGA
GAAGCCGAGT CCTCTGTCAA AGAGAAAGAC ATGACCAAAG AGTTTTTTGA
GAAGCTGAGT CCTCTGTCAA AGAGAAAGAC ATGACCAAAG AGTTCTTTGA
GAAGCCGAGT CCTCGATTAA AGAGAAAGAC ATGACCAAAG AGTTTTTTGA
TCGAGCTC-T CGGACGAAAA GGCAACGAAC CCGATCGTGC CTTCCTTTGA
TCGAGCTC-T CGGACGAAAA GGCAACGAAC CCGATCGTGC CTTCCTTTGA
TCGAGCTC-T CGGACGAAAA GGCAACGAAC CCGATCGTGC CTTCCTTTGA
TCGAGCTC-T CGGACGAAAA GGCAACGAAC CCGATCGTGC CCTCTTTTGA
TCGAGCTC-T CAGACGAGAA GGCAACGAAC CCGATCGTGC CCTCTTTTGA
TCGAGCTC-T CGGACGAAAA GGCAGCGAGC CCGATCGTGC CTTCCTTTGA
GATACTGT-C CAAATAATCA AGCTGCTACC ATTTGCAGCA GCCCCACCGG
GACACTGT-T CAAATAATCA AACTACTACC ATTTGCAGCA GCCCCACCGG
GATACCAC-C CAGATAATAA AGCTTCTTCC CTTTGCAGCC GCCCCACCAA
GACACCAC-C CAGATAATAA AGCTTCTCCC TTTTGCAGCC GCTCCACCAA
GATACCGC-A CAGATAATAA AACTTCTTCC CTTCGCAGCC GCTCCACCAA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AACAGTGGCG AGTTCCCTAG CACTGGCAAT CATGGTAGCT GGTCTATCTT
TACAGTAGCA GGTTCTCTGT CACTGGCAAT CATGATGGCT GGGATCTCTT
AACTGTCGCC AGTTCACTGG TGCTTTTGGT CTCCCTGGGG GCAATCAGTT
GACTGTCGCC TCATCTCTTG TGCTTGCAAT GGGGTTTGCT GCCTTCCTGT
TGCCATATCA TGTTTTTT-- -GCTTTGTGT TGCTTTGTTG GGGTTCATCA
--TGGCCAGA -CGATGCCGA GTTGCCATTC ACCATTG-AC AAGT------
--TGGCCAGA -CGGTGCTGA GTTGCCATTC ACCATTG-AC AAGT------
CATGGCCTGA -TGGGGCGAA CATCAATTTC ATGCCTATAT AA--------
CATGGCCTGA -TGGGGCGGA CATCAATCTC ATGCCTATAT AAGCTTTCGC
CATGGCCTGA -TGGAGCGAA TATCAATTTC ATGTCTATAT AAGCTTTCGC
CTGGCTTGAA GTGGGAATTG ATGGATGAAG ACTACCAGGG CAGACTGTGT
TTTGCTTGAA GTGGGAGTTG ATGGATGAAG ATTACCAGGG AAGACTGTGT
TCTGCTTGAA ATGGGAGCTA ATGGATGAAG ACTATCAGGG GAGGCTTTGT
TCTGCTTAAA GTGGGAGCTA ATGGATGAGA ATTATCGGGG AAGACTTTGT
TCTGCCTAAA ATGGGAATTG ATGGATGAGG ATTACCAGGG GCGTTTATGC
AAACAAATCA GAAACATGGC CAATTGGAGA ATC-ACCCAA GGGAGTGGAG
AAACAGATCG GAGACATGGC CAATTGGAGA GTC-ACCTAA GGGAGTGGAG
GAATAAATCA GAAACATGGC CCATTGGGGA GTC-CCCCAA AGGAGTGGAA
GAACAAATCA GAAACATGGC CCATTGGAGA GTC-TCCCAA AGGAGTGGAG
GAATAAATCA GAAGCATGGC CCATTGGGGA GTC-CCCCAA GGGAGTGGAA
CATGAGTAAT GAAGGATCTT ATTTCTTCGG AGA-CAATGC AGAGGAATAT
CATGAGTAAT GAAGGATCTT ATTTCTTCGG AGA-CAATGC AGAGGAATAT
CATGAGTAAT GAAGGATCTT ATTTCTTCGG AGA-CAATGC AGAGGAATAT
CATGAGTAAT GAAGGATCTT ATTTCTTCGG AGA-CAATGC AGAGGAGTAC
TATGAGTAAT GAAGGATCTT ATTTCTTCGG AGA-CAATGC AGAAGAGTAC
CATGAGTAAT GAAGGATCTT ATTTCTTCGG AGA-CAATGC AGAGGAGTAC
AGCCGAGCAG AATGCAGTTT TCTTCTCTAA CTG-TGAATG TGAGAGGCTC
AACAGAGTAG GATGCAATTT TCTTCTCTGA CTG-TGAATG TGAGGGGATC
AGCAAAGTAG AATGCAGTTC TCTTCATTGA CTG-TGAATG TGAGGGGATC
AGCAAAGCAG AATGCAGTTC TCTTCACTGA CTG-TAAATG TGAGGGGATC
AGCAAAGTAG AATGCAGTTC TCCTCATTTA CTG-TGAATG TGAGGGGATC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TATGGATGTG CTCCAATGGA TCGTTACAAT GCA-GAATTT GCATTTAAAT
TCTGGATGTG CTCCAACGGG TCTCTGCAGT GCA-GAATCT GCATATGA--
TCTGGATGTG TTCTAATGGA TCTTTGCAGT GCA-GAATAT GCATCTGAGA
TCTGGGCCAT GTCCAATGGA TCTTGCAGAT GCA-ACATTT GTATATAA--
TGTGGGCCTG CCAAAAAGGC AACATTAGGT GCA-ACATTT GCATTTGAG-
AGTTTGTTCA AAAAACTCCT TGTTTCTACT ---------- ----------
AGTCTGTTCA AAAAACTCCT TGTTTCTACT ---------- ----------
---------- ---------- ---------- ---------- ----------
AATTTTAGAA AAAAACTCCT TGTTTCTACT ---------- ----------
AATTTT---- ---------- ---------- ---------- ----------
AATCCTCTGA ACCCGTTTGT TAGTCATAAG GAAATTGAGT CTGTCAACAA
AACCCTCTGA ACCCGTTTGT CAGTCATAAG GAAGTTGAAT CCGTCAACAA
AATCCCCTGA ATCCATTTGT CAGTCATAAG GAAATTGAGT CTGTAAACAA
AACCCCCTGA ATCCCTTTGT CAGCCATAAA GAAATTGAGT CTGTAAACAA
AACCCACTGA ACCCATTTGT CAGCCATAAA GAAATTGAAT CAATGAACAA
GAAGGCTCCA TCGGGAAGGT GTGCAGAACC TTACTGGCTA AATCTGTTTT
GAAGGCTCAA TCGGGAAGGT GTGCAGAACC TTACTAGCAA AATCTGTGTT
GAAGGTTCCA TTGGGAAGGT CTGCAGGACT TTATTAGCCA AGTCGGTATT
GAAAGTTCCA TTGGGAAGGT CTGCAGGACT TTATTAGCAA AGTCGGTATT
GAAGGTTCCA TTGGGAAAGT CTGTAGGACT CTATTGGCTA AGTCAGTGTT
GACAATTGAA GAAAAA-TAC CCTTGTTTCT ACT------- ----------
GACAATTGAG GAAAAA-TAC CCTTGTTTCT A--------- ----------
GACAATTGAA GAAAAA-TAC CCTTGTTTCT ACT------- ----------
GACAATTAA- ---------- ---------- ---------- ----------
GACAATTAAG GAAAAAATAC CCTTGTTTCT ACT------- ----------
GACAATTAAA GAAAAA-TAC CCTTGTTTCT ACT------- ----------
AGGAATGAGA ATACTCGTGA GGGGTAACTC CCCCGTGTTC AACTACAACA
AGGAATGAGA ATACTTGTGA GAGGTAACTC CCCTGCATTT AACTACAACA
AGGAATGAGA ATACTTGTAA GGGGCAATTC TCCTGTATTC AACTACAACA
AGGGATGAGA ATACTTGTAA GGGGCAATTC TCCTGTATTC AACTACAACA
AGGAATGAGA ATACTTGTAA GGGGCAATTC TCCTGTATTC AACTACAACA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TTGTGAGTTC AGAT-TGTAG TTAAAAACAC C--------- ----------
TTGTAAGT-C ATTT-TATAA TTAAAAACAC CCTTGTTTCC TGA-------
TTAGAATTTC AGAAATATGA GGAAAAACAC CCTTGTTTCT ACT-------
---------- ---------- ---------- ---------- ----------
-------TGC AT-----TAA TTAAAAACAC CCTTGTTTCT ACT-------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TGCTGTGGTA ATGCCAGCTC ATGGCCCAGC CAAGAGCATG GAATATGATG
TGCTGTGGTA ATGCCAGCCC ATGGTCCGGC CAAGAGCATG GAATATGATG
TGCTGTGGTA ATGCCAGCTC ACGGTCCAGC CAAGAGCATG GAATATGATG
TGCTGTAGTG ATGCCAGCCC ACGGTCCAGC CAAAAGTATG GAATATGATG
TGCAGTGATG ATGCCAGCAC ATGGTCCAGC CAAAAACATG GAGTATGATG
CAACAGTCTA TATGCATCTC CACAACTCGA GGGGTTTTCA GCTGAATCAA
CAACAGCCTA TATTCATCTC CACAACTCGA AGGATTTTCA GCTGAATCGA
CAATAGCCTG TATGCATCCC CACAATTAGA AGGATTTTCA GCTGAATCAA
TAACAGCTTG TATGCATCTC CACAACTAGA AGGATTTTCA GCTGAATCAA
CAATAGCCTG TATGCATCAC CACAATTGGA AGGATTTTCA GCGGAGTCAA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AGGCAACCAA AAGGCTTACA GTCCTCGGAA AGGACGCAGG TGCATTAACA
AGACAACTAA GAGGCTTACA ATACTTGGGA AGGACGCAGG TGCGCTTACA
AGACCACTAA GAGACTAACA ATTCTCGGAA AGGATGCTGG CACTTTAACT
AGACCACTAA AAGACTAACA ATTCTCGGAA AAGATGCCGG CACTTTAATT
AGGCCACGAA GAGACTCACA GTTCTCGGAA AGGATGCTGG CACTTTAACC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CAGTTGCGAC TACACATTCA TGGATTCCCA AGAGGAATCG TTCCATTCTC
CCGTTGCAAC TACACATTCA TGGATTCCCA AGAGAAATCG CTCCATTCTC
CTGTTGCTAC TACACACTCC TGGACCCCTA AGAGGAACCG CTCCATTCTC
CCGTTGCAAC TACACACTCC TGGAATCCCA AGAGGAACCG CTCTATTCTA
CTGTTGCAAC AACACACTCC TGGATCCCCA AAAGAAATCG ATCCATCTTG
GAAAATTGCT TCTCATTGTT CAGGCACTTA GGGACAACCT GGAACCTGGA
GAAAACTACT ACTCATTGTT CAAGCACTTA GGGACAACCT GGAACCTGGA
GAAAACTGCT TCTTGTCGTT CAGGCTCTTA GGGACAATCT TGAACCTGGA
GAAAACTGCT TCTTATCGTT CAGGCTCTTA GGGACAATCT GGAACCTGGG
GAAAACTGCT TCTTGTTGTT CAGGCTCTTA GGGACAACCT CGAACCTGGG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GAAGATCCAG ACGAGGGAAC AGCCGGGGTG GAATCTGCAG TATTGAGGGG
GAGGACCCAG ATGAAGGAAC AGCAGGAGTA GAGTCTGCAG TATTGAGAGG
GAAGACCCAG ATGAAGGCAC ATCCGGAGTG GAGTCCGCTG TTCTGAGAGG
GAAGACCCAG ATGAAAGCAC ATCCGGAGTG GAGTCCGCCG TCTTGAGAGG
GAAGACCCAG ATGAAGGCAC AGCTGGAGTG GAGTCCGCTG TTCTGAGGGG
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
AACACCAGCC AAAGGGGGAT TCTTGAGGAT GAACAGATGT ATCAGAAGTG
AACACTAGCC AAAGGGGAAT TCTTGAGGAT GAACAAATGT ACCAGAAGTG
AACACAAGCC AAAGGGGAAT TCTTGAAGAT GAACAGATGT ATCAGAAGTG
AACACTAGCC AAAGGGGAAT TCTTGAGGAT GAACAGATGT ACCAAAAGTG
AATACAAGTC AAAGAGGAGT ACTTGAAGAT GAACAAATGT ACCAAAGGTG
ACCTTCGATC TTGGGGGGCT ATATGAAGCA ATTGAGGAGT GCCTGATTAA
ACCTTTGATC TTGAAGGGCT ATATGGAGCA ATTGAGGAGT GCCTGATTAA
ACCTTTGATC TTGGGGGGCT ATATGAAGCA ATTGAGGAGT GCCTGATTAA
ACCTTTGATC TTGGGGGGCT ATATGAAGCA ATTGAGGAGT GCCTAATTAA
ACCTTTGATC TCGGGGGGCT ATATGAAGCA ATTGAGGAGT GCCTGATTAA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
ATTCCTAATT CTAGGCAGAG AGGACAAAAG ATATGGACCC GCATTGAGCA
ATTTCTAATC CTCGGCAAAG AAGACAAAAG ATATGGACCA GCATTAAGCA
ATTCCTCATT CTGGGCAAGG AAGATAGAAG ATATGGACCA GCATTAAGCA
GTTTCTCATT ATAGGTAAGG AAGACAGAAG ATACGGACCA GCATTAAGCA
ATTCCTCATT CTGGGCAAAG AAGACAGGAG ATATGGGCCA GCATTAAGCA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
CTGCAATCTA TTCGAGAAAT TCTTCCCTAG CAGTTCATAT CGGAGGCCAG
CTGCACTCTA TTCGAGAAAT TCTTCCCTAG CAGTTCATAT CGGAGGCCAG
TTGCAATCTA TTTGAGAAAT TCTTCCCTAG CAGTTCGTAC AGGAGACCAG
CTGCAACTTG TTCGAGAAAT TTTTCCCTAG TAGTTCATAT AGGAGACCGA
CTGCAATTTA TTTGAAAAAT TCTTCCCCAG CAGTTCATAC AGAAGACCAG

TGATCCCTGG GTTTTGCTTA ATGCATCTTG GTTCAACTCC TTCCTCACAC
TGATCCCTGG GTTTTGCTTA ATGCATCTTG GTTCAACTCC TTCCTCACAC
TGATCCCTGG GTTTTGCTTA ATGCGTCTTG GTTCAACTCC TTCCTAACAC
TGATCCCTGG GTTTTGCTTA ATGCTTCTTG GTTCAACTCC TTCCTTACAC
TGATCCCTGG GTTTTGCTCA ATGCATCTTG GTTCAACTCC TTCCTGACAC
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TCAATGAACT GAGCAATCTT GCAAAAGGGG AGAAGGCTAA TGTATTGATA
TCAATGAACT GAGCAATCTT ACGAAAGGGG AGAAAGCTAA TGTATTGATA
TCAATGAACT GAGTACCCTT GCAAAAGGAG AAAAGGCTAA TGTACTAATT
TCAATGAACT GAGTAACCTT GCAAAAGGGG AAAAGGCTAA TGTGCTAATC
TCAATGAACT GAGCAACCTT GCGAAAGGAG AGAAGGCTAA TGTGCTAATT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
TTGGAATTTC CAGCATGGTG GAGGCCATGG TGTCTAGGGC CCGAATTGAT
TTGGAATTTC CAGCATGATG GAGGCCATGG TGTCTAGGGC CCGAATTGAT
TTGGAATTTC CAGCATGGTG GAGGCCATGG TGTCTAGGGC TCGGATTGAT
TTGGAATTTC TAGCATGGTG GAGGCCATGG TGTCTAGGGC CCGGATTGAT
TCGGGATATC CAGTATGGTG GAGGCTATGG TTTCCAGAGC CCGAATTGAT

ATGCACTAAG A--TAGTTGT GGCAATGCTA CTATTTGCTA TCCATACTGT
ATGCACTAAA A--TAGTTGT GGCAATGCTA CTATTTGCTA TCCATACTGT
ATGCATTAAG A--TAGTTGT GGCAATGCTA CTATTTGCTA TCCATACTGT
ATGCATTGAG T--TAGTTGT GGCAGTGCTA CTATTTGCTA TCCATACTGT
ATGCATTAAA A--TAGTTAT GGCAGTGCTA CTATTTGTTA TCCGTACTGT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
ATGCAAGGAG ACGTGGTGTT GGTAATGAAA CGGAAACGGG ACTTTAGCAT
GGGCAAGGAG ACGTAGTGTT GGTAATGAAA CGGAAACGGG ACTCTAGCAT
GGGCAAGGAG ACGTGGTGTT GGTAATGAAA CGAAAACGGG ACTCTAGCAT
GGGCAAGGAG ACGTGGTGTT GGTAATGAAA CGAAAACGGG ACTCTAGCAT
GGGCAAGGAG ACGTGGTGTT GGTAATGAAA CGAAAACGGG ACTCTAGCAT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GCACGAATTG ACTTCGAGTC TGGAAGGATT AAGAAAGAAG AGTTTGCTGA
GCACGGATTG ACTTCGAGTC TGGAAGGATT AAGAAAGAAG AATTTGCTGA
GCACGGATTG ACTTCGAGTC TGGACGGATT AAGAAAGAGG AGTTCGCTGA
GCCAGAATTG ACTTCGAGTC TGGACGGATT AAGAAGGAAG AGTTCTCTGA
GCACGGATTG ATTTCGAATC TGGAAGGATA AAGAAAGAAG AGTTCACTGA

CCAAAAAAGT ACCTTGTTTC TACT------ ---------- ----------
CCAAAAAAGT ACCTTGTTTC ---------- ---------- ----------
CCAAAAAAGT ACCTTGTTTC TACT------ ---------- ----------
CCAAAAAAGT ACCTTGTTTC TACT------ ---------- ----------
CCAAAAAAGT ACCTTGTTTC TACT------ ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
ACTTACTGAC AGCCAGACAG CGACCAAAAG AATTCGGATG GCCATCAATT
ACTTACTGAC AGCCAGACAG CGACCAAAAG AATTCGGATG GCCATCAATT
ACTTACTGAC AGCCAGACAG CGACCAAAAG AATTCGGATG GCCATCAATT
ACTTACTGAC AGCCAGACAG CGACCAAAAG AATTCGGATG GCCATCAATT
ACTTACTGAC AGCCAGACAG CGACCAAAAG AATTCGGATG GCCATCAATT
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
GATCATGAAG ATCTGTTCCA CCATTGAAGA GCTCGGACGG CAAAAATAGT
GATCTTGAAG ATCTGTTCCA CCATTGAAGA GCTCGGACGG CAAGGGAAGT
GATCATGAAG ATCTGTTCCA CCATTGAAGA GCTCAGACGG CAAAAATAGT
GATCATGAAG ATCTGTTCCA CCATTGAAGA ACTCAGACGG CAAAAATAAT
GATCATGAAG ATCTGTTCCA CCATTGAAGA GCTCAGACGG CAAAAATAGT

---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
AGTGTTGAAT AGTTTAAAAA CGACCTTGTT TCTACT---- --
AGTGTCGAAT TGTTTAAAAA CGACCTTGTT TCTACT---- --
AATGTTGAAT AGTTTAAAAA CGACCTTGTT TCTACT---- --
AATGTTGAAT AGTTTAAAAA CGACCTTGTT TCTACT---- --
AGTGTCGAAT AGTTTAAAAA CGACCTTGTT TCTACT---- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
---------- ---------- ---------- ---------- --
GAATTTAGCT TGTCCTTCAT GAAAAAATGC CTTGTTTCTA CT
GAATTTGGCT TGTCCTTCAT GAAAAAATGC ---------- --
GAATTTAGCT TGTCCTTCAT GAAAAAATGC CTTGTTTCTA CT
GAATTTAGCT TGTCCTTCAT GAAAAAATGC CTTGTTTCTA CT
GAATTTAGCT TGTCCTTCAT GAAAAAATGC CTTGTTCCTA CT

Output of Dnadist method

41

gi|7385295
0.0000 0.1063 0.1753 0.1641 0.1931 2.3471 2.3223 2.3333
2.3923 2.3492 2.4457 2.4589 2.4414 2.4361 2.4361 2.4175 2.1281
2.1334 2.1353 2.1765 2.4725 2.3944 2.4102 2.3769 2.3570 2.3404
3.4066 3.1170 3.3324 3.3557 3.9585 3.7485 3.8024 3.4048 3.3406
3.4574 3.6922 3.6989 3.7171 3.5242 3.9746
gi|3214017
0.1063 0.0000 0.1819 0.1662 0.1978 2.2872 2.2655 2.2463
2.3663 2.3324 2.3903 2.4156 2.3836 2.3723 2.3888 2.3549 2.1077
2.1311 2.1038 2.1778 2.3779 2.3026 2.3312 2.3061 2.3164 2.2842
3.2983 3.0012 3.2202 3.3403 3.8456 3.6957 3.7325 3.3231 3.2845
3.4185 3.5965 3.6382 3.6024 3.4367 3.9079
gi|7391268
0.1753 0.1819 0.0000 0.0635 0.0732 2.3310 2.3093 2.2899
2.3079 2.2873 2.3792 2.3645 2.3543 2.3290 2.3290 2.3346 2.0488
2.0537 2.0979 2.0716 2.4847 2.1932 2.2132 2.2194 2.1773 2.1622
3.1839 2.9733 3.2670 3.3618 3.6743 3.4250 3.3827 3.3130 3.2729
3.4077 3.6756 3.6942 3.6854 3.5127 3.9462
gi|8486136
0.1641 0.1662 0.0635 0.0000 0.1143 2.2752 2.2540 2.2418
2.2874 2.2447 2.3460 2.3970 2.3782 2.3763 2.3879 2.3699 2.0142
2.0117 2.0652 2.0141 2.4847 2.2457 2.2457 2.2790 2.2420 2.1932
3.2076 2.9808 3.3757 3.3544 3.7509 3.4675 3.4940 3.4097 3.4056
3.5080 3.6887 3.6989 3.6968 3.5370 4.0509
gi|7391913
0.1931 0.1978 0.0732 0.1143 0.0000 2.3164 2.2949 2.2982
2.2653 2.2545 2.3346 2.3900 2.3915 2.3745 2.3763 2.3722 2.0646
2.0454 2.1000 2.0535 2.5042 2.1984 2.2083 2.2405 2.2301 2.1671
3.3522 3.0557 3.2515 3.3894 3.6161 3.3418 3.3619 3.2329 3.1764
3.2901 3.7411 3.8351 3.7548 3.5715 4.1240
gi|3214015
2.3471 2.2872 2.3310 2.2752 2.3164 0.0000 0.0026 0.0604
0.2072 0.2188 0.1729 1.8735 1.8961 1.8326 1.8229 1.8459 1.7862
1.8365 1.8321 1.9505 1.9020 2.5268 2.4282 2.3710 2.3772 2.4389
3.2524 3.5546 2.8988 3.2747 4.0283 3.7818 4.1020 3.5122 3.2693
3.4376 3.3673 3.3846 3.2598 3.4325 3.6583
gi|9316315
2.3223 2.2655 2.3093 2.2540 2.2949 0.0026 0.0000 0.0629
0.2081 0.2185 0.1773 1.8978 1.9210 1.8593 1.8535 1.8706 1.7862
1.8365 1.8321 1.9505 1.9020 2.5268 2.4282 2.3710 2.3772 2.4389
3.2085 3.5254 2.8775 3.2445 3.9636 3.7506 4.0654 3.5122 3.2480
3.4376 3.3452 3.3622 3.2386 3.4065 3.6268
gi|7385295
2.3333 2.2463 2.2899 2.2418 2.2982 0.0604 0.0629 0.0000
0.1857 0.1931 0.1538 1.8691 1.8946 1.8198 1.8043 1.8482 1.8428
1.9098 1.9150 2.0378 1.9624 2.5193 2.4126 2.3697 2.3658 2.4131
3.2091 3.3703 2.8235 3.1243 3.7117 3.7628 4.0000 3.4275 3.2102
3.3757 3.2474 3.2770 3.1404 3.3142 3.5072
gi|7392130
2.3923 2.3663 2.3079 2.2874 2.2653 0.2072 0.2081 0.1857
0.0000 0.0840 0.0854 1.7959 1.7853 1.7906 1.8189 1.8426 1.7581
1.8321 1.8591 1.9264 1.9050 2.3097 2.2709 2.2180 2.2387 2.2348
3.4449 3.4819 3.0244 3.2002 3.8465 3.5716 3.8485 3.2679 3.1513
3.2696 3.3755 3.4057 3.3646 3.5577 3.8782
gi|7391914
2.3492 2.3324 2.2873 2.2447 2.2545 0.2188 0.2185 0.1931
0.0840 0.0000 0.1316 1.8751 1.9080 1.8509 1.8905 1.8919 1.8452
1.9098 1.9579 1.9903 2.0053 2.2506 2.2334 2.2124 2.2378 2.2339
3.4797 3.6735 3.0447 3.2764 3.8861 3.6433 3.9515 3.2690 3.1515
3.1859 3.5497 3.5460 3.5616 3.7566 3.8968
gi|8486129
2.4457 2.3903 2.3792 2.3460 2.3346 0.1729 0.1773 0.1538
0.0854 0.1316 0.0000 1.9223 1.9150 1.8833 1.9165 1.9450 1.7786
1.8124 1.9045 1.9465 1.9350 2.2993 2.2342 2.1553 2.1970 2.2077
3.4354 3.6329 2.9941 3.2301 3.7635 3.6555 3.9188 3.3270 3.1548
3.4501 3.5787 3.6013 3.5044 3.6913 3.9760
gi|7385294
2.4589 2.4156 2.3645 2.3970 2.3900 1.8735 1.8978 1.8691
1.7959 1.8751 1.9223 0.0000 0.1329 0.1899 0.1958 0.1861 1.8359
1.7804 1.8717 1.7894 1.6858 2.3617 2.3911 2.4332 2.5037 2.4545
3.4453 2.9944 3.2243 3.2154 3.6537 3.8403 3.7811 3.5032 3.4476
3.6604 3.3555 3.3149 3.3699 3.2953 3.1628
gi|3214016
2.4414 2.3836 2.3543 2.3782 2.3915 1.8961 1.9210 1.8946
1.7853 1.9080 1.9150 0.1329 0.0000 0.1985 0.2048 0.1866 1.8219
1.7786 1.8343 1.7768 1.6980 2.3391 2.3485 2.4835 2.5265 2.4560
3.4125 2.9268 3.2629 3.0825 3.4834 4.0980 3.8438 3.4661 3.4527
3.7056 3.2613 3.2418 3.2920 3.2639 3.1153
gi|7391882
2.4361 2.3723 2.3290 2.3763 2.3745 1.8326 1.8593 1.8198
1.7906 1.8509 1.8833 0.1899 0.1985 0.0000 0.0718 0.0771 1.7394
1.6830 1.7922 1.7328 1.6304 2.3199 2.3441 2.4214 2.4810 2.4421
3.2508 2.8484 2.9419 3.0785 3.3668 3.6878 3.5423 3.4478 3.4432
3.5653 3.4535 3.4243 3.4878 3.5116 3.3345
gi|7391905
2.4361 2.3888 2.3290 2.3879 2.3763 1.8229 1.8535 1.8043
1.8189 1.8905 1.9165 0.1958 0.2048 0.0718 0.0000 0.1157 1.7256
1.6556 1.7738 1.6841 1.6127 2.2979 2.3320 2.3968 2.4595 2.4125
3.1872 2.8279 2.8438 3.0565 3.4205 3.7099 3.5393 3.5158 3.5048
3.5986 3.4109 3.3748 3.4148 3.4229 3.3269
gi|8486138
2.4175 2.3549 2.3346 2.3699 2.3722 1.8459 1.8706 1.8482
1.8426 1.8919 1.9450 0.1861 0.1866 0.0771 0.1157 0.0000 1.7464
1.6892 1.8241 1.7445 1.6850 2.3446 2.3642 2.4330 2.4839 2.4675
3.3591 2.8793 3.0757 3.1889 3.4216 3.5794 3.4283 3.3558 3.3524
3.4877 3.3955 3.3630 3.4362 3.4367 3.2463
gi|7392156
2.1281 2.1077 2.0488 2.0142 2.0646 1.7862 1.7862 1.8428
1.7581 1.8452 1.7786 1.8359 1.8219 1.7394 1.7256 1.7464 0.0000
0.0672 0.0791 0.1454 0.4088 1.9652 2.0763 1.9867 2.0224 2.0344
2.7992 2.6308 2.6861 3.4638 3.2188 3.7354 3.3783 3.5087 3.4517
3.3484 3.2180 3.0730 3.3767 3.2010 3.3079
gi|7391921
2.1334 2.1311 2.0537 2.0117 2.0454 1.8365 1.8365 1.9098
1.8321 1.9098 1.8124 1.7804 1.7786 1.6830 1.6556 1.6892 0.0672
0.0000 0.1177 0.1624 0.3727 1.9452 2.0526 1.9835 2.0339 2.0339
2.7025 2.4562 2.5497 3.3621 3.0388 3.8921 3.5995 3.4530 3.4882
3.3117 3.0655 2.9303 3.2099 3.0375 3.2489
gi|8486131
2.1353 2.1038 2.0979 2.0652 2.1000 1.8321 1.8321 1.9150
1.8591 1.9579 1.9045 1.8717 1.8343 1.7922 1.7738 1.8241 0.0791
0.1177 0.0000 0.1252 0.3973 2.0882 2.1954 2.1324 2.1609 2.1728
2.8688 2.6256 2.7224 3.3832 3.4314 4.2119 3.7273 3.6344 3.6545
3.4832 3.1278 3.0100 3.3041 3.1280 3.4143
gi|3214016
2.1765 2.1778 2.0716 2.0141 2.0535 1.9505 1.9505 2.0378
1.9264 1.9903 1.9465 1.7894 1.7768 1.7328 1.6841 1.7445 0.1454
0.1624 0.1252 0.0000 0.4015 2.0495 2.1645 2.1431 2.1843 2.1541
2.9173 2.7117 2.7484 3.3454 3.3862 4.5655 3.9304 3.9352 3.7668
3.7097 3.3456 3.1959 3.5743 3.4190 3.7992
gi|7385294
2.4725 2.3779 2.4847 2.4847 2.5042 1.9020 1.9020 1.9624
1.9050 2.0053 1.9350 1.6858 1.6980 1.6304 1.6127 1.6850 0.4088
0.3727 0.3973 0.4015 0.0000 2.1572 2.2267 2.2642 2.2859 2.2775
3.1556 2.6173 3.0914 3.5086 3.3677 3.8865 4.2283 3.4490 3.3421
3.4440 2.8189 2.7691 2.8591 2.7993 3.0417
gi|7385295
2.3944 2.3026 2.1932 2.2457 2.1984 2.5268 2.5268 2.5193
2.3097 2.2506 2.2993 2.3617 2.3391 2.3199 2.2979 2.3446 1.9652
1.9452 2.0882 2.0495 2.1572 0.0000 0.0786 0.1123 0.1362 0.1063
4.8457 3.5818 3.5454 4.2749 4.3292 5.3580 5.6194 4.9122 4.8318
4.5861 4.5477 4.3458 4.3820 5.0460 4.4859
gi|3214142
2.4102 2.3312 2.2132 2.2457 2.2083 2.4282 2.4282 2.4126
2.2709 2.2334 2.2342 2.3911 2.3485 2.3441 2.3320 2.3642 2.0763
2.0526 2.1954 2.1645 2.2267 0.0786 0.0000 0.1233 0.1396 0.1160
4.4229 3.3648 3.6261 4.2044 4.2963 5.4914 5.7825 5.6269 5.4956
5.1057 4.6878 4.4745 4.5093 5.2488 4.6878
gi|7391268
2.3769 2.3061 2.2194 2.2790 2.2405 2.3710 2.3710 2.3697
2.2180 2.2124 2.1553 2.4332 2.4835 2.4214 2.3968 2.4330 1.9867
1.9835 2.1324 2.1431 2.2642 0.1123 0.1233 0.0000 0.0525 0.0567
5.3095 3.8582 3.7704 4.5809 4.8015 5.8650 6.0600 5.0782 4.9995
4.8603 4.7422 4.6551 4.6975 5.3408 4.9884
gi|7391915
2.3570 2.3164 2.1773 2.2420 2.2301 2.3772 2.3772 2.3658
2.2387 2.2378 2.1970 2.5037 2.5265 2.4810 2.4595 2.4839 2.0224
2.0339 2.1609 2.1843 2.2859 0.1362 0.1396 0.0525 0.0000 0.0779
5.1091 3.7759 3.8247 4.2165 4.6605 5.5875 5.9400 4.7465 4.8260
4.5831 4.6156 4.4064 4.4469 4.9537 4.6771
gi|8486122
2.3404 2.2842 2.1622 2.1932 2.1671 2.4389 2.4389 2.4131
2.2348 2.2339 2.2077 2.4545 2.4560 2.4421 2.4125 2.4675 2.0344
2.0339 2.1728 2.1541 2.2775 0.1063 0.1160 0.0567 0.0779 0.0000
4.7002 3.6553 3.6434 4.0542 4.4690 5.7112 6.0600 4.6885 4.6312
4.5197 4.6864 4.4694 4.5141 5.0517 4.7604
gi|7385295
3.4066 3.2983 3.1839 3.2076 3.3522 3.2524 3.2085 3.2091
3.4449 3.4797 3.4354 3.4453 3.4125 3.2508 3.1872 3.3591 2.7992
2.7025 2.8688 2.9173 3.1556 4.8457 4.4229 5.3095 5.1091 4.7002
0.0000 0.4017 0.5474 0.7250 0.9789 4.1633 3.8804 3.0973 3.2133
3.2358 3.7356 3.5969 3.6391 3.7716 3.6338
gi|7391914
3.1170 3.0012 2.9733 2.9808 3.0557 3.5546 3.5254 3.3703
3.4819 3.6735 3.6329 2.9944 2.9268 2.8484 2.8279 2.8793 2.6308
2.4562 2.6256 2.7117 2.6173 3.5818 3.3648 3.8582 3.7759 3.6553
0.4017 0.0000 0.5016 0.7396 0.9167 3.8304 3.9581 3.0471 3.1679
3.2251 3.3705 3.2747 3.2685 3.3578 3.3159
gi|8486125
3.3324 3.2202 3.2670 3.3757 3.2515 2.8988 2.8775 2.8235
3.0244 3.0447 2.9941 3.2243 3.2629 2.9419 2.8438 3.0757 2.6861
2.5497 2.7224 2.7484 3.0914 3.5454 3.6261 3.7704 3.8247 3.6434
0.5474 0.5016 0.0000 0.7361 0.8937 3.3880 3.4460 2.9635 2.9855
3.1497 3.4162 3.3643 3.2892 3.3775 3.3498
gi|3214016
3.3557 3.3403 3.3618 3.3544 3.3894 3.2747 3.2445 3.1243
3.2002 3.2764 3.2301 3.2154 3.0825 3.0785 3.0565 3.1889 3.4638
3.3621 3.3832 3.3454 3.5086 4.2749 4.2044 4.5809 4.2165 4.0542
0.7250 0.7396 0.7361 0.0000 0.9653 3.9915 3.8362 3.1780 3.0869
3.3141 3.6510 3.4541 3.3768 3.3963 3.5212
gi|7391920
3.9585 3.8456 3.6743 3.7509 3.6161 4.0283 3.9636 3.7117
3.8465 3.8861 3.7635 3.6537 3.4834 3.3668 3.4205 3.4216 3.2188
3.0388 3.4314 3.3862 3.3677 4.3292 4.2963 4.8015 4.6605 4.4690
0.9789 0.9167 0.8937 0.9653 0.0000 3.5835 3.6419 3.6494 3.7049
3.7925 3.7010 3.5998 3.4375 3.3912 3.7110
gi|7392126
3.7485 3.6957 3.4250 3.4675 3.3418 3.7818 3.7506 3.7628
3.5716 3.6433 3.6555 3.8403 4.0980 3.6878 3.7099 3.5794 3.7354
3.8921 4.2119 4.5655 3.8865 5.3580 5.4914 5.8650 5.5875 5.7112
4.1633 3.8304 3.3880 3.9915 3.5835 0.0000 0.2097 0.9395 0.9020
0.8816 1.8960 1.8747 1.8492 1.7900 1.8792
gi|8486127
3.8024 3.7325 3.3827 3.4940 3.3619 4.1020 4.0654 4.0000
3.8485 3.9515 3.9188 3.7811 3.8438 3.5423 3.5393 3.4283 3.3783
3.5995 3.7273 3.9304 4.2283 5.6194 5.7825 6.0600 5.9400 6.0600
3.8804 3.9581 3.4460 3.8362 3.6419 0.2097 0.0000 0.9753 0.9052
0.9255 1.8904 1.8394 1.8416 1.7969 1.8803
gi|7392130
3.4048 3.3231 3.3130 3.4097 3.2329 3.5122 3.5122 3.4275
3.2679 3.2690 3.3270 3.5032 3.4661 3.4478 3.5158 3.3558 3.5087
3.4530 3.6344 3.9352 3.4490 4.9122 5.6269 5.0782 4.7465 4.6885
3.0973 3.0471 2.9635 3.1780 3.6494 0.9395 0.9753 0.0000 0.1261
0.1695 1.7832 1.7161 1.8218 1.7742 1.8920
gi|7391913
3.3406 3.2845 3.2729 3.4056 3.1764 3.2693 3.2480 3.2102
3.1513 3.1515 3.1548 3.4476 3.4527 3.4432 3.5048 3.3524 3.4517
3.4882 3.6545 3.7668 3.3421 4.8318 5.4956 4.9995 4.8260 4.6312
3.2133 3.1679 2.9855 3.0869 3.7049 0.9020 0.9052 0.1261 0.0000
0.2151 1.8357 1.7788 1.8401 1.7820 1.9628
gi|3214016
3.4574 3.4185 3.4077 3.5080 3.2901 3.4376 3.4376 3.3757
3.2696 3.1859 3.4501 3.6604 3.7056 3.5653 3.5986 3.4877 3.3484
3.3117 3.4832 3.7097 3.4440 4.5861 5.1057 4.8603 4.5831 4.5197
3.2358 3.2251 3.1497 3.3141 3.7925 0.8816 0.9255 0.1695 0.2151
0.0000 1.6716 1.6186 1.6838 1.6851 1.8115
gi|7385294
3.6922 3.5965 3.6756 3.6887 3.7411 3.3673 3.3452 3.2474
3.3755 3.5497 3.5787 3.3555 3.2613 3.4535 3.4109 3.3955 3.2180
3.0655 3.1278 3.3456 2.8189 4.5477 4.6878 4.7422 4.6156 4.6864
3.7356 3.3705 3.4162 3.6510 3.7010 1.8960 1.8904 1.7832 1.8357
1.6716 0.0000 0.0715 0.1065 0.1524 0.1954
gi|3214016
3.6989 3.6382 3.6942 3.6989 3.8351 3.3846 3.3622 3.2770
3.4057 3.5460 3.6013 3.3149 3.2418 3.4243 3.3748 3.3630 3.0730
2.9303 3.0100 3.1959 2.7691 4.3458 4.4745 4.6551 4.4064 4.4694
3.5969 3.2747 3.3643 3.4541 3.5998 1.8747 1.8394 1.7161 1.7788
1.6186 0.0715 0.0000 0.1206 0.1522 0.1953
gi|7391268
3.7171 3.6024 3.6854 3.6968 3.7548 3.2598 3.2386 3.1404
3.3646 3.5616 3.5044 3.3699 3.2920 3.4878 3.4148 3.4362 3.3767
3.2099 3.3041 3.5743 2.8591 4.3820 4.5093 4.6975 4.4469 4.5141
3.6391 3.2685 3.2892 3.3768 3.4375 1.8492 1.8416 1.8218 1.8401
1.6838 0.1065 0.1206 0.0000 0.1089 0.1903
gi|7391914
3.5242 3.4367 3.5127 3.5370 3.5715 3.4325 3.4065 3.3142
3.5577 3.7566 3.6913 3.2953 3.2639 3.5116 3.4229 3.4367 3.2010
3.0375 3.1280 3.4190 2.7993 5.0460 5.2488 5.3408 4.9537 5.0517
3.7716 3.3578 3.3775 3.3963 3.3912 1.7900 1.7969 1.7742 1.7820
1.6851 0.1524 0.1522 0.1089 0.0000 0.2047
gi|8486134
3.9746 3.9079 3.9462 4.0509 4.1240 3.6583 3.6268 3.5072
3.8782 3.8968 3.9760 3.1628 3.1153 3.3345 3.3269 3.2463 3.3079
3.2489 3.4143 3.7992 3.0417 4.4859 4.6878 4.9884 4.6771 4.7604
3.6338 3.3159 3.3498 3.5212 3.7110 1.8792 1.8803 1.8920 1.9628
1.8115 0.1954 0.1953 0.1903 0.2047 0.0000
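
A distance matrix printed in this layout can be read back for checking. The following is a minimal Python sketch, assuming the format seen above: a taxon count, then for each taxon a label followed by that many distances, with rows free to wrap across lines. The function names are illustrative and not part of PHYLIP; the three labels and values are the top-left corner of the matrix above.

```python
def parse_square_matrix(text):
    """Parse a square distance matrix whose rows may wrap over several lines.

    Assumes labels contain no whitespace (true for the gi|... labels here).
    """
    tokens = text.split()
    n = int(tokens[0])
    names, rows, pos = [], [], 1
    for _ in range(n):
        names.append(tokens[pos])
        rows.append([float(t) for t in tokens[pos + 1:pos + 1 + n]])
        pos += n + 1
    return names, rows


def max_asymmetry(rows):
    """Largest |d(i,j) - d(j,i)|; 0.0 for a perfectly symmetric matrix."""
    n = len(rows)
    return max((abs(rows[i][j] - rows[j][i])
                for i in range(n) for j in range(i)), default=0.0)


# Top-left 3 x 3 corner of the matrix above, with wrapped rows.
toy = """
3
gi|7385295
0.0000 0.1063
0.1753
gi|3214017
0.1063 0.0000 0.1819
gi|7391268
0.1753 0.1819 0.0000
"""

names, rows = parse_square_matrix(toy)
print(names)
print(max_asymmetry(rows))
```

Because the labels are truncated to ten characters, several rows of the full matrix share the same label, so rows are best identified by position rather than by name.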

Output tree of neighbor method

41 Populations
Neighbor-Joining/UPGMA method version 3.573c
Neighbor-joining method
Negative branch lengths allowed

+gi|3214017
!
! +gi|7391268
! +-19
! +-20 +gi|7391913
! ! !
! ! +gi|8486136
! !
! ! +gi|7385295
! ! +-16
! ! ! +gi|3214142
! ! +------------17
! ! ! ! +gi|7391268
! ! ! ! +-14
! ! ! +-15 +gi|7391915
! ! ! !
! ! ! +gi|8486122
-18-21 !
! ! ! +gi|7385294
! ! ! +-23
! ! ! ! +gi|3214016
! ! ! +-------26
! ! ! ! ! +gi|7391882
! ! ! ! ! +-24
! ! ! ! +-25 +gi|8486138
! ! ! ! !
! ! ! ! +gi|7391905
! ! ! !
! ! ! ! +gi|7392156
! ! ! ! +-28
! ! ! ! ! ! +gi|8486131
! +----------31 +-33 +-29 +-27
! ! ! ! ! ! +gi|3214016
! ! ! ! +------30 !
! ! ! ! ! ! +gi|7391921
! ! ! ! ! !
! ! ! ! ! +--gi|7385294
! ! ! ! !
! ! ! ! ! +---gi|7385295
! ! ! ! ! +--9
! ! ! ! ! ! +gi|7391914
! ! ! +-32 +-11
! ! ! ! ! ! +---gi|3214016
! ! ! ! +------------12 +-10
! ! ! ! ! ! +-------gi|7391920
! ! ! ! ! !
! ! ! ! ! +-gi|8486125
! ! ! ! !
! ! ! ! ! +gi|7392126
! ! ! ! ! +------1
! ! ! +------22 ! +gi|8486127
! ! ! ! +------8
! +-34 ! ! ! +gi|7392130
! ! ! ! ! +--2
! ! ! ! +--3 +gi|7391913
! ! ! ! !
! ! ! ! +gi|3214016
! ! +----------13
! ! ! +gi|7385294
! ! ! +--5
! ! ! ! ! +gi|7391914
! ! ! +--6 +--4
! ! ! ! ! +-gi|8486134
! ! +-------7 !
! ! ! +gi|7391268
! ! !
! ! +gi|3214016
! !
! ! +gi|3214015
! ! +-35
! ! +-36 +gi|9316315
! ! ! !
! ! ! +gi|7385295
! +--------37
! ! +gi|7392130
! ! +-38
! +-39 +gi|7391914
! !
! +gi|8486129
!
+gi|7385295

remember: (although rooted by outgroup) this is an unrooted tree!

Between And Length
------- --- ------
18 gi|3214017 0.03154
18 21 0.06092
21 20 0.01757
20 19 0.01126
19 gi|7391268 0.03236
19 gi|7391913 0.04084
20 gi|8486136 0.04104
21 31 0.99624
31 17 1.13561
17 16 0.01035
16 gi|7385295 0.02910
16 gi|3214142 0.04950
17 15 0.03620
15 14 0.01454
14 gi|7391268 0.02217
14 gi|7391915 0.03033
15 gi|8486122 0.02651
31 34 0.26861
34 33 0.07196
33 26 0.78395
26 23 0.05711
23 gi|7385294 0.06604
23 gi|3214016 0.06686
26 25 0.02485
25 24 0.01715
24 gi|7391882 0.02677
24 gi|8486138 0.05033
25 gi|7391905 0.03805
33 32 0.06004
32 30 0.61962
30 29 0.09053
29 28 0.01212
28 gi|7392156 0.00981
28 27 0.03984
27 gi|8486131 0.03722
27 gi|3214016 0.08798
29 gi|7391921 0.03538
30 gi|7385294 0.24606
32 22 0.62308
22 12 1.17380
12 11 0.08424
11 9 0.08769
9 gi|7385295 0.36795
9 gi|7391914 0.03375
11 10 0.06886
10 gi|3214016 0.33587
10 gi|7391920 0.62943
12 gi|8486125 0.16543
22 13 1.01545
13 8 0.57819
8 1 0.57132
1 gi|7392126 0.09325
1 gi|8486127 0.11645
8 3 0.14471
3 2 0.03457
2 gi|7392130 0.08332
2 gi|7391913 0.04278
3 gi|3214016 0.09468
13 7 0.69229
7 6 0.03131
6 5 0.01490
5 gi|7385294 0.04241
5 4 0.02914
4 gi|7391914 0.06240
4 gi|8486134 0.14230
6 gi|7391268 0.02620
7 gi|3214016 0.02627
34 37 0.82306
37 36 0.07466
36 35 0.04020
35 gi|3214015 0.00078
35 gi|9316315 0.00182
36 gi|7385295 0.02015
37 39 0.02873
39 38 0.01388
38 gi|7392130 0.02606
38 gi|7391914 0.05794
39 gi|8486129 0.05262
18 gi|7385295 0.07476
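
The branch table can be used to recover the tree distance between any two leaves by summing branch lengths along the path that joins them. The sketch below is a hedged illustration in Python, not part of the neighbor program: it takes only the rows for the clade around node 18 from the table above, and the helper names are hypothetical. In the full table several leaves share a truncated label, so the labels would first have to be made unique.

```python
from collections import defaultdict


def build_tree(edge_lines):
    """Build an undirected adjacency list from 'Between And Length' rows."""
    graph = defaultdict(list)
    for line in edge_lines:
        a, b, length = line.split()
        graph[a].append((b, float(length)))
        graph[b].append((a, float(length)))
    return graph


def path_length(graph, start, goal):
    """Sum branch lengths along the unique path between two nodes of a tree."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, w in graph[node]:
            if nxt != parent:  # never walk back along the branch just used
                stack.append((nxt, node, dist + w))
    return None  # goal not reachable from start


# Rows copied from the table above (subtree around node 18 only).
edges = [
    "18 gi|3214017 0.03154",
    "18 21 0.06092",
    "21 20 0.01757",
    "20 19 0.01126",
    "19 gi|7391268 0.03236",
    "19 gi|7391913 0.04084",
    "20 gi|8486136 0.04104",
    "18 gi|7385295 0.07476",
]

tree = build_tree(edges)
print(round(path_length(tree, "gi|3214017", "gi|7385295"), 4))  # 0.1063
print(round(path_length(tree, "gi|7391268", "gi|7391913"), 4))  # 0.0732
```

Both values agree with the corresponding entries of the Dnadist matrix (0.1063 and 0.0732), which is a quick consistency check on the tree for these closely related pairs.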

Output of consense
Majority-rule and strict consensus tree program, version 3.573c

Species in order:

gi|3214017
gi|7391268
gi|7391913
gi|8486136
gi|7385295
gi|3214142
gi|7391268
gi|7391915
gi|8486122
gi|7385294
gi|3214016
gi|7391882
gi|8486138
gi|7391905
gi|7392156
gi|8486131
gi|3214016
gi|7391921
gi|7385294
gi|7385295
gi|7391914
gi|3214016
gi|7391920
gi|8486125
gi|7392126
gi|8486127
gi|7392130
gi|7391913
gi|3214016
gi|7385294
gi|7391914
gi|8486134
gi|7391268
gi|3214016
gi|3214015
gi|9316315
gi|7385295
gi|7392130
gi|7391914
gi|8486129
gi|7385295

Sets included in the consensus tree

Set (species in order) How many times out of 1.00

....****** ********** ********** ********** . 1.00
.......... ....****.. .......... .......... . 1.00
....*****. .......... .......... .......... . 1.00
.......... .........* ***....... .......... . 1.00
.......... ....***... .......... .......... . 1.00
.......... .......... .........* **........ . 1.00
.......... .***...... .......... .......... . 1.00
.......... .......... .**....... .......... . 1.00
.***...... .......... .......... .......... . 1.00
......**.. .......... .......... .......... . 1.00
....**.... .......... .......... .......... . 1.00
.********* ********** ********** ********** . 1.00
.......... .......... ....**.... .......... . 1.00
.......... .......... ......**.. .......... . 1.00
.......... .**....... .......... .......... . 1.00
.**....... .......... .......... .......... . 1.00
.......... .....**... .......... .......... . 1.00
.........* ********** ********** ****...... . 1.00
.......... .......... .........* ***....... . 1.00
.......... .......... .......... ....****** . 1.00
.......... .......... ......***. .......... . 1.00
.......... ....****** ********** ****...... . 1.00
.........* *......... .......... .......... . 1.00
.......... .......... .......... ....***... . 1.00
.......... .......... .......... .......**. . 1.00
.......... .........* ********** ****...... . 1.00
......***. .......... .......... .......... . 1.00
.......... .........* *......... .......... . 1.00
.......... .......... .......... ....**.... . 1.00
.......... .........* ****...... .......... . 1.00
.......... .......... .......... **........ . 1.00
.......... ....*****. .......... .......... . 1.00
.........* ****...... .......... .......... . 1.00
.......... .......... ....****** ****...... . 1.00
.......... .......... .......... .......*** . 1.00
.......... .......... ....*****. .......... . 1.00
.........* ********** ********** ********** . 1.00
.......... .......... .........* ****...... . 1.00

Sets NOT included in consensus tree: NONE
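
Each row of the table above marks the members of one group with `*`, in the same order as the "Species in order" list. A small Python sketch for decoding such a row follows. It assumes the layout shown here (blocks of ten characters separated by spaces, followed by the support value), and the function name is illustrative only.

```python
def decode_set(line):
    """Return (1-based member positions, support, number of species)."""
    *groups, support = line.split()
    flags = "".join(groups)  # drop the spaces between blocks of ten
    members = [i + 1 for i, ch in enumerate(flags) if ch == "*"]
    return members, float(support), len(flags)


row = ".***...... .......... .......... .......... . 1.00"
members, support, n_species = decode_set(row)
print(members, support, n_species)  # [2, 3, 4] 1.0 41

# First four entries of "Species in order", used to name the members.
species = ["gi|3214017", "gi|7391268", "gi|7391913", "gi|8486136"]
print([species[i - 1] for i in members])
```

The decoded group (positions 2 to 4) corresponds to gi|7391268, gi|7391913 and gi|8486136, which also appear together as one clade in the neighbor tree.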

CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of 1.00 trees

+----gi|8486131
+--1.0
+--1.0 +----gi|3214016
! !
+--1.0 +---------gi|7392156
! !
+------------1.0 +--------------gi|7391921
! !
! +-------------------gi|7385294
!
! +----gi|7391914
! +--1.0
! ! +----gi|7385295
! +--1.0
+--1.0 ! ! +----gi|3214016
! ! +------------1.0 +--1.0
! ! ! ! +----gi|7391920
! ! ! !
! ! ! +--------------gi|8486125
! ! !
! ! ! +---------gi|3214016
! ! ! +--1.0
! +--1.0 ! ! +----gi|7391913
! ! ! +--1.0
! ! +-------1.0 +----gi|7392130
! ! ! !
! ! ! ! +----gi|7392126
! ! ! +-------1.0
! ! ! +----gi|8486127
+--1.0 +--1.0
! ! ! +--------------gi|7391268
! ! ! !
! ! ! +--1.0 +----gi|7391914
! ! ! ! ! +--1.0
! ! ! ! +--1.0 +----gi|8486134
! ! +--1.0 !
! ! ! +---------gi|7385294
! ! !
! ! +-------------------gi|3214016
! !
! ! +----gi|7391882
+--1.0 ! +--1.0
! ! ! +--1.0 +----gi|8486138
! ! ! ! !
! ! +----------------------1.0 +---------gi|7391905
! ! !
! ! ! +----gi|3214016
! ! +-------1.0
! ! +----gi|7385294
! !
! ! +----gi|7391914
! ! +--1.0
! ! +--1.0 +----gi|7392130
+--1.0 ! ! !
! ! +---------------------------1.0 +---------gi|8486129
! ! !
! ! ! +---------gi|7385295
! ! +--1.0
! ! ! +----gi|9316315
! ! +--1.0
! ! +----gi|3214015
! !
! ! +----gi|7391268
+--1.0 ! +--1.0
! ! ! +--1.0 +----gi|7391915
! ! ! ! !
! ! +--------------------------------1.0 +---------gi|8486122
! ! !
! ! ! +----gi|3214142
! ! +-------1.0
! ! +----gi|7385295
! !
! ! +----gi|7391268
! ! +--1.0
! +------------------------------------------1.0 +----gi|7391913
! !
! +---------gi|8486136
!
+-----------------------------------------------------------gi|3214017
!
+-----------------------------------------------------------gi|7385295

Neighbor tree with branch length:

[Figure: neighbor tree drawn with branch lengths; scale bar labelled 0.2. The leaves are the 41 gi-labelled sequences listed under "Species in order".]

Family Analysis

We also performed a family analysis in the current work to determine the number of base pairs, exons, introns, and open reading frames (ORFs) present in the sequences of these different strains.
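
To illustrate what an ORF search reports, here is a minimal Python sketch that scans the three forward reading frames for stretches running from an ATG start codon to the next in-frame stop codon. It is an assumption-laden illustration, not the algorithm of the GETORF tool: the function name, the `min_codons` threshold and the toy sequence are hypothetical, and the reverse strand is not searched.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand.

    Coordinates are 1-based and inclusive; 'end' is the last base before
    the stop codon. ORFs with no in-frame stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i, frame + 1))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence (hypothetical): ATG AAA TTT GGG TAA, starting at position 3.
print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(3, 14, 3)]
```

The reported coordinates play the same role as the bracketed ranges in the GETORF headers below, such as [25 - 2172], although the exact conventions of that tool may differ from this sketch.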

Result of GETORF tool

>NC_007359.1_1 [25 - 2172] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKAMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESTIIESGDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGVE
KPKFLPDLYDYKENRFIEIGVTRREVHTYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDSFRQSERGEETVEE
RFEITGTMCRLADQSLPPNFSSLEKFRAYVDGFEPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLMDALKLSIEDPSHEGEGIPLYD
AIKCMKTFFGWKEPNIVKPHEKGINPNYLLAWKQVLAELQDIENEEKIPKTKNMRKTSQLKWALGENMAPEKVDFEDCKDVSDLRQYDSDEPKPRSLA
SWIQSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFLIK
GRSHLRNDTDVVNFVSMEFSLTDPRLEPHRWEKYCVLRIGDMLLRTEIGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCPFQSLQQIESMIEAESSVKEK
DMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASW
FNSFLTHALR

>NC_007359.1_2 [1706 - 1359] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 3, complete sequence
MGLDTWPISVRKSMSPIRRTQYFSHLCGSSLGSVRENSILTKFTTSVSFLKWDLPFIRNPYRLVFRLPSLVLHLLIIGISWKSSMAAQDAFNKAVFMYTPFI
MYSVALQWDTSAVK
>NC_007359.1_3 [1337 - 1026] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 3, complete sequence
MCSIGATSSPISSSSIQLESVNSHALLNSLWIQLASDLGFGSSLSYCLRSLTSLQSSKSTFSGAIFSPSAHFNWLVFLMFFVFGIFSSFSISWSSASTCFQARR
>NC_007359.1_4 [311 - 3] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 3, complete sequence
MQILFTTVQAIVRSLPSIISNRCFNNALGSPDSIIVDSPRSSIKWKSEYMKQTSKCVHIAANLFVSIFGSSPYSFIAFSASSTIIGLKHCRTKSSILDQYLLL
>NC_007357.1_1 [28 - 2304] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 1, complete sequence
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVT
WWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELQDC
KIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMV
DILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAI
IVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSAEMSLRGVRVSKMGVDEYSSTERVV
VSIDRFLRVRDQQGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQDPTMLYNKMEFESFQSLVPKAARSQYSGF
VRTLFQQMRDVLGTFDTVQIIKLLPFAAAPPEPSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKATKRLTVLGKDAGALTEDPDEGTAGVESAVLRGF
LILGREDKRYGPALSINELSNLAKGEKANVLIMQGDVVLVMKRKRDFSILTDSQTATKRIRMAIN
>NC_007357.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLKSRFRFITNTTSPCIINTLAFSPFARLLSSLMLNAGPYLLSSLPRIRNPLNTADSTPAVPSSGSSVNAPASFPRTVSLLVALL
>NC_007364.1_1 [15 - 704] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 8, complete sequence
MDSNTITSFQVDCYLWHIRKLLSMRDMCDAPFDDRLRRDQKALKGRGSTLGLDLRVATMEGKKIVEDILKSETNENLKIAIASSPAPRYITDMSIEEMSRE
WYMLMPRQKITGGLMVKMDQAIMDKRIILKANFSVLFDQLETLVSLRAFTESGAIVAEIFPIPSVPGHFTEDVKNAIGI
LIGGLEWNDNSIRASENIQRFAWGIHDENGGPSLPPKQKRYMAKRVESEV
>NC_007364.1_2 [481 - 849] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 8, complete sequence
MWLKYFPFPPYQDILQRMSKMQLESSSVDLNGMITQFERLKIYRDSLGESMMRMGDLHSLQNRNATWRNELSQKFEEIRWLIAECRNILTKTENSFEQIT
FLQALQLLLEVESEIRTFSFQLI
>NC_007364.1_3 [382 - 56] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 8, complete sequence
MAWSIFTIRPPVIFCLGISMYHSRLISSMLISVIYRGAGLEAMAILRFSFVSLFRMSSTIFFPSIVATLKSSPSVLPLPFNAFWSLRSLSSKGASHMSLILSSFLM
CHR
>NC_007361.1_1 [21 - 1427] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 6, complete sequence
MNPNQKIITIGSICMVVGIISLMLQIGNIISIWVSHSIQTGNQHQAEPCNQSIITYENNTWVNQTYVNISNTNFLTEKAVASVTLAGNSSLCPISGWAVHSKD
NGIRIGSKGDVFVIREPFISCSHLECRTFFLTQGALLNDKHSNGTVKDRSPHRTLMSCPVGEAPSPYNSRFESVAWSASACHDGTSWLTIGISGPDNGAVA
VLKYNGIITDTIKSWRNNILRTQESECACVNGSCFTVMTDGPSNGQASYKIFKMEKGKVVKSVELNAPNYHYEECSCYPDAGEITCVCRDNWHGSNRPW
VSFNQNLEYQIGYICSGVFGDNPRPNDGTGSCGPVSPNGAYGVKGFSFKYGNGVWIGRTKSTNSRSGFEMIWDPNGWTGTDSSFSVKQDIVAITDWSGY
SGSFVQHPELTGLDCIRPCFWVELIRGRPKESTIWTSGSSISFCGVNSDTVGWSWPDDAELPFTIDK
>NC_007361.1_2 [1426 - 959] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 6, complete sequence
MSMVNGNSASSGQDQPTVSLFTPQKDMLLPLVQIVLSLGRPLISSTQKQGLMQSNPVSSGCWTKLPLYPDQSVIATISCFTEKLLSVPVHPFGSQIISKPLLE
LVLLVLPIQTPLPYLNENPFTPYAPLGDTGPQLPVPSLGRGLSPKTPLHIYPI
>NC_007362.1_1 [22 - 1725] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 4, complete sequence
MEKIVLLLAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLNGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKAS
PANDLCYPGDFNDYEELKHLLSRTNHFEKIQIIPKSSWSNHDASSGVSSACPYHGRSSFFRNVVWLIKKNSAYPTIKRSYNNTNQEDLLVLWGIHHPNDA
AEQTKLYQNPTTYISVGTSTLNQRLVPEIATRPKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGDSAIMKSELEYGNCNTKCQTPMGA
INSSMPFHNIHPLTIGECPKYVKSNRLVLATGLRNTPQRERRRKKRGLFGAIAGFIEGGWQGMVDGWYGYHHSNEQGSGYAADKESTQKAIDGVTNKV
NSIIDKMNTQFEAVGREFNNLERRIENLNKQMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRLQLRDNAKELGNGCFEFYHKCDNECM
ESVKNGTYDYPQYSEEARLNREEISGVKLESMGTYQILSIYSTVASSLALAIMVAGLSLWMCSNGSLQCRICI
>NC_007360.1_1 [28 - 1539] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 5, complete sequence
MSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYIQMCTELKLSDYEGRLIQNSITIERMVLSAFDERRNKYLEEHPSAGKDPKKTGG
PIYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELIRMIKRGINDRNFWRGENGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGY
DFERE
GYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVAPRGQLSTRGVQIASNENMETMDSSTLELRSRYWAIRTRSGG
NTNQQRASAGQISVQPTFSVQRNLPFERATIMAAFTGNTEGRTSDMRTEIIRMMESSRPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGDNAEE
YDN
>NC_007360.1_2 [1487 - 1137] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 5, complete sequence
MSKEGTIGFVAFSSESSKTPRPWKDTSSGLELSIILMISVLMSDVLPSVFPVNAAIMVALSKGRFLCTEKVGCTLICPADALCWLVFPPLLVLIAQYLLLSSR
VLESIVSMFSFEAI
>NC_007360.1_3 [465 - 28] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 5, complete sequence
MPDHHVSETSSCIFSIVRLTPNSPDLLFVIQNQLSHPFSVSSSVDWTSSFLWVLPRTGMFFQVFVPPFIKCRENHSLYCYAVLNQPSFIVAEFEFSAHLYIKPP
NSTNHSSNRCSDLSSILAFSTSFHLFIRSFGALRRHDVDVTQ
>NC_007358.1_1 [25 - 2295] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLKRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKGEMEIITHFQRKRRVRDNMTK
KMVTQRTIGKKKQRLNKRSYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQD
TELSFTITGDNTKWNENQNPRMFLAMITYITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLI
DGTASLSPGMMMGMFNMLSTVLGVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIEAGVDRFYRTCKLVGINMTKKKSYINRTGTCEFTSF
FYRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMMDNDLGPATAQMALQLFIKDYRYPYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDG
GPNPYNIRNLHIPEAGLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCC
NLFEKFFPSSSYRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFAEIMKICSTIEELGRQK
>NC_007358.1_2 [1373 - 963] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 2, complete sequence
MRAKSSEDWSPSHQYVVLVYLFCPRFRIETPKTVLSILNMPIIIPGLNEAVPSISRGLIF
SIFFLVDSLKYFKSMLASISAGICVRSFMLLLSNMYPFPNLAILFENIIGAMLKTFLNHS
GWFLVMYVIIARNIRGF
>NC_007363.1_1 [26 - 781] Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 7, complete sequence
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLY
KKLKREITFHGAKEVALSYSTGALASCMGLIYNRMGTVTTEVAFGLVCATCEQIADSQHRSHRQMATTTNPLIRHENRMVLASTTAKAMEQMAGSSEQ
AAEAMEVASQARQMVQAMRTIGTHPSSSAGLKDNLLENLQAYQKRMGVQMQRFK
>NC_007363.1_2 [865 - 542] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 7, complete sequence
MHLKKRRSRIHNIKCSIPMILAATTRGSLESLHLHSHSFLVGLQIFKKIIFQTGTGARMSPNCPHCLHHLPSLTSNLHGFRCLLTRSSHLLHSLSCSAGQHHS
VLMPD
>NC_007363.1_3 [378 - 1] (REVERSE SENSE) Influenza A virus (A/Goose/Guangdong/1/96(H5N1)) segment 7, complete sequence
MSATSLAPWNVISLFSFLYSLTALSILFGSPFPFKAFWTKRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDGT
IERTYVSTSVRRLIFQYLPAFA
>NC_004907.1_1 [33 - 788] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 7, complete sequence
MSLLTEVETYVLSIIPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLY
KKLKREMTFHGAKEVALSYSTGALASCMGLIYNRMGTVTTEVALGLVCATCEQIADAQHRSHRQMATTTNPLIRHENRMVLASTTAKAMEQMAGSSE
QAAEAMEVASQARQMVQAMRTIGTHPSSSAGLKDDLIENLQAYQKRMGVQMQRFK
>NC_004907.1_2 [385 - 2] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 7, complete sequence
MSATSFAPWNVISLFSFLYSLTALSMLFGSPFPFRAFWTNRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDG
MIERTYVSTSVRRLIFQYLPAFGIP
>NC_004912.1_1 [21 - 2168] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKTMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESIIVESGDPNALLKHRFEIIEGRDRAMAWTVVNSICNTTGVD
KPKFLPDLYDYKENRFTEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDSFRQSERGEETIEER
FEITGTMRRLADQSLPPNFSSLENFRAYVDGFKPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLMDALKLSIEDPSHEGEGIPLYDA
IKCMKTFFGWREPNIIKPHEKGINPNYLLAWKQVLAELQDIENEDKIPKTKNMKKTSQLMWALGENMAPEKLDFEDCKDIGDLKQYQSDEPELRSIASWI
QSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRS
HLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEVGEMLLRTAIGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDM
TKEFFENRSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYSSPQLEGFSAESRKLLLIVQALRDNLEPGTFDLEGLYGAIEECLINDPWVLLNASWFNS
FLTHALK
>NC_004912.1_2 [1735 - 1412] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MEVPLVLTYRNMGLDTWPIAVRKSISPTSRTQYFSHLCGSSLGSVRENSILTKFTTSVSFLKWDLPFIMNPYRFVFLLPSFVLHLLIIGISWKSSMAAQDALS
KAVFM
>NC_004912.1_3 [1688 - 1119] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MAYCSPQEHFPYFKNTVLFPFVWLQPWVSKGEFHTHKVHHVSIISQMGPSFYNEPIQVCLPSSFFCSAFAYHWNQLEVIHGCTRCIEQSCIYVNPLHYVLS
GPAMRHFRCEVVPSHACNVLNWGNIFPYLIEFYPARIGQFTCLVELTLDPACYRSELWLITLILFQIANIFAVLKVQFFRCHILPECPH
>NC_004912.1_4 [331 - 2] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 3, complete sequence
MGLSTPVVLQMLFTTVQAIARSLPSIISNLCFNNAFGSPDSTIIDSPRSSMKWKSEYMKQTSKCVHIAANLFVSIFGSSPYSFIVFSASSTIIGLKHCRTKSSIL
DQYLL
>NC_004906.1_1 [27 - 716] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 8, complete sequence
MDSNTVSSFQVDCFLWHVRKRFADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIRTATREGKHIVERILEEESDEALKMTIASVPASRYLTEMTLEEMSR
DWLMLIPKQKVTGPLCIRMDQAVMGKTIILKANFSVIFNRLEALILLRAFTDEGAIVGEISPLPSLPGHTDEDVKNAIGVLIGGLEWNDNTVRVSETLQRFT
WRSSDENGRSPLPPKQKRKVERTIEPEV
>NC_004906.1_2 [535 - 861] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 8, complete sequence
MTRMSKMQLGSSSEDLNGMITQFESLKLYRDSLGEAVMRMGDLHSLQNRNGKWREQLSQKFEEIRWLIEEMRHRLRITENSFEQITFMQALQLLLEVEQ
EIRTFSFQLI
>NC_004906.1_3 [643 - 293] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 8, complete sequence
MLLQVNLCRVSETRTVLSFHSSPPMRTPIAFLTSSSVCPGREGNGEISPTIAPSSVNALSSIRASSRLKITLKFAFNMMVLPITAWSILMQRGPVTFCLGMSIN
QSLDISSRVISVR
>NC_004905.1_1 [28 - 1539] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYVQMCTELKLSDQEGRLIQNSITIERMVLSAFDERRNRYLEEHPSAGKDPKKTG
GPIYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAIKGVGTMV
MELIRMIKRGINDRNFWRGDNGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGY
DFEREGYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVIPRGQLSTRGVQIASNENVEAMDSSTLELRSRYWAIRT
RSGGNTNQQRASAGQISVQPTFSVQRNLPFERPTIMAAFKGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGD
NAEEYDN
>NC_004905.1_2 [1047 - 625] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MACHPYQLTFMCWILIWSNKTEDLTVLKQTERIYPNQRVPFPLKIISTGHSEPVHTSRQAGLMGYGSSQDECRPCQKDEIFNFSIPRISAFSHLIHHCSLCCC
LKFPFEDVAHSLICNPCSSSIIASPEVPVINASLYHPN

>NC_004908.1_1 [32 - 1711] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 4, complete sequence
METISLITILLVVTASNADKICIGHQSTNSTETVDTLTETNVPVTHAKELLHTEHNGMLCATSLGHPLILDTCTIEGLVYGNPSCDLLLGGREWSYIVERSSA
VNGTCYPGNVENLEELRTLFSSASSYQRIQIFPDTTWNVTYTGTSRACSGSFYRSMRWLTQKSGFYPVQDAQYTNNRGKSILFVWGIHHPPTYTEQTNLY
IRNDTTTSVTTEDLNRTFKPVIGPRPLVNGLQGRIDYYWSVLKPGQTLRVRSNGNLIAPWYGHVLSGGSHGRILKTDLKGGNCVVQCQTEKGGLNSTLPF
HNISKYAFGTCPKYVRVNSLKLAVGLRNVPARSSRGLFGAIAGFIEGGWPGLVAGWYGFQHSNDQGVGMAADRDSTQKAIDKITSKVNNIVDKMNKQ
YEIIDHEFSEVETRLNMINNKIDDQIQDVWAYNAELLVLLENQKTLDEHDANVNNLYNKVKRALGSNAMEDGKGCFELYHKCDDQCMETIRNGTYNRR
KYREESRLERQKIEGVKLESEGTYKILTIYSTVASSLVLAMGFAAFLFWAMSNGSCRCNICI*
>NC_004909.1_1 [1 - 1401] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 6, complete sequence
MNPNQKIIALGSVSITIATICLLMQIAILATTMTLHFNECTNPSNNQAVPCEPIIIERNITEIVHLNNTTIEKESCPKVAEYKNWSKPQCQITGFAPFSKDNSIRL
SAGGDIWVTREPYVSCGLGKCYQFALGQGTTLNNKHSNGTIHDRSPHRTLLMNELGVPFHLGTKQVCIAWSSSSCHDGKAWLHVCVTGDDRNATASII
YDGMLTDSIGSWSKNILRTQESECVCINGTCTVVMTDGSASGRADTKILFIREGKIVHIGPLSGSAQHVEECSCYPRYPEVRCVCRDNWKGSNRPVLYINV
ADYSVDSSYVCSGLVGDTPRNDDSSSSSNCRDPNNERGGPGVKGWAFDNGNDVWMGRTIKKDSRSGYETFRVVGGWTTANSKSQINRQVIVDSDNWS
GYSGIFSVEGKTCINRCFYVELIRGRPQETRVWWTSNSIIVFCGTSGTYGTGSWPDGANINFMSI
>NC_004910.1_1 [28 - 2304] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVT
WWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDMNPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKREELKN
CNIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGVRM
VDILKQNPTEEQAVDICKAAMGLKISSSFSFGGFTFKRTKGSSVKREEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRRMIQLIVSGRDEQSIAE
AIIVAMVFSQEDCMVKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSTEMSLRGVRVSKMGVDEYSSTER
VVVSIDRFLRVRDQRGNVLLSPEEVSETQGMEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQEPTMLYNKMEFEPFQSLVPKAARSQYS
GFVRTLFQQMRDVLGTFDTVQIIKLLPFAAAPPEQSRMQFSSLTVNVRGSGMRILVRGNSPAFNYNKTTKRLTILGKDAGALTEDPDEGTAGVESAVLRG
FLILGKEDKRYGPALSINELSNLTKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_004910.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPINTLAFSPFVRLLSSLMLNAGPYLLSSLPRIRNPLNTADSTPAVPSSGSSVSAPASFPSIVSLLVVLL
>NC_004910.1_3 [1295 - 735] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 1, complete sequence
MFTKFKSPRTAFTMQSSCENTIATIIASAIDCSSLPLTISWIILLVAFLRMAVALRPTIVNSSYPSCTFIFNVWRLPVSTSSSLLTEDPFVLLKVNPPKLKDELIF
KPIAALHISTACSSVGFCLRMSTILTPPICVLWHISRSEANGSADTVALLTMFLAAMIKLWSTSSFLTSPPGVYICSQQVP
>NC_004905.2_1 [22 - 1533] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYVQMCTELKLSDQEGRLIQNSITIERMVLSAFDERRNRYLEEHPSAGKDPKKTG
GPIYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAIKGVGTMV
MELIRMIKRGINDRNFWRGDNGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGY
DFEREGYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVIPRGQLSTRGVQIASNENVEAMDSSTLELRSRYWAIRT
RSGGNTNQQRASAGQISVQPTFSVQRNLPFERPTIMAAFKGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGD
NAEEYDN
>NC_004905.2_2 [1041 - 619] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 5, complete sequence
MACHPYQLTFMCWILIWSNKTEDLTVLKQTERIYPNQRVPFPLKIISTGHSEPVHTSRQAGLMGYGSSQDECRPCQKDEIFNFSIPRISAFSHLIHHCSLCCC
LKFPFEDVAHSLICNPCSSSIIASPEVPVINASLYHPN
>NC_004911.1_1 [24 - 2297] Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGRWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGLFENSCLETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRRVRDNMTK
KMVTQRTIGKKKQKLTKKSYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVHFVEALARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQD
TELSFTVTGDNTKWNENQNPRIFLAMITYITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLANIDLKYFNESTRKKIEKIRPLLI
EGTASLSPGMMMGMFNMLSTVLGVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGINMSKKKSYINRTGTFEFTSFF
YRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGG
PNLYNIRNLHIPEVCLKWELMDEDYQGRLCNPLNPFVSHKEVESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCC
TLFEKFFPSSSYRRPVGISSMMEAMVSRARIDARIDFESGRIKKEEFAEILKICSTIEELGRQGK
>NC_004911.1_2 [2272 - 1925] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 2, complete sequence
MVEQIFKISANSSFLILPDSKSIRASIRALDTMASIMLEIPTGLRYELLGKNFSNRVQHFWYICSSSRIPLWLVLRMERFLLGIHECVVATASYSMLLAGPWA
GITTALLTDSTSL
>NC_004911.1_3 [1372 - 962] (REVERSE SENSE) Influenza A virus (A/Hong Kong/1073/99(H9N2)) segment 2, complete sequence
MRAKSSEDWSPSHQYVVLVYLFCPRFKIETPKTVLSILNMPIIIPGLNEAVPSISRGLIFSIFFLVDSLKYFKSMFASISAGICVRSFMLLLSNMYPFPNLAILF
ENIIGAMLKTFLNHSGWFLVMYVIIARNIRGF
>NC_007374.1_1 [44 - 1729] Influenza A virus (A/Korea/426/68(H2N2)) segment 4, complete sequence
MAIIYLILLFTAVRGDQICIGYHANNSTEKVDTILERNVTVTHAKDILEKTHNGKLCKLNGIPPLELGDCSIAGWLLGNPECDRLLSVPEWSYIMEKENPRY
SLCYPGSFNDYEELKHLLSSVKHFEKVKILPKDRWTQHTTTGGSWACAVSGKPSFFRNMVWLTRKGSNYPVAKGSYNNTSGEQMLIIWGVHHPNDEAE
QRALYQNVGTYVSVATSTLYKRSIPEIAARPKVNGLGRRMEFSWTLLDMWDTINFESTGNLVAPEYGFKISKRGSSGIMKTEGTLENCETKCQTPLGAIN
TTLPFHNVHPLTIGECPKYVKSEKLVLATGLRNVPQIESRGLFGAIAGFIEGGWQGMVDGWYGYHHSNDQGSGYAADKESTQKAFNGITNKVNSVIEKM
NTQFEAVGKEFSNLEKRLENLNKKMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRMQLRDNVKELGNGCFEFYHKCDNECMDSVKNG
TYDYPKYEEESKLNRNEIKGVKLSSMGVYQILAIYATVAGSLSLAIMMAGISFWMCSNGSLQCRICI
>NC_007376.1_1 [25 - 2172] Influenza A virus (A/Korea/426/68(H2N2)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIMVELDDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGA
EKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMANRGLWDSFRQSERGEETIEE
RFEITGTMRRLADQSLPPNFSCLENFRAYVDGFEPNGYIEGKLSQMSKEVNAKIEPFLKTTPRPIRLPDGPPCFQRSKFLLMDALKLSIEDPSHEGEGIPLYD
AIKCMRTFFGWKEPYIVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVDFDNCRDISDLKQYDSDEPELRSLSS
WIQNEFNKACELTDSIWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKG
RSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQMSRPMFLYVRTNGTSKIKMKWGMEMRPCLLQSLQQIESMVEAESSVKEK
DMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASW
FNSFLTHALR
>NC_007376.1_2 [858 - 550] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 3, complete sequence
MKTRRPIRKSNWSWCCFQKRFNFCIYFFGHLRKLALNVAVRFESIHIGSKILKAGEVRRETLVGKPAHCPCDFKSFFNCFFASFGLTKGIPEASVGHFLSYG
E
>NC_007377.1_1 [26 - 781] Influenza A virus (A/Korea/426/68(H2N2)) segment 7, complete sequence
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLY
RKLKREITFHGAKEVALSYSAGALASCMGLIYNRMGAVTTEVAFAVVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAMEQMAGSSEQ
AAEAMEVASQARQMVQAMRAIGTPPSSSAGLKDDLLENLQAYQKRMGVQMQRFK
>NC_007377.1_2 [517 - 185] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 7, complete sequence
MPVRPMLGVSNLFTGCTYHGKGHFSGHSPHPVVYEAHATGKCTSRITERYFFGPMECYLP
LKLSIQFNCSVHVIWIPIPIEGILDKASTLQSSLTWHGEREYKSQNPLSQR
>NC_007377.1_3 [378 - 1] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 7, complete sequence
MSATSLAPWNVISLLSFLYSLTALSMLFGSPFPLRAFWTKRLRCSPRSLGTVSVNTNPKI
PLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDGTIERTYVSTSVRRLIFQ
YLPAFA
>NC_007382.1_1 [1 - 1407] Influenza A virus (A/Korea/426/68(H2N2)) segment 6, complete sequence
MNPNQKIITIGSVSLTIATVCFLMQIAILVTTVTLHFKQHECDSPASNQVMPCEPIIIER
NITEIVYLNNTTIEKEICPEVVEYRNWSKPQCQITGFAPFSKDNSIRLSAGGDIWVTREP
YVSCDPGKCYQFALGQGTTLDNKHSNDTIHDRIPHRTLLMNELGVPFHLGTRQVCVAWSS
SSCHDGKAWLHVCVTGDDKNATASFIYDGRLMDSIGSWSQNILRTQESECVCINGTCTVV
MTDGSASGRADTRILFIEEGKIVHISPLSGSAQHVEECSCYPRYPDVRCICRDNWKGSNR
PVIDINMEDYSIDSSYVCSGLVGDTPRNDDRSSNSNCRNPNNERGNPGVKGWAFDNGDDV
WMGRTISKDLRSGYETFKVIGGWSTPNSKSQINRQVIVDSNNWSGYSGIFSVEGKRCINR
CFYVELIRGRQQETRVWWTSNSIVVFCGTSGTYGTGSWPDGANINFMPI*
>NC_007381.1_1 [1 - 1494] Influenza A virus (A/Korea/426/68(H2N2)) segment 5, complete sequence
MASQGTKRSYEQMETDGERQNATEIRASVGKMIDGIGRFYIQMCTELKLSDYEGRLIQNS
LTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYKRVDGKWMRELVLYDKEEIRRIW
RQANNGDDATAGLTHMMIWHSNLNDTTYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAG
AAVKGVGTMVMELIRMIKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMMD
QVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGPAIASGYNFEKEGYSLVG
IDPFKLLQNSQVYSLIRPNENPAHKSQLVWMACNSAAFEDLRVLSFIRGTKVSPRGKLST
RGVQIASNENMDTMESSTLELRSRYWAIRTRSGGNTNQQRASAGQISVQPAFSVQRNLPF
DKPTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEMSFQGRGVFELSDEKATNPIVPSFD
MSNEGSYFFGDNAEEYDN*
>NC_007381.1_2 [723 - 184] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 5, complete sequence
MIHHCSLCSCLKFSFENVAHSLVSTPCFPSILTSPEVPIIDPTFDHPDQLHHHCPNSFDC
SACSSRPPRESRTLHQRAHPGIHSGANKSSCPLVCCIIQIGMPDHHVSQPSCCIITIIGL
APDSPYFFFVIKDEFPHPLSIYSLVYGSSSFLRILPRAGMFFQIFIPSLVKSREHHSLYC
>NC_007375.1_1 [25 - 2295] Influenza A virus (A/Korea/426/68(H2N2)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTE
TGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCLETMEVIQQTRVD
KLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVIESMDKEEME
ITTHFQRKRRVRDNMTKKMVTQRTIGKKKQRLNKRSYLIRALTLNTMTKDAERGKLKRRA
IATPGMQIRGFVHFVETLARNICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSF
TITGDNTKWNENQNPRVFLAMITYITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESK
SMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTVSLSPGMMMGMFNMLSTVLG
VSILNLGQKKYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVNRFYRTCKLVGINMSKK
KSYINRTGTFEFTSFFYRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLG
PATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGSNLYN
IRNLHIPEVCLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAV
ATTHSWTPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSSYRRPVGISSMVEAMVS
RARIDARIDFESGRIKKEEFAEIMKICSTIEELRRQK
>NC_007380.1_1 [1 - 711] Influenza A virus (A/Korea/426/68(H2N2)) segment 8, complete sequence
MDSNTVSSFQVDCFLWHVRKQVVDQELGDAPFLDRLRRDQKSLRGRGSTLDLDIEAATRV
GKQIVERILKEESDEALKMTMASAPASRYLTDMTIEELSRDWFMLMPKQKVEGPLCIRID
QAIMDKNIMLKANFSVIFDRLETLILLRAFTEEGAIVGEISPLPSLPGHTIEDVKNAIGV
LIGGLEWNDNTVRVSKTLQRFAWRSSNENGRPPLTPKQKRKMARTIRSKVRRDKMAD
>NC_007380.1_2 [467 - 835] Influenza A virus (A/Korea/426/68(H2N2)) segment 8, complete sequence
MLAKFHHCLLFQDILLRMSKMQLGSSSEDLNGMITQFESLKLYRDSLGEAVMRMGDLHSL
QNRNGKWREQLGQKFEEIRWLIEEVRHRLKITENSFEQITFMQALQLLFEVEQEIRTFSF
QLI*
>NC_007380.1_3 [767 - 120] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 8, complete sequence
MLFAQNYSLLSSICVSLLQSAILSLRTFDLIVLAIFRFCFGVSGGLPFSLLLLQANLCRV
LETRTVLSFHSSPPMRTPIAFLTSSIVCPGREGNGEISPTIAPSSVKALSNIRVSSRSKI
TLKFAFNMMFLSMIAWSILMQRGPSTFCLGISMNQSLDNSSIVMSVRYREAGAEAMVILS
ASSDSSFRILSTICFPTRVAASMSRSRVLPLPLRDF
>NC_007378.1_1 [28 - 2304] Influenza A virus (A/Korea/426/68(H2N2)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITAD
KRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLAVTWWNRNGPMTSTVHYPKIYKTYFE
KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE
SQLTITKEKKEELQDCKISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCHSTQIGGTRMVDILRQ
NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSIKREEEVLTGNLQTLKIRVHEGY
EEFTMVGKRATAILRKATRRLVQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF
VNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDNVMGMIGVLPDMTPSTEMSMRGIRV
SKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEIN
GPESVLVNTYQWIIRNWETVKIQWSQNPTMLYNKMEFEPFQSLVPKAIRGQYSGFVRTLF
QQMRDVLGTFDTTQIIKLLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNK
TTKRLTILGKDAGTLTEDPDEGTSGVESAVLRGFLILGKEDRRYGPALSINELSTLAKGE
KANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_007378.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPISTLAFSPFARVLSSLMLNAGPYLLSS
LPRMRNPLRTADSTPDVPSSGSSVKVPASFPRIVSLLVVLL
>NC_007378.1_3 [1163 - 798] (REVERSE SENSE) Influenza A virus (A/Korea/426/68(H2N2)) segment 1, complete sequence
MVAFLSIAVALFPTIVNSSYPSCTLIFNVWRLPVSTSSSLLIDDPLVLLNVNPPKLKDEL
ILSPIAALHISTACSSVGFCLRMSTILVPPICVLWHISNKDASGSADTAALLTMFLAAII
RL
>NC_007373.1_1 [28 - 2304] Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITADKRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLAVT
WWNRNGPVASTVHYPKVYKTYFDKVERLKHGTFGPVHFRNQVKIRRRVDINPHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELRDCK
ISPLMVAYMLERELVRKTRFLPVAGGTSSIYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCHSTQIGGTRMVDI
LRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSVKKEEEVLTGNLQTLKIRVHEGYEEFTMVGKRATAILRKATRRLVQLIVSGRDEQSIAEAII
VAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDSVMGMVGVLPDMTPSTEMSMRGIRVSKMGVDEYSSTERVV
VSIDRFLRVRDQRGNVLLSPEEVSETQGTERLTITYSSSMMWEINGPESVLVNTYQWIIRNWEAVKIQWSQNPAMLYNKMEFEPFQSLVPKAIRSQYSGF
VRTLFQQMRDVLGTFDTTQIILLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKTTKRLTILGKDAGTLIEDPDESTSGVESAVLRGFLIIG
KEDRRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_007373.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPISTLAFSPFARLLSSLMLNAGPYLLSSLPIMRNPLKTADSTPDVLSSGSSIKVPASFPRIVSLLVVLL
>NC_007373.1_3 [1670 - 1368] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MTKTDSGPLISHIIDDEYVIVNLSVPCVSLTSSGDNNTFPRWSRTLKNRSMLTTTLSVLEYSSTPILLTLIPLIDISVLGVISGNTPTIPITLSMCSIPQF
>NC_007373.1_4 [1163 - 735] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 1, complete sequence
MVAFLSIAVALFPTIVNSSYPSCTLIFNVWRLPVSTSSSFLTDDPLVLLNVNPPKLKDELILNPIAALHISTACSSVGFCLRMSTILVPPICVLWHISNKDASGS
ADTAALLTMFLAAIIRLWSTSSFLTSPPGVYICSQHVP
>NC_007369.1_1 [46 - 1539] Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MASQGTKRSYEQMETDGDRQNATEIRASVGKMIDGIGRFYIQMCTELKLSDHEGRLIQNSLTIEKMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYRRV
DGKWMRELVLYDKEEIRRIWRQANNGEDATAGLTHIMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGIGTMVMELIRM
VKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMVDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACAYGPAVSSGYDFEKEG
YSLVGIDPFKLLQNSQIYSLIRPNENPAHKSQLVWMACHSAAFEDLRLLSFIRGTKVSPRGKLSTRGVQIASNENMDNMGSSTLELRSGYWAIRTRSGGNT
NQQRASAGQTSVQPTFSVQRNLPFEKSTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEVSFRGRGVFELSDEKATNPIVPSFDMSNEGSYFFGDNAEEY
DN
>NC_007369.1_2 [768 - 445] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MIHHCSLCSCLKFSFKNVAHSLISTSCFPPILTSPEISIVDPPFDHSDQFHHHCPDSFDCSTCSSGPSRESRALHQRAHSGIHSSSNKSSCPLVCCIIQIGMPDHY
VS
>NC_007369.1_3 [499 - 194] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MSSGMLHHSNWNARSLCELDQLSHPHHCWLGARFALSLLCHKGRVPSSIFHLLSCIWAPQFSWDLSPRWGVLPGIYSFFHQKQRAPFSLLSSCSGSTALH
DH
>NC_007369.1_4 [411 - 28] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 5, complete sequence
MAPDSPYLFFVIKDEFPHPFSIYSPVYGPPSFLGIFPRAGVFFQVFIPSFIKSREHHFLYCQAVLDQPPFMITEFKFSAHLDVESPNSINHLPDGCPNLSCILAIPI
SFHLFIRPFGALGRHDFDVTR
>NC_007367.1_1 [26 - 781] Influenza A virus (A/New York/392/2004(H3N2)) segment 7, complete sequence
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLY
RKLKREITFHGAKEIALSYSAGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQMVATTNPLIKHENRMVLASTTAKAMEQMAGSSEQ
AAEAMEIASQARQMVQAMRAVGTHPSSSTGLRDDLLENLQTYQKRMGVQMQRFK
>NC_007367.1_2 [517 - 185] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 7, complete sequence
MPMRPVLGVSNLFTCCTYQAKCHFSGYSPHPIVYEAHATGKCTSRITESYFFGPMERYLPLKFPIQFNCFVHVIWISIPIEGILDKASTLQSSLTGHGEREHK
PQNPLSQR
>NC_007367.1_3 [378 - 1] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 7, complete sequence
MRAISLAPWNVISLLSFLYSLTALSMLFGSPFPLRAFWTKRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRASRSVFFPAKTSSSLCAISALRGPDG
TIERTYVSTSVRRLIFQYLPAFA
>NC_007371.1_1 [25 - 2172] Influenza A virus (A/New York/392/2004(H3N2)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIVVELDDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGA
EKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEIATKADYTLDEESRARIKTRLFTIRQEMANRGLWDSFRQSERGEETIEEK
FEISGTMRRLADQSLPPKFSCLENFRAYVDGFEPNGCIEGKLSQMSKEVNAKIEPFLKTTPRPIKLPNGPPCYQRSKFLLMDALKLSIEDPSHEGEGIPLYDA
IKCIKTFFGWKEPYIVKPHEKGINSNYLLSWKQVLSELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVDFDNCRDISDLKQYDSDEPELRSLSSWIQ
NEFNKACELTDSIWIELDEIGEDVAPIEYIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSH
LRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQISRPMFLYVRTNGTSKVKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDMTK
EFFENKSEAWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSF
LTHALK
>NC_007368.1_1 [20 - 1426] Influenza A virus (A/New York/392/2004(H3N2)) segment 6, complete sequence
MNPNQKIITIGSVSLTISTICFFMQIAILITTVTLHFKQYEFNSPPNNQVMLCEPTIIERNITEIVYLTNTTIEKEMCPKLAEYRNWSKPQCDITGFAPFSKDNSI
RLSAGGDIWVTREPYVSCDPDKCYQFALGQGTTLNNVHSNDTVHDRTPYRTLLMNELGVPFHLGTKQVCIAWSSSSCHDGKAWLHVCVTGDDKNATA
SFIYNGRLVDSIVSWSKKILRTQESECVCINGTCTVVMTDGSASGKADTKILFIEEGKIIHTSTLSGSAQHVEECSCYPRYPGVRCVCRDNWKGSNRPIVDI
NIKDYSIVSSYVCSGLVGDTPRKNDSSSSSHCLDPNNEEGGHGVKGWAFDDGNDVWMGRTISEKLRSGYETFKVIEGWSKPNSKLQINRQVIVDRGNRS
GYSGIFSVEGKSCINRCFYVELIRGRKEETEVLWTSNSIVVFCGTSGTYGTGSWPDGADINLMPI
>NC_007368.1_2 [360 - 34] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 6, complete sequence
MSPPAESLIELSLEKGANPVMSHCGFDQFLYSASLGHISFSMVVLVRYTISVMFLSIIVGSHSITWLFGGELNSYCLKCNVTVVIRMAICMKKHIVEMVRET
EPIVIIF
>NC_007370.1_1 [27 - 716] Influenza A virus (A/New York/392/2004(H3N2)) segment 8, complete sequence
MDSNTVSSFQVDCFLWHIRKQVVDQELSDAPFLDRLRRDQRSLRGRGNTLGLDIKAATHVGKQIVEKILKEESDEALKMTMVSTPASRYITDMTIEELSR
NWFMLMPKQKVEGPLCIRMDQAIMEKNIMLKANFSVIFDRLETIVLLRAFTEEGAIVGEISPLPSFPGHTIEDVKNAIGVLIGGLEWNDNTVRVSKNLQRF
AWRSSNENGGPPLTPKQKRKMARTARSKV
>NC_007370.1_2 [493 - 861] Influenza A virus (A/New York/392/2004(H3N2)) segment 8, complete sequence
MLAKSHHCLLFQDILLRMSKMQLGSSSEDLNGMITQFESLKIYRDSLGEAVMRMGDLHLLQNRNGKWREQLGQKFEEIRWLIEEVRHRLKTTENSFEQI
TFMQALQLLFEVEQEIRTFSFQLI
>NC_007370.1_3 [793 - 146] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 8, complete sequence
MLFVQSYFQLFLVCVSLLQSAILSLQTFDLAVLAIFRFCFGVSGGPPFSLLLLQANLCRFLETRTVLSFHSSPPMRTPIAFLTSSIVCPGKEGNGEISPTIAPSS
VKALSNTMVSSRSKITLKFAFNMMFFSMIAWSILMQRGPSTFCLGISMNQFLDNSSIVMSVMYREAGVETMVILSASSDSSFRIFSTICFPTWVAALMSRP
RVLPLPLRDL
>NC_007372.1_1 [25 - 2295] Influenza A virus (A/New York/392/2004(H3N2)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRRVRDNMTK
KMVTQRTIGKKKQRVNKRGYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQ
DTELSF
TITGDNTKWNENQNPRMFLAMITYITKNQPEWFRNILSIAPIMFSNKMARLGKGYMFESKRMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTAS
LSPGMMMGMFNMLSTVLGVSVLNLGQKKYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGINMSKKKSYINKTGTFEFTSFFYRY
GFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWDQTQSRAGLLVSDGGPNL
YN
IRNLHIPEVCLKWELMDENYRGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWNPKRNRSILNTSQRGILEDEQMYQKCCNLFEKF
FPSSSYRRPIGISSMVEAMVSRARIDARIDFESGRIKKEEFSEIMKICSTIEELRRQK
>NC_007372.1_2 [2285 - 1884] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 2, complete sequence
MSSSMVEQIFMISENSSFLIRPDSKSILASIRALDTMASTMLEIPIGLLYELLGKNFSNKLQHFWYICSSSRIPLWLVFRIERFLLGFQECVVATASYSILLAGP
WAGITTALFTDSISLWLTKGFRGLQSLPR
>NC_007372.1_3 [1373 - 1011] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 2, complete sequence
MRAKSSEDWSPSHQYVVLVYFFCPRFSTETPKTVLSMLNMPIIIPGLNDAVPSIRRGLIFSIFFLVDSLKYFRSMLASISAGICVRSFILLLSNMYPFPSLAILF
ENIIGAMLRMFLNHSG
>NC_007366.1_1 [30 - 1727] Influenza A virus (A/New York/392/2004(H3N2)) segment 4, complete sequence
MKTIIALSYILCLVFAQKLPGNDNSTATLCLGHHAVPNGTIVKTITNDQIEVTNATELVQSSSTGGICDSPHQILDGENCTLIDALLGDPQCDGFQNKKWDL
FVERSKAYSNCYPYDVPDYASLRSLVASSGTLEFNNESFNWTGVTQNGTSSACKRRSNNSFFSRLNWLTHLKFKYPALNVTMPNNEKFDKLYIWGVHH
PGTDNDQISLYAQASGRITVSTKRSQQTVIPSIGSRPRIRDVPSRISIYWTIVKPGDILLINSTGNLIAPRGYFKIRSGKSSIMRSDAPIGKCNSECITPNGSIPND
KPFQNVNRITYGACPRYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMVDGWYGFRHQNSEGTGQAADLKSTQAAINQINGKLNRLIGKTN
EKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFERTKKQLRENAEDMGNGCFKIYHKCDNACIGSIRNGTYDHD
VYRDEALNNRFQIKGVELKSGYKDWILWISFAISC
FLLCVALLGFIMWACQKGNIRCNICI
>NC_007366.1_2 [1259 - 888] (REVERSE SENSE) Influenza A virus (A/New York/392/2004(H3N2)) segment 4, complete sequence
MMEFLVCFPDQPIQLPIDLVDCCLSAFEICCLSCSLRILMPETVPTVYHSLPTIFYETRDCAKYASSLFLWYISHPCCQFQSVLLNISGTGPICDPVYILKWFV
IGNASIWSDAFRIAFANGCI
>NC_002018.1_1 [21 - 1382] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 6, complete sequence
MNPNQKIITIGSICLVVGLISLILQIGNIISIWISHSIQTGSQNHTGICNQNIITYKNSTWVKDTTSVILTGNSSLCPIRGWAIYSKDNSIRIGSKGDVFVIREPFIS
CSHLECRTFFLTQGALLNDRHSNGTVKDRSPYRALMSCPVGEAPSPYNSRFESVAWSASACHDGMGWLTIGISGPDNGAVAVLKYNGIITETIKSWRKKI
LRTQESECACVNGSCFTIMTDGPSDGLASYKI
FKIEKGKVTKSIELNAPNSHYEECSCYPDTGKVMCVCRDNWHGSNRPWVSFDQNLDYQIGYICSGVFGDNPRPKDGTGSCGPVYVDGANGVKGFSYRY
GNGVWIGRTKSHSSRHGFEMIWDPNGWTETDSKFSVRQDVVAMTDWSGYSGSFVQHPELTGLDCIRPCFWVELIRGRPKEKTIWTSASSISFCGVNSDT
VDWSWPDGAELPFTIDK
>NC_002023.1_1 [28 - 2304] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 1, complete sequence
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRITEMIPERNEQGQTLWSKMNDAGSDRVMVSPLAVT
WWNRNGPMTNTVHYPKIYKTYFERVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELQDC
KISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVKNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCHSTQIGGIRMV
DILKQ
NPTEEQAVGICKAAMGLRISSSFSFGGFTFKRTSGSSVKREEEVLTGNLQTLKIRVHEGYEEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMV
FSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGVEPIDNVMGMIGILPDMTPSIEMSMRGVRISKMGVDEYSSTERVVVSIDRF
LRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQNPTMLYNKMEFEPFQSLVPKAIRGQYSGFVRTLFQ
QMRDVLGTFDTAQIIKLLPFAAAPPKQSRMQFSSFTVNVRGSGMRILVRGNSPVFNYNKATKRLTVLGKDAGTLTEDPDEGTAGVESAVLRGFLILGKED
RRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>NC_002023.1_2 [2303 - 2001] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 1, complete sequence
MMAIRILLVAVWLSVSMLESRFRFITNTTSPCPISTLAFSPFARLLSSLMLNAGPYLLSSLPRMRNPLRTADSTPAVPSSGSSVKVPASFPRTVSLFVALL
>NC_002017.1_1 [33 - 1730] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 4, complete sequence
MKANLLVLLCALAAADADTICIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDSHNGKLCRLKGIAPLQLGKCNIAGWLLGNPECDPLLPVRSWSYIVET
PNSENGICYPGDFIDYEELREQLSSVSSFERFEIFPKESSWPNHNTTKGVTAACSHAGKSSFYRNLLWLTEKEGSYPKLKNSYVNKKGKEVLVLWGIHHPS
NSKDQQNIYQNENAYVSVVTSNYNRRFTPEIAERPKVRDQAGRMNYYWTLLKPGDTIIFEANGNLIAPRYAFALSRGFGSGIITSNASMHECNTKCQTPL
GAINSSLPFQNIHPVTIGECPKYVRSAKLRMVTGLRNIPSIQSRGLFGAIAGFIEGGWTGMIDGWYGYHHQNEQGSGYAADQKSTQNAINGITNKVNSVIE
KMNIQFTAVGKEFNKLEKRMENLNKKVDDGFLDIWTYNAELLVLLENERTLDFHDSNVKNLYEKVKSQLKNNAKEIGNGCFEFYHKCDNECMESVRN
GTYDYPKYSEESKLNREKVDGVKLESMGIYQILAIYSTVASSLVLLVSLGAISFWMCSNGSLQCRICI
>NC_002017.1_2 [1680 - 1276] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 4, complete sequence
MPPGRPKAPVNWRQLSRSPESDRSPLIPISLHLPFPCSTLTLLNIWDNHKSHFLHFPCIHCHTCGRTQNIHFRFLWHYSLIGFLLSHTDSSHLSHGNPESFHFP
VELTILHYMSKCPEIHHQLFYLNFPSFFLIC
>NC_002019.1_1 [28 - 1539] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 5, complete sequence
MSDIKIMASQGTKRSYEQMETDGERQNATEIRASVGKMIGGIGRFYIQMCTELKLSDYEGRLIQNSLTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGG
PIYRRVNGKWMRELILYDKEEIRRIWRQANNGDDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELVRMIKRGINDRNFWRGENGRKTRIAYERMCNILKGKFQTAAQKAMMDQVRESRDPGNAEFEDLTFLARSALILRGSVAHKSCLPACVYGPAVASG
YDFEREGYSLVGIDPFRLLQNSQVYSLIRPNENPAHKSQLVWMACHSAAFEDLRVLSFIKGTKVVPRGKLSTRGVQIASNENMETMESSTLELRSRYWAI

RTRSGGNTNQQRASAGQISIQPTFSVQRNLPFDRTTVMAAFTGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKAASPIVPSFDMSNEGSYFF
GDNAEEYDN
>NC_002019.1_2 [549 - 229] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 5, complete sequence
MHQRAHPGIHSGANKSPCPLISCIIQIGMPDHHVSQTSRCIVTIISLAPDSPYFFFVIKDEFSHPLSVYSSVYRSSSFLRILPRTGMFFQVFISPFVKSREHHSLY
C
>NC_002022.1_1 [25 - 2172] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 3, complete sequence
MEDFVRQCFNPMIVELAEKTMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIIVELGDPNALLKHRFEIIEGRDRTMAWTVVNSICNTTGAE
KPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDSFRQSERGEETIEER
FEITGTMRKLADQSLPPNFSSLENFRAYVDGFEPNGYIEGKLSQMSKEVNARIEPFLKTTPRPLRLPNGPPCSQRSKFLLMDALKLSIEDPSHEGEGIPLYDA
IKCMRTFFGWKEPNVVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPKTKNMKKTSQLKWALGENMAPEKVDFDDCKDVGDLKQYDSDEPELRSLAS
WIQNEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTSEVSHCRATEYIMKGVYINTALLNASCAAMDDFQLIPMISKCRTKEGRRKTNLYGFIIKG
RSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQVSRPMFLYVRTNGTSKIKMKWGMEMRRCLLQSLQQIESMIEAESSVKEKD
MTKEFFENKSETWPIGESPKGVEESSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKLLLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFN
SFLTHALS
>NC_002022.1_2 [1706 - 1380] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 3, complete sequence
MGLETWPMALLRSISPISRTQYFSHLCGSSLGSVRENSMLTKFTTSVSFLKWDLPFMMKPYKLVFRLPSLVLHLLIIGINWKSSIAAQDALSKAVLMYTPFI
MYSVALQ
>NC_002022.1_3 [858 - 550] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 3, complete sequence
MRTGRPIRKSKWSWCCFQKRFNSSIYFFGHLRQLALNVAVRFESIHIGSKIFKAGEVRRETLVGKLAHCSCDFKPFFNCLFSSLGLTKGIPEASAGHFLSYG
E
>NC_002016.1_1 [26 - 781] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 7, complete sequence
MSLLTEVETYVLSIIPSGPLKAEIAQRLEDVFAGKNTDLEVLMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLY
RKLKREITFHGAKEISLSYSAGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAMEQMAGSSEQA
AEAMEVASQARQMVQAMRTIGTHPSSSAGLKNDLLENLQAYQKRMGVQMQRFK
>NC_002016.1_2 [378 - 1] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 7, complete sequence
MSEISLAPWNVISLLSFLYSLTALSMLFGSPFPLRAFWTKRLRCSPRSLGTVSVNTNPKIPLVRGDRIGLVFSHSMRTSRSVFFPAKTSSSLCAISALRGPDG
MIERTYVSTSVRRLIFQYLPAFA
>NC_002020.1_1 [27 - 716] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 8, complete sequence
MDPNTVSSFQVDCFLWHVRKRVADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIETATRAGKQIVERILKEESDEALKMTMASVPASRYLTDMTLEEMS
REWSMLIPKQKVAGPLCIRMDQAIMDKNIILKANFSVIFDRLETLILLRAFTEEGAIVGEISPLPSLPGHTAEDVKNAVGVLIGGLEWNDNTVRVSETLQRF
AWRSSNENGRPPLTPKQKREMAGTIRSEV
>NC_002020.1_2 [493 - 861] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 8, complete sequence
MLAKFHHCLLFQDILLRMSKMQLESSSEDLNGMITQFESLKLYRDSLGEAVMRMGDLHSLQNRNEKWREQLGQKFEEIRWLIEEVRHKLKVTENSFEQI
TFMQALHLLLEVEQEIRTFSFQLI
>NC_002020.1_3 [793 - 293] (REVERSE SENSE) Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 8, complete sequence
MLFAQNYSLLPSVCVSLLQSTILFLQTSDLIVPAISRFCFGVSGGLPFSLLLLQANLCRVSETRTVLSFHSSPPMRTPTAFLTSSAVCPGREGNGEISPTIAPSS
VKALSNIRVSSRSKITLKFAFSMMFLSMIAWSILIQRGPATFCLGMSMDHSLDISSRVMSVR
>NC_002021.1_1 [25 - 2295] Influenza A virus (A/Puerto Rico/8/34(H1N1)) segment 2, complete sequence
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKARWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESH
PGIFENSCIETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMKKEEMGITTHFQRKRRVRDNMTK
KMITQRTIGKRKQRLNKRSYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDT
ELSLTITGDNTKWNENQNPRMFLAMITYMTRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNDSTRKKIEKIRPLLI
EGTASLSPGMMMGMFNMLSTVLGVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLHGINMSKKKSYINRTGTFEFTSFF
YRYGFVANFSMELPSFGVSGSNESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFEIKKLWEQTRSKAGLLVSDGG
PNLYNIRNLHIPEVCLKWELMDEDYQGRLCNPLNPFVSHKEIESMNNAVMMPAHGPAKNMEYDAVATTHSWIPKRNRSILNTSQRGVLEDEQMYQRCC
NLFEKFFPSSSYRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFTEIMKICSTIEELRRQK
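
Each ORF record above carries its coordinates in the header line as `[start - end]`, with `(REVERSE SENSE)` marking ORFs read from the complementary strand. The Python sketch below is not part of the original analysis. It reads one such header and reports the strand, the nucleotide span, and the number of whole codons in that span. The function name `parse_orf_header` and the returned field names are illustrative assumptions.

```python
import re

# Header layout assumed from the records above:
#   >ID [start - end] (REVERSE SENSE) description
# The "(REVERSE SENSE)" tag is optional.
HEADER_RE = re.compile(
    r"^>(\S+)\s+\[(\d+)\s*-\s*(\d+)\]\s*(\(REVERSE SENSE\))?\s*(.*)$"
)

def parse_orf_header(line):
    """Parse one ORF header line into a dict (hypothetical helper)."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError("not an ORF header: %r" % line)
    orf_id, start, end, reverse, description = m.groups()
    start, end = int(start), int(end)
    # Coordinates are inclusive, so the span is |end - start| + 1.
    nt_length = abs(end - start) + 1
    return {
        "id": orf_id,
        "start": start,
        "end": end,
        "strand": "-" if reverse else "+",
        "nt_length": nt_length,
        "codons": nt_length // 3,
        "description": description,
    }

if __name__ == "__main__":
    hdr = (">NC_002020.1_1 [27 - 716] Influenza A virus "
           "(A/Puerto Rico/8/34(H1N1)) segment 8, complete sequence")
    print(parse_orf_header(hdr))
```

For the segment 8 record `[27 - 716]`, the span is 690 nucleotides, or 230 codons. For a reverse-sense record such as `[2303 - 2001]`, the start is larger than the end, which is why the span uses the absolute difference.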

GENSCANW output for sequence

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

1.01 Sngl + 27 719 693 2 0 73 49 328 0.996 23.61


1.02 PlyA + 869 874 6 -0.45

2.00 Prom + 928 967 40 -13.61


2.01 Sngl + 986 2482 1497 1 0 92 55 810 0.998 73.55
2.02 PlyA + 3149 3154 6 -3.44

3.00 Prom + 3955 3994 40 -6.66


3.01 Sngl + 4020 5516 1497 2 0 83 44 805 0.935 71.05
3.02 PlyA + 6328 6333 6 -3.24

4.00 Prom + 6396 6435 40 -12.11


4.01 Sngl + 6488 8767 2280 1 0 51 35 750 0.423 59.21
4.02 PlyA + 8781 8786 6 -5.12

5.00 Prom + 8819 8858 40 -13.78


5.01 Sngl + 8880 9638 759 2 0 65 46 323 0.920 21.54
5.02 PlyA + 9860 9865 6 -0.45

6.00 Prom + 9899 9938 40 -13.78


6.01 Sngl + 9959 12109 2151 1 0 59 33 826 0.988 68.37
6.02 PlyA + 12146 12151 6 -0.45

7.00 Prom + 12292 12331 40 -12.59


7.01 Sngl + 12362 14518 2157 1 0 89 40 940 0.565 83.44
7.02 PlyA + 14541 14546 6 -0.45

8.00 Prom + 14579 14618 40 -13.78
8.01 Sngl + 14658 16346 1689 2 0 80 38 557 0.990 45.02
8.02 PlyA + 16363 16368 6 -1.75

9.00 Prom + 16428 16467 40 -11.43


9.01 Sngl + 16470 18749 2280 2 0 41 35 702 0.651 53.41
9.02 PlyA + 18763 18768 6 -5.12

10.00 Prom + 18818 18857 40 -15.29


10.01 Sngl + 18863 21136 2274 1 0 8 39 973 0.267 77.66
10.02 PlyA + 21159 21164 6 -0.45

11.00 Prom + 21188 21227 40 -15.72


11.01 Sngl + 21259 23409 2151 0 0 59 33 992 0.971 84.97
11.02 PlyA + 23446 23451 6 -0.45

12.00 Prom + 23507 23546 40 -13.96


12.01 Sngl + 23549 24241 693 1 0 73 49 329 0.977 23.71
12.02 PlyA + 24391 24396 6 -0.45

13.00 Prom + 24451 24490 40 -11.72


13.01 Sngl + 24513 26009 1497 2 0 84 49 641 0.963 55.25
13.02 PlyA + 26012 26017 6 -0.45

14.00 Prom + 26042 26081 40 -15.72


14.01 Init + 26108 27308 1201 1 1 74 75 443 0.399 34.87
14.02 Intr + 27688 27999 312 2 0 55 15 237 0.042 9.26
14.03 Intr + 28828 30279 1452 2 0 15 36 589 0.010 36.05
14.04 Term + 31521 32227 707 1 2 64 42 332 0.123 19.88
14.05 PlyA + 32449 32454 6 -0.45

15.00 Prom + 32479 32518 40 -13.24


15.01 Init + 32554 34197 1644 0 0 93 25 583 0.346 44.59
15.02 Intr + 34783 35570 788 0 2 33 33 272 0.364 6.47
15.03 Term + 35886 37416 1531 0 1 -31 49 737 0.285 47.86
15.04 PlyA + 37420 37425 6 -5.51

16.00 Prom + 37448 37487 40 -13.24


16.01 Sngl + 37526 39676 2151 1 0 72 33 1201 0.998 107.17
16.02 PlyA + 39713 39718 6 -0.45

17.00 Prom + 39868 39907 40 -7.96


17.01 Sngl + 39938 42094 2157 1 0 89 40 1247 0.932 114.14
17.02 PlyA + 42117 42122 6 -0.45

18.00 Prom + 42146 42185 40 -13.24


18.01 Sngl + 42227 44506 2280 1 0 51 36 825 0.480 66.81
18.02 PlyA + 44520 44525 6 -5.12

19.00 Prom + 44543 44582 40 -15.08


19.01 Sngl + 44617 46767 2151 0 0 72 33 1153 0.997 102.37
19.02 PlyA + 46804 46809 6 -0.45

20.00 Prom + 46924 46963 40 -9.06


20.01 Sngl + 47018 49177 2160 1 0 89 38 1193 0.949 108.53
20.02 PlyA + 49197 49202 6 -0.45

21.00 Prom + 49247 49286 40 -12.96


21.01 Sngl + 49439 51568 2130 1 0 77 36 779 0.704 65.84
21.02 PlyA + 52825 52830 6 1.05

22.00 Prom + 53053 53092 40 -9.36


22.01 Init + 53164 54702 1539 0 0 72 -6 469 0.165 28.30
22.02 Intr + 55106 55616 511 1 1 3 25 291 0.042 6.84
22.03 Term + 55711 55916 206 2 2 66 42 181 0.752 8.83
22.04 PlyA + 55939 55944 6 -5.99

23.00 Prom + 55971 56010 40 -12.78


23.01 Sngl + 56013 58292 2280 2 0 51 36 890 0.257 73.31
23.02 PlyA + 58306 58311 6 -5.12

24.00 Prom + 58331 58370 40 -13.78


24.01 Sngl + 58409 60559 2151 1 0 72 33 900 0.979 77.07
24.02 PlyA + 60596 60601 6 -0.45

25.00 Prom + 60720 60759 40 -11.14


25.01 Sngl + 60817 62973 2157 0 0 85 40 799 0.946 68.94
25.02 PlyA + 62996 63001 6 -0.45

26.00 Prom + 63021 63060 40 -13.78
26.01 Sngl + 63101 63793 693 1 0 73 49 386 0.985 29.41
26.02 PlyA + 63940 63945 6 1.05

27.00 Prom + 64006 64045 40 -11.72


27.01 Sngl + 64068 65564 1497 2 0 84 43 719 0.960 62.45
27.02 PlyA + 65682 65687 6 -3.24

28.00 Prom + 65827 65866 40 -2.16


28.01 Init + 67149 68696 1548 2 0 83 71 698 0.591 59.74
28.02 Term + 69149 69736 588 1 0 3 42 289 0.529 10.32
28.03 PlyA + 69958 69963 6 -0.45
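
The column codes in the table above are terse. Following the usual GENSCAN output legend, they are read here as: gene and exon number (Gn.Ex), feature type (Type), strand (S), begin and end coordinates, length (Len), reading frame (Fr), phase (Ph), initiation or acceptor signal score (I/Ac), donor or termination signal score (Do/T), coding region score (CodRg), exon probability (P), and feature score (Tscr). The Python sketch below is not part of the original analysis. It parses one row under that reading and assumes that exon rows have 13 whitespace-separated fields while Prom and PlyA rows have 7. The function name `parse_genscan_row` and the dictionary keys are illustrative.

```python
EXON_KEYS = ["gn_ex", "type", "strand", "begin", "end", "len",
             "fr", "ph", "i_ac", "do_t", "codrg", "p", "tscr"]
SIGNAL_KEYS = ["gn_ex", "type", "strand", "begin", "end", "len", "tscr"]

INT_KEYS = {"begin", "end", "len", "fr", "ph", "i_ac", "do_t", "codrg"}
FLOAT_KEYS = {"p", "tscr"}

def parse_genscan_row(line):
    """Parse one row of the prediction table (hypothetical helper)."""
    parts = line.split()
    if len(parts) == len(EXON_KEYS):        # Sngl, Init, Intr, Term rows
        keys = EXON_KEYS
    elif len(parts) == len(SIGNAL_KEYS):    # Prom and PlyA rows
        keys = SIGNAL_KEYS
    else:
        raise ValueError("unexpected field count: %r" % line)
    row = {}
    for key, value in zip(keys, parts):
        if key in INT_KEYS:
            row[key] = int(value)
        elif key in FLOAT_KEYS:
            row[key] = float(value)
        else:
            row[key] = value
    return row

def length_is_consistent(row):
    """Check that Len matches the inclusive span between Begin and End."""
    return abs(row["end"] - row["begin"]) + 1 == row["len"]

if __name__ == "__main__":
    row = parse_genscan_row("1.01 Sngl + 27 719 693 2 0 73 49 328 0.996 23.61")
    print(row["type"], row["len"], row["p"], length_is_consistent(row))
```

For the first predicted gene (`1.01 Sngl + 27 719 693 ...`), the check confirms that 719 - 27 + 1 = 693, which matches the 693 bp coding sequence reported for predicted CDS 1.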

Predicted peptide sequence(s):


Predicted coding sequence(s):

>gi|GENSCAN_predicted_peptide_1|230_aa
MDSNTVSSFQVDCFLWHVRKRFADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIRTATREGKHIVERILEEESDEALKMTIASVPASRYLTEMTL
EEMSRDWLMLIPKQKVTGPLCIRMDQAVMGKTIILKANFSVIFNRLEALILLRAFTDEGAIVGEISPLPSLPGHTDEDVKNAIGVLIGGLEWNDN
TVRVSETLQRFTWRSSDENGRSPLPPKQKRKVERTIEPEV
>gi|GENSCAN_predicted_CDS_1|693_bp
atggattccaacactgtgtcaagctttcaggtagactgctttctttggcatgtccgcaaacgatttgcagaccaagaactgggtgatgccccatt
ccttgaccggcttcgccgagatcagaagtccctaagaggaagaggcagcactcttggtctggacatcagaactgccactcgtgaaggaaagcata
tagtggagcggattctggaggaagaatctgacgaggcacttaaaatgactatcgcttcagtgcctgcttcacgctacctaactgaaatgactctt
gaggaaatgtcaagggattggttaatgctcattcccaagcagaaagtgacagggcccctttgcattagaatggaccaggcagtaatgggtaaaac
catcatattgaaagcaaactttagtgtgatttttaatcgacttgaagctctgatactacttagagcgtttacagatgaaggagcaatagtgggcg
aaatctcaccattaccttcccttccaggacatactgacgaggatgtcaaaaatgcaattggggtcctcatcggaggacttgaatggaatgataac
acagttcgagtctctgaaactctacagagattcacttggagaagcagtgatgagaatgggagatctccactccctccaaaacagaaacggaaagt
ggagagaacaattgagccagaagtttga
>gi|GENSCAN_predicted_peptide_2|498_aa
MASQGTKRSYEQMETGGERQNATEIRASVGRMVGGIGRFYVQMCTELKLSDQEGRLIQNSITIERMVLSAFDERRNRYLEEHPSAGKDPKKTGGP
IYRRRDGKWVRELILYDKEEIRRIWRQANNGEDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAIKGVGTMV
MELIRMIKRGINDRNFWRGDNGRRTRIAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAV
ASGYDFEREGYSLVGIDPFRLLQNSQVFSLIRPNENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVIPRGQLSTRGVQIASNENVEAMDSSTLE
LRSRYWAIRTRSGGNTNQQRASAGQISVQPTFSVQRNLPFERPTIMAAFKGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKATNPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_2|1497_bp
atggcgtcgcaaggcaccaaacgatcctatgaacagatggaaactggtggagaacgccagaatgccactgagatcagggcatctgttggaagaat
ggttggtggaattgggaggttttacgtacagatgtgcactgaactcaaactcagcgaccaagaaggaaggttgatccagaacagtataacaatag
agagaatggttctctccgcatttgatgaaaggaggaacaggtacctagaggaacatcccagtgcggggaaggacccgaagaagaccggaggtcca
atctaccgaaggagagacgggaaatgggtgagagagctgattctgtatgacaaagaggagataaggagaatttggcgtcaagcgaacaatggaga
agacgcaactgctggtctcactcatatgatgatctggcattccaacctaaatgatgccacataccagagaacaagagccctcgtgcggactggaa
tggaccccagaatgtgctctctgatgcaaggatcaaccctcccgaggagatctggagctgctggtgcagcaataaagggagtcgggacaatggta
atggaactaattcggatgataaagcgaggcattaatgaccggaacttctggagaggcgataatggacgaagaacaaggattgcatatgagagaat
gtgcaacatcctcaaagggaaatttcaaacagcagcacaaagagcaatgatggatcaggtgcgagaaagcagaaatcctgggaatgctgaaattg
aagatctcatctttctggcacggtctgcactcatcctgagaggatccgtagcccataagtcctgcttgcctgcttgtgtgtacgggctcgctgtg
gccagtggatatgattttgagagggaagggtactctctggttgggatagatcctttccgtctgcttcagaacagtcaggtcttcagtcttattag
accaaatgagaatccagcacataaaagtcaattggtatggatggcatgccattctgcagcatttgaggacctgagagtctcaagtttcattagag
gaacaagagtgatcccaagaggacaactatccactagaggagttcagattgcttcaaatgagaacgtggaagcaatggattccagcactcttgaa
ctgagaagcagatattgggctataaggaccaggagtggaggaaacaccaatcaacagagagcatctgcaggacaaatcagtgtacagcccacttt
ctcagtacagagaaatcttcccttcgaaagaccgaccattatggctgcgtttaaggggaataccgagggcagaacatctgacatgaggactgaaa
tcataaggatgatggaaagtgccagaccagaagatgtgtctttccaggggcggggagtcttcgagctctcggacgaaaaggcaacgaacccgatc
gtgccttcctttgacatgagtaatgaaggatcttatttcttcggagacaatgcagaggaatatgacaattga
>gi|GENSCAN_predicted_peptide_3|498_aa
MASQGTKRSYEQMETDGERQNATEIRASVGKMIDGIGRFYIQMCTELKLSDYEGRLIQNSLTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGP
IYKRVDGKWMRELVLYDKEEIRRIWRQANNGDDATAGLTHMMIWHSNLNDTTYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELIRMIKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGPAI
ASGYNFEKEGYSLVGIDPFKLLQNSQVYSLIRPNENPAHKSQLVWMACNSAAFEDLRVLSFIRGTKVSPRGKLSTRGVQIASNENMDTMESSTLE
LRSRYWAIRTRSGGNTNQQRASAGQISVQPAFSVQRNLPFDKPTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEMSFQGRGVFELSDEKATNPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_3|1497_bp
atggcgtcccaaggcaccaaacggtcttatgaacagatggaaactgatggggaacgccagaatgcaactgagatcagagcatccgtcgggaagat
gattgatggaattggacgattctacatccaaatgtgcaccgaacttaaactcagtgattatgaggggcgactgatccagaacagcttaacaatag
agagaatggtgctctctgcttttgacgagagaaggaataaatatctggaagaacatcccagcgcggggaaggatcctaagaaaactggaggaccc
atatacaagagagtagatggaaagtggatgagggaactcgtcctttatgacaaagaagaaataaggcgaatctggcgccaagccaataatggtga
tgatgcaacagctgggctgactcacatgatgatctggcattccaatttgaatgatacaacataccagaggacaagagctcttgttcgcaccggaa
tggatcccaggatgtgctctttgatgcagggttcgactctccctaggaggtctggagctgcaggcgctgcagtcaaaggagttgggacaatggtg
atggagttgatcaggatgatcaaacgtgggatcaatgatcggaacttctggagaggtgagaatggacggaaaacaaggagtgcttacgagagaat
gtgcaacattctcaaaggaaaatttcaaacagctgcacaaagagcaatgatggatcaagtgagagaaagccggaacccaggaaatgctgagatcg
aagatctaatctttctggcacggtctgcactcatattgagagggtcagttgctcacaaatcttgtctgcccgcctgtgtgtatggacctgccata
gccagtgggtacaacttcgaaaaagagggatactctctagtgggaatagaccctttcaaactgcttcaaaacagccaagtatacagcctaatcag
accgaacgagaatccagcacacaagagtcagctggtgtggatggcatgcaattctgctgcatttgaagatctaagagtattaagcttcatcagag
ggaccaaagtatccccaagggggaaactttccactagaggagtacaaattgcttcaaatgaaaacatggatactatggaatcaagtactcttgaa
ctaagaagcaggtactgggccataaggaccagaagtggaggaaacactaatcaacagagggcctctgcaggtcaaatcagtgtacaacctgcatt
ttctgtgcaaagaaacctcccatttgacaaaccaaccatcatggcagcattcactgggaatacagagggaagaacatcagacatgagggcagaaa
tcataaggatgatggaaggtgcaaaaccagaagaaatgtccttccaggggcggggagtcttcgagctctcggacgaaaaggcaacgaacccgatc
gtgccctcttttgacatgagtaatgaaggatcttatttcttcggagacaatgcagaggagtacgacaattaa
>gi|GENSCAN_predicted_peptide_4|759_aa
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITADKRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLA
VTWWNRNGPMTSTVHYPKIYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELQDCKISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCH
STQIGGTRMVDILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSIKREEEVLTGNLQTLKIRVHEGYEEFTMVGKRATAILRKATRR
LVQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDNVMGMIGVLPDMTPSTEMSM
RGIRVSKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQNPTM
LYNKMEFEPFQSLVPKAIRGQYSGFVRTLFQQMRDVLGTFDTTQIIKLLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKTTKRL
TILGKDAGTLTEDPDEGTSGVESAVLRGFLILGKEDRRYGPALSINELSTLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_4|2280_bp
atggaaagaataaaagaactacggaatctgatgtcgcagtctcgcactcgcgagatactaacaaaaaccacagtggaccatatggccataattaa
gaagtacacatcagggagacaggaaaagaacccgtcacttaggatgaaatggatgatggcaatgaaatatccaattacagctgacaagaggataa
cagaaatggttcctgagagaaatgagcaaggacaaactctatggagtaaaatgagtgatgccgggtcagatcgagtaatggtatcacctttggca
gtgacatggtggaatagaaatggaccaatgacaagtacggttcattatccaaaaatctacaagacttattttgagaaagtcgaaaggttaaaaca
tggaacctttggccctgtccattttagaaaccaagtcaaaatacgccgaagagttgacataaaccctggtcatgcagacctcagtgccaaggagg
cacaagacgtaatcatggaagttgttttccccaatgaagtgggggccaggatactaacgtcggaatcacaattaacaataaccaaagagaaaaaa
gaagaactccaagattgcaaaatttctcctttgatggttgcatacatgttagagagagaacttgtccgaaaaacgagatttctcccagttgctgg
tggaacaagcagtgtgtacattgaagtgttacacttgactcaaggaacatgttgggaacagatgtacaccccaggtggagaagtgaggaatgatg
atgttgatcaaagtctaattattgcagccaggaacatagtgagaagagcagcagtatcagcagatccactagcatctttattggagatgtgccac
agcacacagattggcgggacaaggatggtggacattcttaggcagaacccaacggaagaacaagctgtggatatatgcaaggctgcaatgggact
gagaatcagctcatccttcagttttggcgggttcacatttaagagaacaagcgggtcatcaatcaagagagaggaagaagtgcttacgggcaatc
tccaaacattgaaaataagggtgcatgaggggtacgaggaattcacaatggtggggaaaagggcaacagctatactcagaaaagcaaccaggaga
ttggttcagctgatagtgagtggaagagacgaacagtcaatagccgaagcaataattgtagccatggtgttttcacaagaagattgcatgataaa
agcagttagaggtgacctgaatttcgttaatagggcaaatcagcgattgaatcccatgcatcaacttttaagacattttcagaaagatgcaaaag
tgctctttcaaaattggggaattgaacatatcgacaatgtaatgggaatgattggagtattaccagacatgactccaagcacagagatgtcaatg
agagggataagagtcagcaaaatgggcgtggatgaatactccagcacagagagggtagtggtaagcattgaccggtttttgagagttcgagacca
acgaggaaatgtactactatctcctgaggaggtcagtgaaacacaggggacagagaaactgacaataacttactcatcgtcaatgatgtgggaga
ttaatggccctgagtcagtgttggtcaatacctatcagtggatcatcagaaactgggaaactgttaaaattcaatggtctcagaatcctacaatg
ctatacaataaaatggaatttgagccatttcagtctttagttcctaaggccattagaggccaatacagtggatttgttaggactctattccaaca
aatgagggatgtacttgggacatttgataccacccagataataaagcttcttccctttgcagccgccccaccaaagcaaagtagaatgcagttct
cttcattgactgtgaatgtgaggggatcaggaatgagaatacttgtaaggggcaattctcctgtattcaactacaacaagaccactaagagacta
acaattctcggaaaggatgctggcactttaactgaagacccagatgaaggcacatccggagtggagtccgctgttctgagaggattcctcattct
gggcaaggaagatagaagatatggaccagcattaagcatcaatgaactgagtacccttgcaaaaggagaaaaggctaatgtactaattgggcaag
gagacgtggtgttggtaatgaaacgaaaacgggactctagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattaa
>gi|GENSCAN_predicted_peptide_5|252_aa
MSLLTEVETYVLSIVPSGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDR
AVKLYRKLKREITFHGAKEVALSYSAGALASCMGLIYNRMGAVTTEVAFAVVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAME
QMAGSSEQAAEAMEVASQARQMVQAMRAIGTPPSSSAGLKDDLLENLQAYQKRMGVQMQRFK
>gi|GENSCAN_predicted_CDS_5|759_bp
atgagccttctaaccgaggtcgaaacgtacgttctctctatcgtcccgtcaggccccctcaaagccgagatcgcacagagacttgaagatgtctt
tgctgggaagaacacagatcttgaggctctcatggaatggctaaagacaagaccaatcctgtcacctctgactaaggggattttgggatttgtat
tcacgctcaccgtgccaagtgagcgaggactgcagcgtagacgctttgtccaaaatgccctcaatgggaatggggatccaaataacatggacaga
gcagttaaactgtatagaaagcttaagagggagataacattccatggggccaaagaagtagcgctcagttattctgctggtgcacttgccagttg
catgggcctcatatacaacaggatgggggctgtgaccactgaagtggcctttgccgtggtatgtgcaacctgtgaacagattgctgactcccagc
ataggtctcacaggcaaatggtgacaacaaccaatccactaataagacatgagaacagaatggttctggccagcactacagctaaggctatggag
caaatggctggatcgagtgagcaagcagcagaggccatggaggttgctagtcaggccaggcaaatggtgcaggcaatgagagccattgggactcc
tcctagctccagtgctggtctaaaagatgatcttcttgaaaatttgcaggcctatcagaaacgaatgggggtgcagatgcaacgattcaagtga
>gi|GENSCAN_predicted_peptide_6|716_aa
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIMVELDDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGAEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMANRGLWDS
FRQSERGEETIEERFEITGTMRRLADQSLPPNFSCLENFRAYVDGFEPNGYIEGKLSQMSKEVNAKIEPFLKTTPRPIRLPDGPPCFQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMRTFFGWKEPYIVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVD
FDNCRDISDLKQYDSDEPELRSLSSWIQNEFNKACELTDSIWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQMSRPMFLYVRTNGT
SKIKMKWGMEMRPCLLQSLQQIESMVEAESSVKEKDMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALR
>gi|GENSCAN_predicted_CDS_6|2151_bp
atggaagattttgtgcgacaatgcttcaatccgatgattgtcgaacttgcggaaaaggcaatgaaagagtatggagaagatctgaaaatcgaaac
aaacaaatttgcagcaatatgcactcacttggaagtatgcttcatgtattcagattttcatttcatcaatgagcaaggcgagtcaataatggtag
agcttgatgatccaaatgcacttttgaagcacagatttgaaataatagagggaagagatcgcacaatggcctggacagtagtaaacagtatttgc
aacaccacaggagctgagaaaccgaagtttctgccagatttgtatgattacaaggagaatagattcatcgagattggagtgacaaggagagaagt
ccacatatactatcttgaaaaggccaataaaattaaatctgagaatacacacatccacattttctcattcactggggaagaaatggccacaaagg
ccgactacactctcgatgaggaaagcagggctaggatcaaaaccagactattcaccataagacaagaaatggccaacagaggcctctgggattcc
tttcgtcagtccgaaagaggcgaagaaacaattgaagaaagatttgaaatcacagggacaatgcgcaggcttgccgaccaaagtctcccgccgaa
cttctcctgccttgagaattttagagcctatgtggatggattcgaaccgaacggctacattgagggcaagctttctcaaatgtccaaagaagtaa
atgcaaaaattgaaccttttctgaaaacaacaccaagaccaattagacttccggatgggcctccttgttttcagcggtccaaattcctgctgatg
gatgctttaaaattaagcattgaggacccaagtcacgaaggggagggaataccactatatgatgcgatcaagtgcatgagaacattctttggatg
gaaagaaccctatattgttaaaccacacgaaaagggaataaatccaaattatctgctgtcatggaagcaagtactggcggaactgcaggacattg
agaatgaggagaagattccaagaactaaaaacatgaagaaaacgagtcagctaaagtgggcacttggtgagaacatggcaccagagaaggtagac
tttgacaactgtagagacataagcgatttgaagcaatatgatagtgacgaacctgaattaaggtcactttcaagctggatccagaatgagttcaa
caaggcatgcgagctgaccgattcaatctggatagagctcgatgagattggagaagacgtggctccaattgaacacattgcaagcatgagaagga
attacttcacagcagaggtgtcccattgcagagccacagaatatataatgaagggggtatacattaatactgccttgcttaatgcatcctgtgca
gcaatggacgatttccaactaattcccatgataagcaagtgtagaactaaagagggaaggcgaaagaccaatttatatggtttcatcataaaagg

aagatctcacttaaggaatgacaccgacgtggtaaactttgtgagcatggagttttctctcactgacccgagacttgagccacacaaatgggaga
agtactgtgtccttgagataggagatatgctactaagaagtgccataggccagatgtcaaggcctatgttcttgtatgtgaggacaaatggaaca
tcaaagattaaaatgaaatggggaatggagatgaggccttgcctccttcagtcactacaacaaatcgagagtatggttgaagccgagtcctctgt
caaagagaaagacatgaccaaagagttttttgagaataaatcagaaacatggcccattggggagtcccccaaaggagtggaagaaggttccattg
ggaaggtctgcaggactttattagccaagtcggtattcaatagcctgtatgcatccccacaattagaaggattttcagctgaatcaagaaaactg
cttcttgtcgttcaggctcttagggacaatcttgaacctggaacctttgatcttggggggctatatgaagcaattgaggagtgcctgattaatga
tccctgggttttgcttaatgcgtcttggttcaactccttcctaacacatgcattaagatag
>gi|GENSCAN_predicted_peptide_7|718_aa
MDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCLETMEVIQQTRVDKLTQGRQTYDWTLN
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVIESMDKEEMEITTHFQRKRRVRDNMTKKMVTQRTIGKKKQRLNKRSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVHFVETLARNICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSFTITGDNTKWNENQNPRVFLAMITY
ITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTVSLSPGMMMGMFNMLSTVL
GVSILNLGQKKYTKXTYWWDGLQSSDDFALIVNAPNHEGIQAGVNRFYRTCKLVGINMSKKKSYINRTGTFEFTSFFYRYGFVANFSMELPSFGV
SGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGSNLYNIRNLHIPEV
CLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWTPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSS
YRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFAEIMKICSTIEELRRQK
>gi|GENSCAN_predicted_CDS_7|2157_bp
atggacacagtcaacagaacacatcaatattcagaaaaggggaagtggacaacaaacacggaaactggagcgccccaacttaacccaattgatgg
accactacctgaggacaatgaaccaagtggatatgcacaaacagactgcgtcctggaagcaatggctttccttgaggaatcacacccaggaatct
ttgaaaattcgtgtcttgaaacgatggaagttattcaacaaacaagagtggacaaactgacccaaggtcgtcagacctatgactggacattgaac
agaaatcagccggctgcaactgcgctagccaacactatagaggtcttcagatcgaatggactgacagctaatgagtcgggaaggctaatagattt
cctcaaggatgtgatagaatcaatggataaagaggagatggaaataacaacacacttccaaagaaaaagaagagtaagagacaacatgaccaaga
aaatggtcacacaacgaacaataggaaagaagaagcaaagattgaacaagagaagctatctgataagagcactgacattgaacacaatgactaaa
gatgcagagagaggtaaattaaaaagaagagcaattgcaacacccggtatgcagatcagagggttcgtgcactttgtcgaaacactagcgagaaa
tatttgtgagaaacttgaacagtctgggcttccggttggaggtaatgaaaagaaggctaaactagcaaatgttgttagaaaaatgatgactaatt
cacaagacacagagctctctttcacaattactggagacaacaccaaatggaatgagaatcaaaatcctcgagtgtttctggcgatgataacatac
atcacaagaaatcaacctgaatggtttagaaacgtcctgagcattgcacccataatgttctcaaataaaatggctagactagggaaaggttacat
gttcgaaagcaagagcatgaagctccgaacacaaataccagcagaaatgctagcaagtattgacctgaaatactttaatgaatcaaccagaaaga
aaattgagaaaataaggcctctcctaatagatggcacagtctcattgagtcctggaatgatgatgggcatgttcaacatgctaagtacagtctta
ggagtctcaatcctgaatctcgggcaaaagaaatacaccaaaacnacatactggtgggacggactccaatcctctgatgacttcgctctcatagt
gaatgcaccaaatcatgagggaatacaagcaggggtgaatagattctacagaacctgcaagctagtcggaatcaatatgagcaaaaagaagtcct
acataaataggacagggacatttgaattcacaagctttttctatcgctatggatttgtagccaattttagcatggagctgcccagctttggagtg
tctggaattaatgaatcggctgatatgagcattggggtaacagtgataaagaacaatatgataaataatgaccttgggccagcaacagcccaaat
ggctcttcaactattcatcaaagactacagatacacgtaccggtgccacagaggggacacacaaattcagacaaggagatcattcgagctaaaga
agctgtgggagcaaacccgctcaaaggcaggacttttggtgtcggatggaggatcaaacttatacaatatccggaatctccacattccagaagtc
tgcttgaaatgggagctaatggatgaagactatcaggggaggctttgtaatcccctgaatccatttgtcagtcataaggaaattgagtctgtaaa
caatgctgtggtaatgccagctcacggtccagccaagagcatggaatatgatgctgttgctactacacactcctggacccctaagaggaaccgct
ccattctcaacacaagccaaaggggaattcttgaagatgaacagatgtatcagaagtgttgcaatctatttgagaaattcttccctagcagttcg
tacaggagaccagttggaatttccagcatggtggaggccatggtgtctagggctcggattgatgcacggattgacttcgagtctggacggattaa
gaaagaggagttcgctgagatcatgaagatctgttccaccattgaagagctcagacggcaaaaatag
>gi|GENSCAN_predicted_peptide_8|562_aa
MAIIYLILLFTAVRGDQICIGYHANNSTEKVDTILERNVTVTHAKDILEKTHNGKLCKLNGIPPLELGDCSIAGWLLGNPECDRLLSVPEWSYIM
EKENPRYSLCYPGSFNDYEELKHLLSSVKHFEKVKILPKDRWTQHTTTGGSWACAVSGKPSFFRNMVWLTRKGSNYPVAKGSYNNTSGEQMLIIW
GVHHPNDEAEQRALYQNVGTYVSVATSTLYKRSIPEIAARPKVNGLGRRMEFSWTLLDMWDTINFESTGNLVAPEYGFKISKRGSSGIMKTEGTL
ENCETKCQTPLGAINTTLPFHNVHPLTIGECPKYVKSEKLVLATGLRNVPQIESRGLFGAIAGFIEGGWQGMVDGWYGYHHSNDQGSGYAADKES
TQKAFNGITNKVNSVIEKMNTQFEAVGKEFSNLEKRLENLNKKMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRMQLRDNVKELGN
GCFEFYHKCDNECMDSVKNGTYDYPKYEEESKLNRNEIKGVKLSSMGVYQILAIYATVAGSLSLAIMMAGISFWMCSNGSLQCRICI

>gi|GENSCAN_predicted_CDS_8|1689_bp
atggccatcatttatctcatactcctgttcacagcagtgaggggggaccagatatgcattggataccatgccaataattccacagaaaaggtcga
cacaattctagagcggaatgtcactgtgactcatgccaaggacatccttgagaagacccataacggaaagctatgcaaactaaacggaatccctc
cacttgaactaggggactgtagcattgccggatggctccttggaaatccagaatgtgataggcttctaagtgtgccagaatggtcctatataatg
gagaaagaaaacccgagatacagtttgtgttacccaggcagcttcaatgactatgaagaattgaaacatctcctcagcagcgtgaaacattttga
gaaagttaagattttgcccaaagatagatggacacagcatacaacaactggaggttcatgggcctgcgcggtgtcaggtaaaccatcattcttca
ggaacatggtctggctgacacgtaaaggatcaaattatccggttgccaaaggatcgtacaacaatacaagcggagaacaaatgctaataatttgg
ggagtgcaccatcctaatgatgaggcagaacaaagagcattgtaccagaatgtgggaacctatgtttccgtagccacatcaacattgtacaaaag
gtcaatcccagaaatagcagcaaggcctaaagtgaatggactaggacgtagaatggaattctcttggaccctcttggatatgtgggacaccataa
attttgagagcactggtaatctagttgcaccagagtatgggttcaaaatatcgaaaagaggtagttcagggatcatgaagacagaaggaacactt
gagaactgtgaaaccaaatgccaaactcctttgggagcaataaatacaacactaccttttcacaatgtccacccactgacaataggtgaatgccc
caaatatgtaaaatcggagaaattggtcttagcaacaggactaaggaatgttccccagattgaatcaagaggattgtttggggcaatagctggtt
ttatagaaggaggatggcaaggaatggttgatggttggtatggataccatcacagcaatgaccagggatcagggtatgcagcagacaaagaatcc
actcaaaaggcatttaatggaatcaccaacaaggtaaattctgtgattgaaaagatgaacacccaatttgaagctgttgggaaagaattcagtaa
cttagagaaaagactggagaacttgaacaaaaagatggaagacgggtttctagatgtgtggacatacaatgcagagcttctagttctgatggaaa
atgagaggacacttgactttcatgattctaatgtcaagaatctgtatgataaagtcagaatgcagctgagagacaacgtcaaagaactaggaaat
ggatgttttgaattttatcacaaatgtgacaatgaatgcatggatagtgtgaaaaacgggacatatgattatcccaagtatgaagaagaatctaa
actaaatagaaatgaaatcaaaggggtaaaattgagcagcatgggggtttatcaaatccttgccatttatgctacagtagcaggttctctgtcac
tggcaatcatgatggctgggatctctttctggatgtgctccaacgggtctctgcagtgcagaatctgcatatga
>gi|GENSCAN_predicted_peptide_9|759_aa
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPSLRMKWMMAMKYPITADKRITEMVPERNEQGQTLWSKMSDAGSDRVMVSPLA
VTWWNRNGPVASTVHYPKVYKTYFDKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELRDCKISPLMVAYMLERELVRKTRFLPVAGGTSSIYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCH
STQIGGTRMVDILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTSGSSVKKEEEVLTGNLQTLKIRVHEGYEEFTMVGKRATAILRKATRR
LVQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEHIDSVMGMVGVLPDMTPSTEMSM
RGIRVSKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTERLTITYSSSMMWEINGPESVLVNTYQWIIRNWEAVKIQWSQNPAM
LYNKMEFEPFQSLVPKAIRSQYSGFVRTLFQQMRDVLGTFDTTQIIKLLPFAAAPPKQSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKTTKRL
TILGKDAGTLIEDPDESTSGVESAVLRGFLIIGKEDRRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN

>gi|GENSCAN_predicted_CDS_9|2280_bp
atggaaagaataaaagaactacggaacctgatgtcgcagtctcgcactcgcgagatactgacaaaaaccacagtggaccatatggccataattaa
gaagtacacatcggggagacaggaaaagaacccgtcacttaggatgaaatggatgatggcaatgaaatacccaatcactgctgacaaaaggataa
cagaaatggttccggagagaaatgaacaaggacaaactctatggagtaaaatgagtgatgctggatcagatcgagtgatggtatcacctttggct
gtaacatggtggaatagaaatggacccgtggcaagtacggtccattacccaaaagtatacaagacttattttgacaaagtcgaaaggttaaaaca
tggaacctttggccctgttcattttagaaatcaagtcaagatacgcagaagagtagacataaaccctggtcatgcagacctcagtgccaaagagg
cacaagatgtaattatggaagttgtttttcccaatgaagtgggagccaggatactaacatcagaatcgcaattaacaataactaaagagaaaaaa
gaagaactccgagattgcaaaatttctcccttgatggttgcatacatgttagagagagaacttgtccgaaaaacaagatttctcccagttgctgg
cggaacaagcagtatatacattgaagtcttacatttgactcaaggaacgtgttgggaacaaatgtacactccaggtggagaagtgaggaatgacg
atgttgaccaaagcctaattattgcggccaggaacatagtaagaagagctgcagtatcagcagatccactagcatctttattggagatgtgccac
agcacacaaattggcgggacaaggatggtggacattcttagacagaacccgactgaagaacaagctgtggatatatgcaaggctgcaatgggatt
gagaatcagctcatccttcagctttggtgggtttacatttaaaagaacaagcgggtcatcagtcaaaaaagaggaagaagtgcttacaggcaatc
tccaaacattgaagataagagtacatgaggggtatgaggagttcacaatggtggggaaaagagcaacagctatactcagaaaagcaaccagaaga
ttggttcagctcatagtgagtggaagagacgaacagtcaatagccgaagcaataatcgtggccatggtgttttcacaagaggattgcatgataaa
agcagttagaggtgacctgaatttcgtcaacagagcaaatcaacggttgaaccccatgcatcagcttttaaggcattttcagaaagatgcgaaag
tgctttttcaaaattggggaattgaacacatcgacagtgtgatgggaatggttggagtattaccagatatgactccaagcacagagatgtcaatg
agaggaataagagtcagcaaaatgggtgtggatgaatactccagtacagagagggtggtggttagcattgatcggtttttgagagttcgagacca
acgcgggaatgtattattgtctcctgaggaggtcagtgaaacacagggaactgaaagattgacaataacatattcatcgtcgatgatgtgggaga
ttaacggtcctgagtcggttttggtcaatacctatcaatggatcatcagaaattgggaagctgtcaaaattcaatggtctcagaatcctgcaatg
ttgtacaacaaaatggaatttgaaccatttcaatctttagtccccaaggccattagaagccaatacagtgggtttgtcagaactctattccaaca
aatgagagacgtacttgggacatttgacaccacccagataataaagcttctcccttttgcagccgctccaccaaagcaaagcagaatgcagttct
cttcactgactgtaaatgtgaggggatcagggatgagaatacttgtaaggggcaattctcctgtattcaactacaacaagaccactaaaagacta
acaattctcggaaaagatgccggcactttaattgaagacccagatgaaagcacatccggagtggagtccgccgtcttgagagggtttctcattat
aggtaaggaagacagaagatacggaccagcattaagcatcaatgaactgagtaaccttgcaaaaggggaaaaggctaatgtgctaatcgggcaag
gagacgtggtgttggtaatgaaacgaaaacgggactctagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattaa
>gi|GENSCAN_predicted_peptide_10|757_aa
MDVNPTLLFLKVPAQNAISTTFPYTGDPPYSHGTGTGYTMDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFL
EESHPGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLNRNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRR
VRDNMTKKMVTQRTIGKKKQRVNKRGYLIRALTLNTMTKDAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANV
VRKMMTNSQDTELSFTITGDNTKWNENQNPRMFLAMITYITKNQPEWFRNILSIAPIMFSNKMARLGKGYMFESKRMKLRTQIPAEMLASIDLKY
FNESTRKKIEKIRPLLIDGTASLSPGMMMGMFNMLSTVLGVSVLNLGQKKYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGI
NMSKKKSYINKTGTFEFTSFFYRYGFVANFSMELPSFGVSGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQT
RRSFELKKLWDQTQSRAGLLVSDGGPNLYNIRNLHIPEVCLKWELMDENYRGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHS
WNPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSSYRRPIGISSMVEAMVSRARIDARIDFESGRIKKEEFSEIMKICSTIEELRRQK
>gi|GENSCAN_predicted_CDS_10|2274_bp
atggatgtcaatccgactctactgttcctaaaggttccagcgcaaaatgccataagcaccacattcccttatactggagatcctccatacagcca
tggaacaggaacaggatacaccatggacacagtcaacagaacacaccaatattcagagaaggggaagtggacgacaaatacagaaactggggcac
cccaactcaacccaattgatggaccactacctgaggataatgagccaagtggatatgcacaaacagactgtgtcctggaggctatggccttcctt
gaagaatcccacccaggtatctttgagaactcatgccttgaaacaatggaagtcgttcaacaaacaagggtggacaaactaacccaaggccgcca
gacttatgattggacattaaacagaaatcaaccggcagcaactgcattagccaacaccatagaagtttttagatcgaatggactaacagccaatg
aatcaggaaggctaatagatttcctcaaggatgtgatggaatcaatggataaagaggaaatggagataacaacacactttcaaagaaaaaggaga
gtaagagacaacatgaccaagaaaatggtcacacaaagaacaatagggaagaaaaaacaaagagtgaataagagaggctatctaataagagcttt
gacattgaacacgatgaccaaagatgcagagagaggtaaattaaaaagaagggctattgcaacacccgggatgcaaattagagggttcgtgtact
tcgttgaaactttagctagaagcatttgcgaaaagcttgaacagtctggacttccggttgggggtaatgaaaagaaggccaaactggcaaatgtt
gtgagaaaaatgatgactaattcacaagacactgagctttctttcacaatcactggggacaacactaagtggaatgaaaatcaaaaccctcgaat
gtttttggcgatgattacatatatcacaaaaaatcaacctgagtggttcagaaacatcctgagcatcgcaccaataatgttctcaaacaaaatgg
caagactaggaaaaggatacatgttcgagagtaagagaatgaagctccgaacacaaatacccgcagaaatgctagcaagcattgacctgaagtat
ttcaatgaatcaacaaggaagaaaattgagaaaataaggcctcttctaatagatggcacagcatcattgagccctgggatgatgatgggcatgtt
caacatgctaagtacggttttaggagtctcggtactgaatcttgggcaaaagaaatacaccaagacaacatactggtgggatgggctccaatcct
ccgacgattttgccctcatagtgaatgcaccaaatcatgagggaatacaagcaggagtggatagattctacaggacctgcaagttagtgggaatc
aacatgagcaaaaagaagtcctatataaataaaacagggacatttgaattcacaagctttttttatcgatatggatttgtggctaattttagcat
ggagcttcccagttttggagtgtctggaataaacgagtcagctgatatgagtattggagtaacagtgataaagaacaacatgataaacaatgacc
ttgggccagcaacagcccagatggctctccaattgttcatcaaagactacagatatacatataggtgccatagaggagacacacaaattcagacg
agaagatcattcgagctaaagaagctgtgggatcaaacccaatcaagggcaggactattggtatcagatgggggaccaaacttatacaatatccg
gaaccttcacatccctgaagtctgcttaaagtgggagctaatggatgagaattatcggggaagactttgtaaccccctgaatccctttgtcagcc
ataaagaaattgagtctgtaaacaatgctgtagtgatgccagcccacggtccagccaaaagtatggaatatgatgccgttgcaactacacactcc
tggaatcccaagaggaaccgctctattctaaacactagccaaaggggaattcttgaggatgaacagatgtaccaaaagtgctgcaacttgttcga
gaaatttttccctagtagttcatataggagaccgattggaatttctagcatggtggaggccatggtgtctagggcccggattgatgccagaattg
acttcgagtctggacggattaagaaggaagagttctctgagatcatgaagatctgttccaccattgaagaactcagacggcaaaaataa
>gi|GENSCAN_predicted_peptide_11|716_aa
MEDFVRQCFNPMIVELAEKAMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIVVELDDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGAEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSENTHIHIFSFTGEEIATKADYTLDEESRARIKTRLFTIRQEMANRGLWDS
FRQSERGEETIEEKFEISGTMRRLADQSLPPKFSCLENFRAYVDGFEPNGCIEGKLSQMSKEVNAKIEPFLKTTPRPIKLPNGPPCYQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCIKTFFGWKEPYIVKPHEKGINSNYLLSWKQVLSELQDIENEEKIPRTKNMKKTSQLKWALGENMAPEKVD
FDNCRDISDLKQYDSDEPELRSLSSWIQNEFNKACELTDSIWIELDEIGEDVAPIEYIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQISRPMFLYVRTNGT
SKVKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDMTKEFFENKSEAWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLVVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALK
>gi|GENSCAN_predicted_CDS_11|2151_bp
atggaagattttgtgcgacaatgcttcaacccgatgattgtcgaacttgcagaaaaagcaatgaaagagtatggagaggatctgaaaattgaaac
aaacaaatttgcagcaatatgcacccacttggaggtatgtttcatgtattcagattttcatttcatcaatgaacaaggcgaatcaatagtggtag
aacttgatgatccaaatgcactgttaaagcacagatttgaaataatcgaggggagagacagaacaatggcctggacagtagtaaacagtatctgc
aacactactggagcagaaaaaccaaagtttctaccagatttgtatgattacaaggagaatagattcatcgaaattggagtgacaagaagagaagt
ccacatatattaccttgaaaaggccaataaaattaaatctgagaacacacacattcacatcttctcattcactggggaggaaatagccacaaagg
cagactacactctcgacgaggaaagcagggctaggattaaaaccaggctatttaccataagacaagaaatggccaacagaggcctctgggattcc
tttcgtcagtccgaaagaggcgaagaaacaattgaagaaaaatttgaaatctcaggaactatgcgtaggcttgccgaccaaagtctcccaccgaa
attctcctgccttgagaattttagagcctatgtggatggattcgaaccgaacggctgcattgagggcaagctttctcaaatgtccaaagaagtga
atgccaaaattgaaccttttctgaagacaacaccaagaccaatcaaacttcctaatggacctccttgttatcagcggtccaaattcctcctgatg
gatgctttgaaattgagcattgaagacccaagtcatgaaggagaagggattccattatatgatgcgatcaagtgcataaaaacattctttggatg
gaaagaaccttatatagtcaaaccacacgaaaagggaataaattcaaattacctgctgtcatggaagcaagtattgtcagaattgcaggacattg
aaaatgaggagaagatcccaaggactaaaaacatgaagaaaacgagtcaactaaagtgggctcttggtgaaaacatggcaccagagaaagtagac
tttgacaactgcagagacataagcgatttgaagcaatatgatagtgacgaacctgaattaaggtcactttcaagctggatacagaatgagttcaa
caaggcctgcgagctaactgattcaatctggatagagctcgatgaaattggagaggacgtagccccaattgagtacattgcaagcatgaggagga
attatttcacagcagaggtgtcccattgtagagccactgagtacataatgaagggggtatacattaatactgccctgctcaatgcatcctgtgca
gcaatggacgattttcaactaattcccatgataagcaagtgcagaactaaagagggaaggcgaaaaaccaatttatatggattcatcataaaggg
aagatctcatttaaggaatgacacagatgtggtaaactttgtgagcatggagttttctctcactgacccgagacttgagccacataaatgggaga
aatactgtgtccttgagataggagatatgttactaagaagtgccataggccaaatttcaaggcctatgttcttgtatgtgaggacaaacggaaca
tcaaaggtcaaaatgaaatggggaatggagatgagacgttgcctccttcagtcactccagcagatcgagagcatgattgaagccgagtcctcgat
taaagagaaagacatgaccaaagagttttttgagaataaatcagaagcatggcccattggggagtcccccaagggagtggaagaaggttccattg
ggaaagtctgtaggactctattggctaagtcagtgttcaatagcctgtatgcatcaccacaattggaaggattttcagcggagtcaagaaaactg
cttcttgttgttcaggctcttagggacaacctcgaacctgggacctttgatctcggggggctatatgaagcaattgaggagtgcctgattaatga
tccctgggttttgctcaatgcatcttggttcaactccttcctgacacatgcattaaaatag
>gi|GENSCAN_predicted_peptide_12|230_aa
MDSNTVSSFQVDCFLWHIRKQVVDQELSDAPFLDRLRRDQRSLRGRGNTLGLDIKAATHVGKQIVEKILKEESDEALKMTMVSTPASRYITDMTI
EELSRNWFMLMPKQKVEGPLCIRMDQAIMEKNIMLKANFSVIFDRLETIVLLRAFTEEGAIVGEISPLPSFPGHTIEDVKNAIGVLIGGLEWNDN
TVRVSKNLQRFAWRSSNENGGPPLTPKQKRKMARTARSKV
>gi|GENSCAN_predicted_CDS_12|693_bp
atggattccaacactgtgtcaagtttccaggtagattgctttctttggcatatccggaaacaagttgtagaccaagaactgagtgatgccccatt
ccttgatcggcttcgccgagatcagaggtccctaaggggaagaggcaatactctcggtctagacatcaaagcagccacccatgttggaaagcaaa
ttgtagaaaagattctgaaagaagaatctgatgaggcacttaaaatgaccatggtctccacacctgcttcgcgatacataactgacatgactatt
gaggaattgtcaagaaactggttcatgctaatgcccaagcagaaagtggaaggacctctttgcatcagaatggaccaggcaatcatggagaaaaa
catcatgttgaaagcgaatttcagtgtgatttttgaccgactagagaccatagtattactaagggctttcaccgaagagggagcaattgttggcg
aaatctcaccattgccttcttttccaggacatactattgaggatgtcaaaaatgcaattggggtcctcatcggaggacttgaatggaatgataac
acagttcgagtctctaaaaatctacagagattcgcttggagaagcagtaatgagaatgggggacctccacttactccaaaacagaaacggaaaat
ggcgagaacagctaggtcaaaagtttga
>gi|GENSCAN_predicted_peptide_13|498_aa
MASQGTKRSYEQMETDGDRQNATEIRASVGKMIDGIGRFYIQMCTELKLSDHEGRLIQNSLTIEKMVLSAFDERRNKYLEEHPSAGKDPKKTGGP
IYRRVDGKWMRELVLYDKEEIRRIWRQANNGEDATAGLTHIMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGIGTMV
MELIRMVKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMVDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACAYGPAV
SSGYDFEKEGYSLVGIDPFKLLQNSQIYSLIRPNENPAHKSQLVWMACHSAAFEDLRLLSFIRGTKVSPRGKLSTRGVQIASNENMDNMGSSTLE
LRSGYWAIRTRSGGNTNQQRASAGQTSVQPTFSVQRNLPFEKSTIMAAFTGNTEGRTSDMRAEIIRMMEGAKPEEVSFRGRGVFELSDEKATNPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_13|1497_bp
atggcgtcccaaggcaccaaacggtcttatgaacagatggaaactgatggggatcgccagaatgcaactgagattagggcatccgtcgggaagat
gattgatggaattgggagattctacatccaaatgtgcactgaacttaaactcagtgatcatgaagggcggttgatccagaacagcttgacaatag
agaaaatggtgctctctgcttttgatgaaagaaggaataaatacctggaagaacaccccagcgcggggaaagatcccaagaaaactggggggccc
atatacaggagagtagatggaaaatggatgagggaactcgtcctttatgacaaagaagagataaggcgaatctggcgccaagccaacaatggtga
ggatgcgacagctggtctaactcacataatgatctggcattccaatttgaatgatgcaacataccagaggacaagagctcttgttcgaactggaa
tggatcccagaatgtgctctctgatgcagggctcgactctccctagaaggtccggagctgcaggtgctgcagtcaaaggaatcgggacaatggtg
atggaactgatcagaatggtcaaacgggggatcaacgatcgaaatttctggagaggtgagaatgggcggaaaacaagaagtgcttatgagagaat
gtgcaacattcttaaaggaaaatttcaaacagctgcacaaagagcaatggtggatcaagtgagagaaagtcggaacccaggaaatgctgagatcg
aagatctcatatttttggcaagatctgcattgatattgagagggtcagttgctcacaaatcttgcctacctgcctgtgcgtatggacctgcagta
tccagtgggtacgacttcgaaaaagagggatattccttggtgggaatagaccctttcaaactacttcaaaatagccaaatatacagcctaatcag
acctaacgagaatccagcacacaagagtcagctggtgtggatggcatgccattctgctgcatttgaagatttaagattgttaagcttcatcagag
ggacaaaagtatctccgcgggggaaactgtcaactagaggagtacaaattgcttcaaatgagaacatggataatatgggatcgagcactcttgaa
ctgagaagcgggtactgggccataaggaccaggagtggaggaaacactaatcaacagagggcctccgcaggccaaaccagtgtgcaacctacgtt
ttctgtacaaagaaacctcccatttgaaaagtcaaccatcatggcagcattcactggaaatacggagggaaggacttcagacatgagggcagaaa
tcataagaatgatggaaggtgcaaaaccagaagaagtgtcattccgggggaggggagttttcgagctctcagacgagaaggcaacgaacccgatc
gtgccctcttttgatatgagtaatgaaggatcttatttcttcggagacaatgcagaagagtacgacaattaa
>gi|GENSCAN_predicted_peptide_14|1223_aa
MNPNQKIITIGSVSLTISTICFFMQIAILITTVTLHFKQYEFNSPPNNQVMLCEPTIIERNITEIVYLTNTTIEKEMCPKLAEYRNWSKPQCDIT
GFAPFSKDNSIRLSAGGDIWVTREPYVSCDPDKCYQFALGQGTTLNNVHSNDTVHDRTPYRTLLMNELGVPFHLGTKQVCIAWSSSSCHDGKAWL
HVCVTGDDKNATASFIYNGRLVDSIVSWSKKILRTQESECVCINGTCTVVMTDGSASGKADTKILFIEEGKIIHTSTLSGSAQHVEECSCYPRYP
GVRCVCRDNWKGSNRPIVDINIKDYSIVSSYVCSGLVGDTPRKNDSSSSSHCLDPNNEEGGHGVKGWAFDDGNDVWMGRTISEKLRSGYETFKVI
EGWSKPNSKLQINRQVIVDRGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNN
MDKAVKLYRKLKREITFHGAKEIALSYSAVPNGTIVKTITNDQIEVTNATELVQSSSTGGICDSPHQILDGENCTLIDALLGDPQCDGFQNKKWD
LFVERSKAYSNCYPYDVPDYASLRSLVASSGTLEFNNESFNWTGVTQNGTSSACKRRSNNSFFSRLNWLTHLKFKYPALNVTMPNNEKFDKLYIW
GVHHPGTDNDQISLYAQASGRITVSTKRSQQTVIPSIGSRPRIRDVPSRISIYWTIVKPGDILLINSTGNLIAPRGYFKIRSGKSSIMRSDAPIG
KCNSECITPNGSIPNDKPFQNVNRITYGACPRYVKQNTLKLATGMRNVPEKQTRGIFGAIAGFIENGWEGMVDGWYGFRHQNSEGTGQAADLKST
QAAINQINGKLNRLIGKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFERTKKQLRENAEDMGNG
CFKIYHKCDNACIGSIRNGTYDHDVYRDEALNNRFQIKGPLKAEIAQRLEDVFAGKNTDLEALMEWLKTRPILSPLTKGILGFVFTLTVPSERGL
QRRRFVQNALNGNGDPNNMDRAVKLYKKLKREITFHGAKEVALSYSTGALASCMGLIYNRMGTVTTEVAFGLVCATCEQIADSQHRSHRQMATTT
NPLIRHENRMVLASTTAKAMEQMAGSSEQAAEAMEVASQARQMVQAMRTIGTHPSSSAGLKDNLLENLQAYQKRMGVQMQRFK
>gi|GENSCAN_predicted_CDS_14|3672_bp
atgaatccaaatcaaaagataataacgattggctctgtttctctcaccatttccacaatatgcttcttcatgcaaattgccatcctgataaccac
tgtaacattgcatttcaagcaatatgaattcaactcccccccaaacaaccaagtgatgctgtgtgaaccaacaataatagaaagaaacataacag
agatagtgtatctgaccaacaccaccatagagaaggaaatgtgccccaaactagcagaatacagaaattggtcaaagccgcaatgtgacattaca
ggatttgcacctttttctaaggacaattcgattaggctttccgctggtggggacatctgggtgacaagagaaccttatgtgtcatgcgaccctga
caagtgttaccaatttgcccttggacagggaacaacactaaacaacgtgcattcaaatgacacagtacatgataggaccccttatcggaccctat
tgatgaatgaattaggtgttccatttcatctggggaccaagcaagtgtgcatagcatggtccagctcaagttgtcacgatggaaaagcatggctg
catgtttgtgtaacgggggatgataaaaatgcaactgctagcttcatttacaatgggaggcttgtagatagtattgtttcatggtccaaaaaaat
cctcaggacccaggagtcagaatgcgtttgtatcaatggaacttgtacagtagtaatgactgatgggagtgcttcaggaaaagctgatactaaaa
tactattcattgaggaggggaaaatcattcatactagcacattgtcaggaagtgctcagcatgtcgaggagtgctcctgctatcctcgatatcct
ggtgtcagatgtgtctgcagagacaactggaaaggctccaataggcccatcgtagatataaacataaaggattatagcattgtttccagttatgt
gtgctcagggcttgttggagacacacccagaaaaaacgacagctccagcagtagccattgcttggatcctaacaatgaagaaggtggtcatggag
tgaaaggctgggcctttgatgatggaaatgacgtgtggatgggaagaacgatcagcgagaagttacgctcaggatatgaaaccttcaaagtcatt
gaaggctggtccaaacctaattccaaattgcagataaataggcaagtcatagttgacagaggccccctcaaagccgagatcgcgcagagacttga
agatgtctttgctgggaaaaacacagatcttgaggctctcatggaatggctaaagacaagaccaattctgtcacctctgactaaggggattttgg
ggtttgtgttcacgctcaccgtgcccagtgagcgaggactgcagcgtagacgctttgtccaaaatgccctcaatgggaatggagatccaaataac
atggacaaagcagttaaactgtataggaaacttaagagggagataacgttccatggggccaaagaaatagctctcagttattctgctgtaccaaa
cggaacgatagtgaaaacaatcacgaatgaccaaattgaagtcactaatgctactgaactggttcagagttcctcaacaggtggaatatgcgaca
gtcctcatcagatccttgatggagaaaactgcacactaatagatgctctattgggagaccctcagtgtgatggcttccaaaataagaaatgggac
ctttttgttgaacgcagcaaagcctacagcaactgttacccttatgatgtgccggattatgcctcccttaggtcactagttgcctcatccggcac
actggagtttaacaatgaaagcttcaattggactggagtcactcaaaatggaacaagctctgcttgcaaaaggagatctaataacagtttcttta
gtagattgaattggttgacccacttaaaattcaaatacccagcattgaacgtgactatgccaaacaatgaaaaatttgacaaactgtacatttgg
ggggttcaccacccgggtacggacaatgaccaaatcagcctatatgctcaagcatcaggaagaatcacagtctctaccaaaagaagccaacaaac
cgtaatcccgagtatcggatctagacccaggataagggatgtccccagcagaataagcatctattggacaatagtaaaaccgggagacatacttt
tgattaacagcacagggaatctaattgctcctcggggttacttcaaaatacgaagtgggaaaagctcaataatgagatcagatgcacccattggc
aaatgcaattctgaatgcatcactccaaatggaagcattcccaatgacaaaccatttcaaaatgtaaacaggatcacatatggggcctgtcccag
atatgttaagcaaaacactctgaaattggcaacagggatgcgaaatgtaccagagaaacaaactagaggcatatttggcgcaatcgcgggtttca
tagaaaatggttgggagggaatggtagacggttggtacggtttcaggcatcaaaattctgagggaacaggacaagcagcagatctcaaaagcact
caagcagcaatcaaccaaatcaatgggaagctgaataggttgatcgggaaaacaaacgagaaattccatcagattgaaaaagaattctcagaagt
agaagggagaattcaggacctcgagaaatatgttgaggacactaaaatagatctctggtcatacaacgcggagcttcttgtggccctggagaacc
aacatacaattgatctaactgactcagaaatgaacaaactgtttgaaagaacaaagaagcaactgagggaaaatgctgaggatatgggcaatggt
tgtttcaaaatataccacaaatgtgacaatgcctgcatagggtcaatcagaaatggaacttatgaccatgatgtatacagagatgaagcattaaa
caaccggttccagatcaaaggccccctcaaagccgagatcgcgcagagacttgaggatgtctttgcaggaaagaacaccgatctcgaggctctca
tggaatggctaaagacaagaccaatcctgtcacctctgactaaagggattttaggatttgtgttcacgctcaccgtgcccagtgagcgaggactg
cagcgtagacgctttgtccagaatgccttaaatggaaatggagatccaaacaatatggatagggcagttaagctatacaagaagctgaaaagaga
aataacattccatggggctaaggaggtcgcactcagctactcaaccggtgcacttgccagttgtatgggtctcatatacaacaggatgggaacgg
tgaccacagaagtggcttttggcctagtgtgtgccacttgtgagcagattgcagattcacagcatcggtctcacagacagatggcaactaccacc
aacccactaatcaggcatgagaacagaatggtgctggccagcactacagctaaggctatggagcagatggctggatcgagtgagcaggcagcgga
agccatggaggttgctagtcaggctaggcagatggtgcaggcaatgaggacaattgggactcatcctagctccagtgccggtctgaaagataatc
ttcttgaaaatttgcaggcctaccaaaaacgaatgggagtgcaaatgcagcgattcaagtga
>gi|GENSCAN_predicted_peptide_15|1320_aa
MEKIVLLLAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLNGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYI
VEKASPANDLCYPGDFNDYEELKHLLSRTNHFEKIQIIPKSSWSNHDASSGVSSACPYHGRSSFFRNVVWLIKKNSAYPTIKRSYNNTNQEDLLV
LWGIHHPNDAAEQTKLYQNPTTYISVGTSTLNQRLVPEIATRPKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGDSAIMKSEL
EYGNCNTKCQTPMGAINSSMPFHNIHPLTIGECPKYVKSNRLVLATGLRNTPQRERRRKKRGLFGAIAGFIEGGWQGMVDGWYGYHHSNEQGSGY
AADKESTQKAIDGVTNKVNSIIDKMNTQFEAVGREFNNLERRIENLNKQMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRLQLRDN
AKELGNGCFEFYHKCDNECMESVKNGTYDYPQYSEEARLNREEISGVKLESMGTYQILSIYSTVASSLALAIMGALLNDKHSNGTVKDRSPHRTL
MSCPVGEAPSPYNSRFESVAWSASACHDGTSWLTIGISGPDNGAVAVLKYNGIITDTIKSWRNNILRTQESECACVNGSCFTVMTDGPSNGQASY
KIFKMEKGKVVKSVELNAPNYHYEECSCYPDAGEITCVCRDNWHGSNRPWVSFNQNLEYQIGYICSGVFGDNPRPNDGTGSCGPVSPNGAYGVKG
FSFKYGNGVWIGRTKSTNSRSGFEMIWDPNGWTGTDSSFSVKQDIVAITDWVDNHSLSDINIMASQGTKRSYEQMETGGERQNATEIRASVGRMV
GGIGRFYIQMCTELKLSDYEGRLIQNSITIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYRRRDGKWVRELILYDKEEIRRIWRQANNGED
ATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMVMELIRMIKRGINDRNFWRGENGRRTRIAYERMC
NILKGKFQTAAQRAMMDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACVYGLAVASGYDFEREGYSLVGIDPFRLLQNSQVFSLIRP
NENPAHKSQLVWMACHSAAFEDLRVSSFIRGTRVAPRGQLSTRGVQIASNENMETMDSSTLELRSRYWAIRTRSGGNTNQQRASAGQISVQPTFS
VQRNLPFERATIMAAFTGNTEGRTSDMRTEIIRMMESSRPEDVSFQGRGVFELSDEKATNPIVPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_15|3963_bp
atggagaaaatagtgcttcttcttgcaatagtcagtcttgtcaaaagtgatcagatttgcattggttaccatgcaaacaactcgacagagcaggt
tgacacaataatggaaaagaacgttactgttacacatgcccaagacatactggaaaagacacacaatgggaagctctgcgatctaaatggagtga
agcctctcattttgagagattgtagtgtagctggatggctcctcggaaac
cctatgtgtgacgaattcatcaatgtgccggaatggtcttacatagtggagaaggccagtccagccaatgacctctgttacccaggggatttcaa
cgactatgaagaactgaaacacctattgagcagaacaaaccattttgagaaaattcagatcatccccaaaagttcttggtccaatcatgatgcct
catcaggggtgagctcagcatgtccataccatgggaggtcctcctttttcagaaatgtggtatggcttatcaaaaagaacagtgcatacccaaca
ataaagaggagctacaataataccaaccaagaagatcttttagtactgtgggggattcaccatcctaatgatgcggcagagcagacaaagctcta
tcaaaacccaaccacttacatttccgttggaacatcaacactgaaccagagattggttccagaaatagctactagacccaaagtaaacgggcaaa
gtggaagaatggagttcttctggacaattttaaagccgaatgatgccatcaatttcgagagtaatggaaatttcattgctccagaatatgcatac
aaaattgtcaagaaaggggactcagcaattatgaaaagtgaattggaatatggtaactgcaacaccaagtgtcaaactccaatgggggcgataaa
ctctagtatgccattccacaacatacaccccctcaccatcggggaatgccccaaatatgtgaaatcaaacagattagtccttgcgactggactca
gaaatacccctcagagagagagaagaagaaaaaagagaggactatttggagctatagcaggttttatagagggaggatggcagggaatggtagat
ggttggtatgggtaccaccatagcaatgagcaggggagtggatacgctgcagacaaagaatccactcaaaaggcaatagatggagtcaccaataa
ggtcaactcgatcattgacaaaatgaacactcagtttgaggccgttggaagggaatttaataacttggaaaggaggatagagaatttaaacaagc
agatggaagacggattcctagatgtctggacttataatgctgaacttctggttctcatggaaaatgagagaactctagactttcatgactcaaat
gtcaagaacctttatgacaaggtccgactacagcttagggataatgcaaaggagctgggtaatggttgtttcgagttctatcacaaatgtgataa
tgaatgtatggaaagtgtaaaaaacggaacgtatgactacccgcagtattcagaagaagcaagactaaacagagaggaaataagtggagtaaaat
tggaatcaatgggaacttaccaaatactgtcaatttattcaacagtggcgagttccctagcactggcaatcatgggagccttgctgaatgacaag
cactccaatgggaccgtcaaagacagaagccctcacagaacattgatgagttgtcctgtgggtgaggctccctccccatataactcaaggtttga
gtctgttgcttggtcggcaagtgcttgccatgatggcaccagttggttgacaattggaatttctggcccagacaatggggctgtggctgtattga
aatacaacggcataataacagacactatcaagagttggaggaacaacatactgagaactcaagagtctgaatgtgcatgtgtaaatggctcttgc
tttactgtaatgactgacggaccaagtaatgggcaggcctcatataagatcttcaaaatggaaaaagggaaagtagttaaatcagtcgaattgaa
tgcccctaattatcactatgaggagtgctcctgttatcctgatgctggcgaaatcacatgtgtgtgcagggataattggcatggctcaaatcggc
catgggtatctttcaatcaaaatttggagtatcaaataggatatatatgcagtggagttttcggagacaatccacgccccaatgatggaacaggc
agttgtggtccggtgtcccctaacggggcatatggagtaaaagggttttcatttaaatacggcaatggtgtttggatcgggagaaccaaaagcac
taattccaggagcggctttgaaatgatttgggatccaaatgggtggactggaacggacagtagcttctcggtgaaacaagatatcgtagcaataa
ctgattgggtagataatcactcactgagtgacatcaacatcatggcgtctcagggcaccaaacgatcttatgaacagatggaaactggtggagaa
cgccagaatgctactgagatcagagcatctgttggaagaatggttggtggaattgggaggttttatatacagatgtgcactgaactcaaactcag
cgactatgaaggaaggctgattcagaacagcataacaatagagagaatggttctctctgcatttgatgaaaggaggaacaaatacctggaagaac
atcccagtgcggggaaggacccaaagaaaactggaggtccaatctaccgaagaagagacggaaaatgggtgagagagctgattctgtatgacaaa
gaggagatcaggagaatttggcgtcaagcgaacaatggagaagatgcaactgctggtctcactcacatgatgatctggcattccaatctaaatga
tgccacataccagagaacaagagctctcgtgcgtactgggatggaccctagaatgtgctctctgatgcaaggatcaactctcccgaggagatctg
gagctgctggtgcggcagtaaagggagtcggaacgatggtgatggaactaattcggatgataaagcgagggattaacgatcggaatttctggaga
ggtgaaaatgggcgaagaacaagaattgcatatgagagaatgtgcaacatcctcaaagggaaattccaaacagcagcacaaagagcaatgatgga
tcaggtacgggaaagcagaaatcctgggaatgctgagattgaagatctcatatttctggcacggtctgcactcatcctgagaggatcagtggccc
acaagtcctgcttgcctgcttgtgtgtacgggcttgccgtggccagtggatatgactttgagagagaagggtactctctggtcgggattgatcct
ttccgtctgctgcaaaacagccaggtctttagtctaattagaccaaatgagaatccagcacataaaagtcaattggtgtggatggcatgccattc
tgcagcatttgaagatctgagagtctcaagcttcatcagagggacaagagtggccccaaggggacaactatctactagaggagttcaaattgctt
caaatgagaacatggaaacaatggactccagcactcttgaactgagaagcagatattgggctataaggaccaggagtggaggaaacaccaaccag
cagagagcatctgcaggacaaatcagtgtgcagcctactttctcggtacagagaaatcttcccttcgaaagagcgaccattatggcggcattcac
agggaatacagagggcagaacatctgacatgaggactgaaatcataaggatgatggaaagctccagaccagaagatgtgtctttccaggggcggg
gagtcttcgagctctcggacgaaaaggcaacgaacccgatcgtgccttcctttgacatgagtaatgaaggatcttatttcttcggagacaatgca
gaggaatatgacaattga
>gi|GENSCAN_predicted_peptide_16|716_aa
MEDFVRQCFNPMIVELAEKAMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESTIIESGDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGVEKPKFLPDLYDYKENRFIEIGVTRREVHTYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDS
FRQSERGEETVEERFEITGTMCRLADQSLPPNFSSLEKFRAYVDGFEPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMKTFFGWKEPNIVKPHEKGINPNYLLAWKQVLAELQDIENEEKIPKTKNMRKTSQLKWALGENMAPEKVD
FEDCKDVSDLRQYDSDEPKPRSLASWIQSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFLIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHRWEKYCVLRIGDMLLRTEIGQVSRPMFLYVRTNGT
SKIKMKWGMEMRRCPFQSLQQIESMIEAESSVKEKDMTKEFFENKSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALR
>gi|GENSCAN_predicted_CDS_16|2151_bp
atggaagactttgtgcgacaatgcttcaatccaatgattgtcgagcttgcggaaaaggcaatgaaagaatatggggaagatccgaaaatcgaaac
gaacaaatttgccgcaatatgcacgcacttagaagtctgtttcatgtattcagatttccactttattgatgaacggggcgaatcaacaattatag
aatctggcgatcccaatgcattattgaaacaccggtttgaaataatcgaagggagggaccgaacaatggcctggacagtggtgaatagtatctgc
aacaccacaggagttgagaagcctaaatttctcccagatttgtatgactacaaggagaaccgatttattgaaattggagtgacacggagggaagt
tcacacatactatctagaaaaagccaacaagataaaatctgagaagacacacattcacatattctcattcactggagaggaaatggccaccaaag
cggactacacccttgatgaagaaagcagggcccgaatcaaaaccaggctgttcactataaggcaggaaatggccagtaggggtttatgggattcc
tttcgtcagtccgagagaggcgaagagacagttgaagaaagatttgaaatcacagggactatgtgcaggcttgccgaccaaagtctcccacctaa
tttctccagccttgaaaaatttagagcctatgtggatggattcgaaccgaacggctgcattgagggcaagctttctcaaatgtcgaaagaagtaa
acgccagaattgagccatttctgaagacaacaccacgccctcttagattacctgatgggcctccctgctctcagcggtcgaagtttttgctgatg
gatgcccttaaattaagcatcgaagacccgagtcatgagggggaggggataccgctatatgatgcaatcaaatgcatgaaaacatttttcggctg
gaaagagcccaacattgtaaaaccacatgaaaaaggcataaaccccaattacctcctggcttggaagcaggtgctggcagagctccaagatattg
aaaacgaggagaaaattccaaagacaaagaacatgaggaaaacaagccaattgaagtgggcacttggtgagaatatggcaccagagaaagtagac
tttgaggattgcaaagatgttagcgatctaaggcagtatgacagtgatgaaccaaagcctagatcactagcaagctggatccagagtgaattcaa
caaggcatgcgaattgacagattcaagttggattgaacttgatgaaataggggaagacgttgctccaattgagcacattgcaagtatgagaagga
actatttcacagcggaagtatcccattgcagggctactgaatacataatgaagggagtgtacataaacacagctttgttgaatgcatcctgtgca
gccatggatgacttccaactgatcccaatgataagcaaatgcagaaccaaagaaggaagacggaaaactaacctgtatggattccttataaaagg
aagatcccatttgagaaatgacaccgatgtggtaaactttgtgagtatggaattctctcttactgatccgaggctggagccacacagatgggaaa
agtactgcgttcttcggataggagacatgctcttacggactgaaataggccaagtgtcaaggcccatgtttctttatgtgagaaccaatggaacc
tccaagatcaagatgaaatggggcatggaaatgaggcgatgcccttttcaatcccttcaacagattgagagcatgattgaggccgagtcttctgt
caaagaaaaagacatgactaaagaattctttgaaaacaaatcagaaacatggccaattggagaatcacccaagggagtggaggaaggctccatcg
ggaaggtgtgcagaaccttactggctaaatctgttttcaacagtctatatgcatctccacaactcgaggggttttcagctgaatcaagaaaattg
cttctcattgttcaggcacttagggacaacctggaacctggaaccttcgatcttggggggctatatgaagcaattgaggagtgcctgattaatga
tccctgggttttgcttaatgcatcttggttcaactccttcctcacacatgcactaagatag
>gi|GENSCAN_predicted_peptide_17|718_aa
MDTVNRTHQYSEKGKWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCLETMEVVQQTRVDKLTQGRQTYDWTLK
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKGEMEIITHFQRKRRVRDNMTKKMVTQRTIGKKKQRLNKRSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSFTITGDNTKWNENQNPRMFLAMITY
ITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNESTRKKIEKIRPLLIDGTASLSPGMMMGMFNMLSTVL
GVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIEAGVDRFYRTCKLVGINMTKKKSYINRTGTCEFTSFFYRYGFVANFSMELPSFGV
SGINESADMSIGVTVIKNNMMDNDLGPATAQMALQLFIKDYRYPYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGPNPYNIRNLHIPEA
GLKWELMDEDYQGRLCNPLNPFVSHKEIESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCCNLFEKFFPSSS
YRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFAEIMKICSTIEELGRQK
>gi|GENSCAN_predicted_CDS_17|2157_bp
atggacacagtcaacagaacacatcaatattcagaaaaggggaaatggacaacgaacacagagactggagcaccccaactcaatccgattgatgg
accactacctgaggataatgagccgagtgggtatgcacaaacagattgtgtattggaagcaatggctttccttgaagaatcccacccagggatct
ttgaaaactcgtgtcttgaaacgatggaagttgttcagcaaacaagagtggataagctgacccaaggtcgccaaacctatgactggacattgaaa
agaaaccagccggctgcaaccgctttggccaacactatagaggtcttcagatcgaatggtctaacagccaatgaatcgggaaggctaatagattt
cctcaaagacgtgatggaatcaatggataagggagaaatggaaataataacacatttccagagaaagagaagagtgagggacaacatgaccaaga
aaatggtcacacaaagaacaatagggaagaaaaaacaaaggctgaacaaaaggagctacctaataagagcactgacactgaacacaatgacaaaa
gacgcagaaagaggcaaattgaagaggcgggcaattgcaacacccgggatgcaaatcagaggattcgtgtactttgtcgaaacactagcgaggag
tatctgtgagaaacttgagcaatctggactccccgtcggagggaatgaaaagaaggctaaattggcaaatgtcgtgaggaagatgatgactaact
cacaagatacagagctctcttttacaattactggagacaacaccaaatggaatgagaatcagaaccctcggatgtttctagcaatgataacatac
atcacaaggaaccaacctgaatggtttagaaatgtcttaagcattgctcctataatgttctcaaacaagatggcaagattagggaaaggatacat
gttcgaaagtaagagcatgaagctacggacacaaataccagcagaaatgcttgcaagcattgacttgaaatacttcaacgaatcaacgagaaaga
aaatcgagaaaataagacctctactaatagatggcacagcctcattgagtcctggaatgatgatgggcatgttcaatatgctgagtacagtctta
ggagtttcaatcctgaatcttgggcagaagaggtacaccaaaaccacatactggtgggacggactccaatcctctgatgatttcgctctcatagt
gaatgcaccaaatcatgagggaatagaagcaggggtggataggttctataggacttgcaaactagttggaatcaatatgaccaagaagaagtctt
acataaatcggacaggaacatgtgaattcacaagcttcttctaccgctatgggttcgtagccaacttcagtatggagctgcccagctttggagtg
tctgggattaatgaatcggctgacatgagcattggtgttacagtgataaagaacaatatgatggacaacgaccttggaccagcaacagctcagat
ggctcttcagctattcattaaggactacagatacccataccgatgccacaggggggatacacaaatccaaacgaggagatcattcgagctgaaga
agctgtgggagcagacccgctcaaaggcaggactgttggtttcagatggaggaccaaacccatacaatatccggaatctccacattccggaggct
ggcttgaagtgggaattgatggatgaagactaccagggcagactgtgtaatcctctgaacccgtttgttagtcataaggaaattgagtctgtcaa
caatgctgtggtaatgccagctcatggcccagccaagagcatggaatatgatgcagttgcgactacacattcatggattcccaagaggaatcgtt
ccattctcaacaccagccaaagggggattcttgaggatgaacagatgtatcagaagtgctgcaatctattcgagaaattcttccctagcagttca
tatcggaggccagttggaatttccagcatggtggaggccatggtgtctagggcccgaattgatgcacgaattgacttcgagtctggaaggattaa
gaaagaagagtttgctgagatcatgaagatctgttccaccattgaagagctcggacggcaaaaatag
>gi|GENSCAN_predicted_peptide_18|759_aa
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLA
VTWWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCH
STQIGGIRMVDILRQNPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRR
LIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSAEMSL
RGVRVSKMGVDEYSSTERVVVSIDRFLRVRDQQGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQDPTM
LYNKMEFESFQSLVPKAARSQYSGFVRTLFQQMRDVLGTFDTVQIIKLLPFAAAPPEPSRMQFSSLTVNVRGSGMRILVRGNSPVFNYNKATKRL
TVLGKDAGALTEDPDEGTAGVESAVLRGFLILGREDKRYGPALSINELSNLAKGEKANVLIMQGDVVLVMKRKRDFSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_18|2280_bp
atggaaagaataaaagaactaagagatctaatgtcgcagtcccgcactcgcgagatactaacaaaaaccactgtggatcatatggccataatcaa
gaaatacacatcaggaagacaagagaagaaccctgctctcagaatgaaatggatgatggcaatgaaatatccaatcacagcagacaagagaataa
tggagatgattcctgaaaggaatgagcaaggacaaacgctttggagcaagacaaatgatgctgggtcggacagagtgatggtgtctcccctagct
gtaacttggtggaacaggaatgggccgacaacaagtacagtccattatccaaaggtttacaaaacatactttgagaaggttgaaaggttaaaaca
tggaaccttcggtcccgttcatttccgaaaccaagttaaaatacgtcgccgggtggatataaacccgggccatgcagatctcagtgctaaagaag
cacaagatgttatcatggaggtcgttttcccaaatgaagtgggagctagaatattgacatcagagtcgcaattgacaataacaaaagagaagaaa
gaagagctccaggattgtaaaattgctcctttaatggtggcatacatgttggaaagagaactggtccgcaaaaccagatttctaccggtagcagg
cggaacaagcagtgtgtacattgaggtattgcatttgactcaagggacctgttgggaacagatgtacactcccggcggagaagtaagaaatgatg
atgttgaccagagtttgatcatcgctgccagaaacattgttaggagagcaacagtatcagcggacccactggcatcactcttggagatgtgtcac
agcacacaaattgggggaataaggatggtggacatccttaggcaaaacccaactgaggagcaagctgtggatatatgcaaagcagcaatgggttt
gaggatcagttcatcctttagctttggaggcttcactttcaaaagaacaaatggatcatccgtcaagaaggaagaggaagtgcttacaggcaacc
tccaaacattgaaaataaaagtacatgaggggtatgaagaattcacaatggttgggcggagagcaacagctatcctgaggaaagcaactagaagg
ctgattcagttgatagtaagtggaagagatgaacaatcaatcgctgaagcgatcattgtagcaatggtgttctcacaggaggattgcatgataaa
ggcagtccgaggcgatctgaatttcgtgaacagagcaaaccaaagattgaaccccatgcatcaactcctgaggcacttccaaaaagatgcaaaag
tgctgtttcagaactggggaattgaacctattgacaatgtcatggggatgatcggaatattacctgacatgactccaagcgcagagatgtcactg
agaggagtgagagttagtaagatgggagtagatgaatattccagcacggagagagtggtggtgagtattgaccgtttcttgagggtccgagatca
gcaggggaacgtactcttatctcctgaagaggttagtgaaacacagggaacagagaagttgacaataacatattcatcctcaatgatgtgggaaa
tcaacggtcctgagtcagtgcttgttaacacttatcaatggatcatcaggaattgggagactgtaaagattcaatggtctcaagatcccacaatg
ctgtacaataagatggagtttgaatcgttccaatccttggtgccaaaggctgccagaagccaatatagtggatttgtgagaacactattccaaca
gatgcgtgatgttttggggacatttgatactgtccaaataatcaagctgctaccatttgcagcagccccaccggagccgagcagaatgcagtttt
cttctctaactgtgaatgtgagaggctcaggaatgagaatactcgtgaggggtaactcccccgtgttcaactacaacaaggcaaccaaaaggctt
acagtcctcggaaaggacgcaggtgcattaacagaagatccagacgagggaacagccggggtggaatctgcagtattgaggggattcctaattct
aggcagagaggacaaaagatatggacccgcattgagcatcaatgaactgagcaatcttgcaaaaggggagaaggctaatgtattgataatgcaag
gagacgtggtgttggtaatgaaacggaaacgggactttagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattag
>gi|GENSCAN_predicted_peptide_19|716_aa
MEDFVRQCFNPMIVELAEKTMKEYGEDPKIETNKFAAICTHLEVCFMYSDFHFIDERGESIIVESGDPNALLKHRFEIIEGRDRAMAWTVVNSIC
NTTGVDKPKFLPDLYDYKENRFTEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDS
FRQSERGEETIEERFEITGTMRRLADQSLPPNFSSLENFRAYVDGFKPNGCIEGKLSQMSKEVNARIEPFLKTTPRPLRLPDGPPCSQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMKTFFGWREPNIIKPHEKGINPNYLLAWKQVLAELQDIENEDKIPKTKNMKKTSQLMWALGENMAPEKLD
FEDCKDIGDLKQYQSDEPELRSIASWIQSEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTAEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEVGEMLLRTAIGQVSRPMFLYVRTNGT
SKIKMKWGMEMRRCLLQSLQQIESMIEAESSIKEKDMTKEFFENRSETWPIGESPKGVEEGSIGKVCRTLLAKSVFNSLYSSPQLEGFSAESRKL
LLIVQALRDNLEPGTFDLEGLYGAIEECLINDPWVLLNASWFNSFLTHALK
>gi|GENSCAN_predicted_CDS_19|2151_bp
atggaagactttgtgcgacagtgcttcaatccaatgattgtcgagcttgcggaaaagacaatgaaggaatatggggaagacccgaaaattgaaac
aaataagttcgctgcaatatgcacacacttagaagtctgcttcatgtattcagacttccatttcattgacgaacgaggcgaatcaataattgtgg
aatctggtgatccaaatgcattgttgaagcacaggtttgaaataattgaaggaagagaccgagcaatggcctggacagtggtgaatagcatctgc
aacacaacaggagtcgataaacccaaatttcttccggatctatacgactacaaggaaaaccgattcactgaaattggtgtgacacggagggaagt
tcacatatattacttagaaaaagctaacaagataaaatccgagaaaacacatatccacatcttttcattcactggagaagaaatggccactaaag
ctgactacacccttgatgaagagagcagggcaagaataaaaaccagactattcaccataagacaggaaatggcaagcaggggtctatgggattcc
tttcgtcagtccgagagaggcgaagagacaattgaagaaagatttgaaatcacagggaccatgcgtaggcttgccgaccaaagtctcccacctaa
cttctccagccttgaaaactttagagcctatgtggatggattcaaaccgaacggctgcattgagggcaagctttctcaaatgtcgaaagaagtga
acgccagaattgagccatttctgaagacaacaccacgtcccctcagattgcctgatggacctccctgctcccagcggtcgaaattcttgctgatg
gatgctctgaaattaagcattgaggacccgagccatgagggggaggggataccgctatatgatgcgataaaatgcatgaaaacattcttcggctg
gagagagcccaacatcatcaagccacacgagaagggcataaatcccaattatcttctggcttggaagcaggtgctggcagaactccaggatattg
aaaatgaggataaaatcccaaaaacaaagaacatgaagaaaacaagccaattaatgtgggcactcggggagaatatggcaccggaaaaattggac
tttgaggactgcaaagatattggcgatctgaaacagtatcaaagtgatgagccagagctcagatcgatagcaagctggatccagagtgagttcaa
caaggcatgtgaattgaccgattcgagctggatagaactcgatgagataggggaagatgttgccccaattgagcacattgcaagcatgagaagga
actacttcacagcggaagtgtctcattgcagggccactgagtacataatgaagggggtttacataaatacagctttgctcaatgcatcttgtgca
gccatggatgacttccaactgattccaatgataagcaaatgcagaacaaaagaaggaagaaggaagacaaacctgtatgggttcattataaaagg
aaggtcccatttgagaaatgatactgacgtggtgaactttgtgagtatggaattctcccttactgacccaaggctggagccacacaaatgggaaa
agtactgtgttcttgaagtaggggaaatgctcttgcggactgcaataggccaggtgtcaaggcccatgttcctgtatgtgagaactaacggaacc
tccaaaattaagatgaaatgggggatggaaatgagacgctgccttcttcaatctcttcaacagattgagagcatgatcgaggctgagtcttctat
caaagagaaagacatgaccaaagaattctttgaaaacagatcggagacatggccaattggagagtcacctaagggagtggaggaaggctcaatcg
ggaaggtgtgcagaaccttactagcaaaatctgtgttcaacagcctatattcatctccacaactcgaaggattttcagctgaatcgagaaaacta
ctactcattgttcaagcacttagggacaacctggaacctggaacctttgatcttgaagggctatatggagcaattgaggagtgcctgattaatga
tccctgggttttgcttaatgcatcttggttcaactccttcctcacacatgcactaaaatag
>gi|GENSCAN_predicted_peptide_20|719_aa
MDTVNRTHQYSEKGRWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGLFENSCLETMEVVQQTRVDKLTQGRQTYDWTLN
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMDKEEMEITTHFQRKRRVRDNMTKKMVTQRTIGKKKQKLTKKSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVHFVEALARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSFTVTGDNTKWNENQNPRIFLAMITY
ITRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLANIDLKYFNESTRKKIEKIRPLLIEGTASLSPGMMMGMFNMLSTVL
GVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLVGINMSKKKSYINRTGTFEFTSFFYRYGFVANFSMELPSFGV
SGINESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFELKKLWEQTRSKAGLLVSDGGPNLYNIRNLHIPEV
CLKWELMDEDYQGRLCNPLNPFVSHKEVESVNNAVVMPAHGPAKSMEYDAVATTHSWIPKRNRSILNTSQRGILEDEQMYQKCCTLFEKFFPSSS
YRRPVGISSMMEAMVSRARIDARIDFESGRIKKEEFAEILKICSTIEELGRQGK
>gi|GENSCAN_predicted_CDS_20|2160_bp
atggacacagtcaacagaacacatcaatattcagaaaaagggaggtggacaacaaacacagagaccggagcaccccaactcaaccctattgatgg
accattacctgaagacaatgagccgagcgggtatgcacaaacagattgtgtattggaagcaatggctttccttgaagaatcccacccaggactct
ttgaaaactcatgtcttgaaacgatggaagttgtccagcaaacgagagtggataagctgacccaaggtcgccagacttatgactggacattgaat
agaaaccagccggctgcaactgctttggccaacaccatagaagtattcagatcgaacggtctaacagccaatgagtcaggaaggttaatagattt
cctcaaggacgtaatggaatcaatggataaggaagaaatggaaataacaacacatttccagagaaagagaagagtgagggacaacatgaccaaga
aaatggtcacacaaagaacaatagggaagaagaagcaaaagctgacaaaaaagagctacctaataagagcactgacactgaacacaatgacaaaa
gatgctgaaaggggaaaattgaaaagacgagcgattgcaacacccggaatgcaaatcagaggattcgtgcactttgtcgaagcactagcaaggag
catctgtgaaaaacttgagcaatctggactccccgttggagggaatgagaagaaggctaaattggcaaatgttgtgagaaagatgatgactaact
cacaagacacagagctctcctttacagttaccggagacaacaccaaatggaatgagaatcagaatcctcgaatatttctagcaatgataacatac
atcacaaggaaccaacctgaatggtttagaaatgtcttgagcattgcccctataatgttctcaaataaaatggcgaggttaggaaaaggatacat
gttcgagagtaagagcatgaagctacggacacaaataccagcagaaatgcttgcaaacattgacttgaaatacttcaacgaatcgacgagaaaga
aaattgagaaaataagacctctactaatagagggcacagcctcattgagtccagggatgatgatgggcatgtttaatatgctaagtacggtctta
ggagtctcaatcttaaatcttgggcagaagaggtacaccaaaaccacatactggtgggatgggctccaatcctctgatgatttcgctctcatagt
gaatgcaccaaatcatgagggaatacaagcaggagtggatagattctataggacttgcaagctagttggaatcaacatgagcaaaaagaagtctt
acataaatcggacaggaacatttgagttcacaagctttttctaccgctatgggtttgtagccaacttcagcatggagctgcccagctttggagtt
tccggaattaatgaatcggctgacatgagcattggagttacagtgataaagaataatatgataaacaacgaccttggaccagcaacagcccagat
ggctcttcagctgttcattaaagactacagatacacctaccgatgccacagaggtgatacacaaattcaaactagaagatcatttgaattgaaga
agctgtgggagcagacccgctcaaaggcaggactgttggtttcagatggagggccgaatttatacaacatccggaatcttcacattccagaagtt
tgcttgaagtgggagttgatggatgaagattaccagggaagactgtgtaaccctctgaacccgtttgtcagtcataaggaagttgaatccgtcaa
caatgctgtggtaatgccagcccatggtccggccaagagcatggaatatgatgccgttgcaactacacattcatggattcccaagagaaatcgct
ccattctcaacactagccaaaggggaattcttgaggatgaacaaatgtaccagaagtgctgcactctattcgagaaattcttccctagcagttca
tatcggaggccagttggaatttccagcatgatggaggccatggtgtctagggcccgaattgatgcacggattgacttcgagtctggaaggattaa
gaaagaagaatttgctgagatcttgaagatctgttccaccattgaagagctcggacggcaagggaagtga
>gi|GENSCAN_predicted_peptide_21|709_aa
MAMKYPITADKRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVKIRRRV
DMNPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKREELKNCNIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGVRMVDILKQNPTEEQAVDICKAAMGLKISSSFSFGGFTFKRTKG
SSVKREEEVLTGNLQTLKIKVHEGYEEFTMVGRRATAILRKATRRMIQLIVSGRDEQSIAEAIIVAMVFSQEDCMVKAVRGDLNFVNRANQRLNP
MHQLLRHFQKDAKVLFQNWGIEPIDNVMGMIGILPDMTPSTEMSLRGVRVSKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGME
KLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQEPTMLYNKMEFEPFQSLVPKAARSQYSGFVRTLFQQMRDVLGTFDTVQIIKLLP
FAAAPPEQSRMQFSSLTVNVRGSGMRILVRGNSPAFNYNKTTKRLTILGKDAGALTEDPDEGTAGVESAVLRGFLILGKEDKRYGPALSINELSN
LTKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_21|2130_bp
atggcgatgaaatacccgatcacagctgacaaaagaataatggagatgatccctgaaaggaatgagcaaggccaaactctttggagcaaaacaaa
tgacgctggatcagacagggtaatggtatcacctctggctgtaacgtggtggaacagaaatggaccaacaacaagtacagtccattatccaaagg
tgtataaaacctactttgaaaaggttgaaagattaaaacacggaacctttggccctgttcatttccggaatcaagtcaaaatacgccgcagggtt
gacatgaaccctggccatgcagatctcagcgctaaagaagcacaagatgtcatcatggaggtcgttttcccaaatgaagttggagccaggatatt
gacatcagaatcacagctgacaataacaaaggaaaagagggaggaactcaagaattgtaatattgctcctttaatggtggcatatatgttggaaa
gagaattggttcgcaagaccagattcctacccgtggctggcgggacaagcagcgtatatatagaagtattgcatttgactcaaggaacttgctgg
gagcagatgtacacaccaggaggggaggtaagaaatgatgatgttgaccaaagtttaatcattgctgctaggaacattgtcaggagagcaacagt
atcagcagacccattggcttcactcctggaaatgtgccatagcacacaaattggcggagtaagaatggtagacatccttaaacaaaacccaacag
aagagcaagctgtagatatatgcaaggcagcaatgggtttgaaaatcagctcatccttcagctttggagggttcactttcaaaagaacaaagggg
tcttctgtcaaaagagaggaagaagtgcttacaggcaacctccaaacattgaagataaaagtacatgaaggatatgaggaattcacaatggttgg
acgaagagcaacagccattctaagaaaagcaaccagaaggatgatccaactgatagtcagcggaagggacgagcaatcaattgctgaggcaatta
ttgtggcaatggtgttctcacaagaagattgcatggtaaaggcagtccgaggtgatttgaatttcgtaaacagagcaaatcaacgactgaatccc
atgcaccaactcctgagacactttcaaaaggatgcaaaggtgctgtttcaaaactggggaattgaacccatcgacaatgtcatgggtatgattgg
aatattgcctgacatgacccccagcacggaaatgtcactaagaggagtgagagttagcaaaatgggggtggatgaatattctagcactgaaaggg
tggtcgtgagcattgaccgtttcttaagggtccgagatcagcgaggaaatgtactcctatcccctgaagaagttagtgaaacacagggaatggaa
aagttgacgataacttattcatcgtctatgatgtgggagattaacgggccagaatcagtgctagttaacacatatcaatggatcattaggaattg
ggagactgtaaagatccaatggtcccaagaacccaccatgctatacaataagatggagtttgaaccatttcaatctttagtaccaaaggctgcca
gaagccaatatagtggatttgtgagaacgctattccagcagatgcgtgatgttttgggaacgttcgacactgttcaaataatcaaactactacca
tttgcagcagccccaccggaacagagtaggatgcaattttcttctctgactgtgaatgtgaggggatcaggaatgagaatacttgtgagaggtaa
ctcccctgcatttaactacaacaagacaactaagaggcttacaatacttgggaaggacgcaggtgcgcttacagaggacccagatgaaggaacag
caggagtagagtctgcagtattgagaggatttctaatcctcggcaaagaagacaaaagatatggaccagcattaagcatcaatgaactgagcaat
cttacgaaaggggagaaagctaatgtattgatagggcaaggagacgtagtgttggtaatgaaacggaaacgggactctagcatacttactgacag
ccagacagcgaccaaaagaattcggatggccatcaattag
>gi|GENSCAN_predicted_peptide_22|751_aa
METISLITILLVVTASNADKICIGHQSTNSTETVDTLTETNVPVTHAKELLHTEHNGMLCATSLGHPLILDTCTIEGLVYGNPSCDLLLGGREWS
YIVERSSAVNGTCYPGNVENLEELRTLFSSASSYQRIQIFPDTTWNVTYTGTSRACSGSFYRSMRWLTQKSGFYPVQDAQYTNNRGKSILFVWGI
HHPPTYTEQTNLYIRNDTTTSVTTEDLNRTFKPVIGPRPLVNGLQGRIDYYWSVLKPGQTLRVRSNGNLIAPWYGHVLSGGSHGRILKTDLKGGN
CVVQCQTEKGGLNSTLPFHNISKYAFGTCPKYVRVNSLKLAVGLRNVPARSSRGLFGAIAGFIEGGWPGLVAGWYGFQHSNDQGVGMAADRDSTQ
KAIDKITSKVNNIVDKMNKQYEIIDHEFSEVETRLNMINNKIDDQIQDVWAYNAELLVLLENQKTLDEHDANVNNLYNKVKRALGSNAMEDGKGC
FELYHKCDDQCMETIRNGTYNRRKYREESRLERQKIEGGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDRAVKLYKKLKREMTFHGAKE
VALSYSTGALASCMGLIYNRMGTVTTEVALGLVCATCEQIADAQHRSHRQMATTTNPLIRHENRMVLASTTAKAMEQMAGSSEQAAEAMEVASQA
RQMVQAMRTIGTHPSSSAASIIGILHLILWILDRLFFKCIYRRFKYGLKRGPSTEGVPESMREEYRQEQQNAVDVDDGHFVNIELE
>gi|GENSCAN_predicted_CDS_22|2256_bp
atggaaacaatatcactaataactatactactagtagtaacagcaagcaatgcagataaaatctgcatcggccaccagtcaacaaactccacaga
aactgtggacacgctaacagaaaccaatgttcctgtgacacatgccaaagaattgctccacacagagcataatggaatgctgtgtgcaacaagcc
tgggacatcccctcattctagacacatgcactattgaaggactagtctatggcaacccttcttgtgacctgctgttgggaggaagagaatggtcc
tacatcgtcgaaagatcatcagctgtaaatggaacgtgttaccctgggaatgtagaaaacctagaggaactcaggacactttttagttccgctag

ttcctaccaaagaatccaaatcttcccagacacaacctggaatgtgacttacactggaacaagcagagcatgttcaggttcattctacaggagta
tgagatggctgactcaaaagagcggtttttaccctgttcaagacgcccaatacacaaataacaggggaaagagcattcttttcgtgtggggcata
catcacccacccacctataccgagcaaacaaatttgtacataagaaacgacacaacaacaagcgtgacaacagaagatttgaataggaccttcaa
accagtgatagggccaaggccccttgtcaatggtctgcagggaagaattgattattattggtcggtactaaaaccaggccaaacattgcgagtac
gatccaatgggaatctaattgctccatggtatggacacgttctttcaggagggagccatggaagaatcctgaagactgatttaaaaggtggtaat
tgtgtagtgcaatgtcagactgaaaaaggtggcttaaacagtacattgccattccacaatatcagtaaatatgcatttggaacctgccccaaata
tgtaagagttaatagtctcaaactggcagtcggtctgaggaacgtgcctgctagatcaagtagaggactatttggagccatagctggattcatag
aaggaggttggccaggactagtcgctggctggtatggtttccagcattcaaatgatcaaggggttggtatggctgcagatagggattcaactcaa
aaggcaattgataaaataacatccaaggtgaataatatagtcgacaagatgaacaagcaatatgaaataattgatcatgaattcagtgaggttga
aactagactcaatatgatcaataataagattgatgaccaaatacaagacgtatgggcatataatgcagaattgctagtactacttgaaaatcaaa
aaacactcgatgagcatgatgcgaacgtgaacaatctatataacaaggtgaagagggcactgggctccaatgctatggaagatgggaaaggctgt
ttcgagctataccataaatgtgatgatcagtgcatggaaacaattcggaacgggacctataataggagaaagtatagagaggaatcaagactaga
aaggcagaaaatagagggggggattttagggtttgtgttcacgctcaccgtgcccagtgagcgaggactgcagcgtagacgatttgtccaaaatg
ccctaaatgggaatggagacccaaacaacatggacagggcagttaaactatacaagaagctgaagagggaaatgacattccatggagcaaaggaa
gttgcactcagttactcaactggtgcgcttgccagttgcatgggtctcatatacaaccggatgggaacagtgaccacagaagtggctcttggcct
agtatgtgccacttgtgaacagattgctgatgcccaacatcggtcccacaggcagatggcgactaccaccaacccactaatcaggcatgagaaca
gaatggtactagccagcactacggctaaggccatggagcagatggctggatcaagtgagcaggcagcagaagccatggaagtcgcaagtcaggct
aggcaaatggtgcaggctatgaggacaattgggactcaccctagttccagtgcagcaagtatcattgggatattgcacttgatattgtggattct
tgatcgtcttttcttcaaatgcatttatcgtcgctttaaatacggtttgaaaagagggccttctacggaaggagtgcctgagtctatgagggaag
agtatcggcaggaacagcagaatgctgtggatgttgacgatggtcattttgtcaacatagagctggagtaa
>gi|GENSCAN_predicted_peptide_23|759_aa
MERIKELRNLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRITEMIPERNEQGQTLWSKMNDAGSDRVMVSPLA
VTWWNRNGPMTNTVHYPKIYKTYFERVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKK
EELQDCKISPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCWEQMYTPGGEVKNDDVDQSLIIAARNIVRRAAVSADPLASLLEMCH
STQIGGIRMVDILKQNPTEEQAVGICKAAMGLRISSSFSFGGFTFKRTSGSSVKREEEVLTGNLQTLKIRVHEGYEEFTMVGRRATAILRKATRR
LIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNFVNRANQRLNPMHQLLRHFQKDAKVLFQNWGVEPIDNVMGMIGILPDMTPSIEMSM
RGVRISKMGVDEYSSTERVVVSIDRFLRVRDQRGNVLLSPEEVSETQGTEKLTITYSSSMMWEINGPESVLVNTYQWIIRNWETVKIQWSQNPTM
LYNKMEFEPFQSLVPKAIRGQYSGFVRTLFQQMRDVLGTFDTAQIIKLLPFAAAPPKQSRMQFSSFTVNVRGSGMRILVRGNSPVFNYNKATKRL
TVLGKDAGTLTEDPDEGTAGVESAVLRGFLILGKEDRRYGPALSINELSNLAKGEKANVLIGQGDVVLVMKRKRDSSILTDSQTATKRIRMAIN
>gi|GENSCAN_predicted_CDS_23|2280_bp
atggaaagaataaaagaactaagaaatctaatgtcgcagtctcgcacccgcgagatactcacaaaaaccaccgtggaccatatggccataatcaa
gaagtacacatcaggaagacaggagaagaacccagcacttaggatgaaatggatgatggcaatgaaatatccaattacagcagacaagaggataa
cggaaatgattcctgagagaaatgagcaaggacaaactttatggagtaaaatgaatgatgccggatcagaccgagtgatggtatcacctctggct
gtgacatggtggaataggaatggaccaatgacaaatacagttcattatccaaaaatctacaaaacttattttgaaagagtcgaaaggctaaagca
tggaacctttggccctgtccattttagaaaccaagtcaaaatacgtcggagagttgacataaatcctggtcatgcagatctcagtgccaaggagg
cacaggatgtaatcatggaagttgttttccctaacgaagtgggagccaggatactaacatcggaatcgcaactaacgataaccaaagagaagaaa
gaagaactccaggattgcaaaatttctcctttgatggttgcatacatgttggagagagaactggtccgcaaaacgagattcctcccagtggctgg
tggaacaagcagtgtgtacattgaagtgttgcatttgactcaaggaacatgctgggaacagatgtatactccaggaggggaagtgaagaatgatg
atgttgatcaaagcttgattattgctgctaggaacatagtgagaagagctgcagtatcagcagacccactagcatctttattggagatgtgccac
agcacacagattggtggaattaggatggtagacatccttaagcagaacccaacagaagagcaagccgtgggtatatgcaaggctgcaatgggact
gagaattagctcatccttcagttttggtggattcacatttaagagaacaagcggatcatcagtcaagagagaggaagaggtgcttacgggcaatc
ttcaaacattgaagataagagtgcatgagggatatgaagagttcacaatggttgggagaagagcaacagccatactcagaaaagcaaccaggaga
ttgattcagctgatagtgagtgggagagacgaacagtcgattgccgaagcaataattgtggccatggtattttcacaagaggattgtatgataaa
agcagttagaggtgatctgaatttcgtcaatagggcgaatcagcgactgaatcctatgcatcaacttttaagacattttcagaaggatgcgaaag
tgctttttcaaaattggggagttgaacctatcgacaatgtgatgggaatgattgggatattgcccgacatgactccaagcatcgagatgtcaatg
agaggagtgagaatcagcaaaatgggtgtagatgagtactccagcacggagagggtagtggtgagcattgaccggttcttgagagtccgggacca
acgaggaaatgtactactgtctcccgaggaggtcagtgaaacacagggaacagagaaactgacaataacttactcatcgtcaatgatgtgggaga
ttaatggtcctgaatcagtgttggtcaatacctatcaatggatcatcagaaactgggaaactgttaaaattcagtggtcccagaaccctacaatg
ctatacaataaaatggaatttgaaccatttcagtctttagtacctaaggccattagaggccaatacagtgggtttgtgagaactctgttccaaca
aatgagggatgtgcttgggacatttgataccgcacagataataaaacttcttcccttcgcagccgctccaccaaagcaaagtagaatgcagttct
cctcatttactgtgaatgtgaggggatcaggaatgagaatacttgtaaggggcaattctcctgtattcaactacaacaaggccacgaagagactc
acagttctcggaaaggatgctggcactttaaccgaagacccagatgaaggcacagctggagtggagtccgctgttctgaggggattcctcattct
gggcaaagaagacaggagatatgggccagcattaagcatcaatgaactgagcaaccttgcgaaaggagagaaggctaatgtgctaattgggcaag
gagacgtggtgttggtaatgaaacgaaaacgggactctagcatacttactgacagccagacagcgaccaaaagaattcggatggccatcaattag
>gi|GENSCAN_predicted_peptide_24|716_aa
MEDFVRQCFNPMIVELAEKTMKEYGEDLKIETNKFAAICTHLEVCFMYSDFHFINEQGESIIVELGDPNALLKHRFEIIEGRDRTMAWTVVNSIC
NTTGAEKPKFLPDLYDYKENRFIEIGVTRREVHIYYLEKANKIKSEKTHIHIFSFTGEEMATKADYTLDEESRARIKTRLFTIRQEMASRGLWDS
FRQSERGEETIEERFEITGTMRKLADQSLPPNFSSLENFRAYVDGFEPNGYIEGKLSQMSKEVNARIEPFLKTTPRPLRLPNGPPCSQRSKFLLM
DALKLSIEDPSHEGEGIPLYDAIKCMRTFFGWKEPNVVKPHEKGINPNYLLSWKQVLAELQDIENEEKIPKTKNMKKTSQLKWALGENMAPEKVD
FDDCKDVGDLKQYDSDEPELRSLASWIQNEFNKACELTDSSWIELDEIGEDVAPIEHIASMRRNYFTSEVSHCRATEYIMKGVYINTALLNASCA
AMDDFQLIPMISKCRTKEGRRKTNLYGFIIKGRSHLRNDTDVVNFVSMEFSLTDPRLEPHKWEKYCVLEIGDMLLRSAIGQVSRPMFLYVRTNGT
SKIKMKWGMEMRRCLLQSLQQIESMIEAESSVKEKDMTKEFFENKSETWPIGESPKGVEESSIGKVCRTLLAKSVFNSLYASPQLEGFSAESRKL
LLIVQALRDNLEPGTFDLGGLYEAIEECLINDPWVLLNASWFNSFLTHALS
>gi|GENSCAN_predicted_CDS_24|2151_bp
atggaagattttgtgcgacaatgcttcaatccgatgattgtcgagcttgcggaaaaaacaatgaaagagtatggggaggacctgaaaatcgaaac
aaacaaatttgcagcaatatgcactcacttggaagtatgcttcatgtattcagatttccacttcatcaatgagcaaggcgagtcaataatcgtag
aacttggtgatcctaatgcacttttgaagcacagatttgaaataatcgagggaagagatcgcacaatggcctggacagtagtaaacagtatttgc
aacactacaggggctgagaaaccaaagtttctaccagatttgtatgattacaaggaaaatagattcatcgaaattggagtaacaaggagagaagt
tcacatatactatctggaaaaggccaataaaattaaatctgagaaaacacacatccacattttctcgttcactggggaagaaatggccacaaagg
ccgactacactctcgatgaagaaagcagggctaggatcaaaaccaggctattcaccataagacaagaaatggccagcagaggcctctgggattcc
tttcgtcagtccgagagaggagaagagacaattgaagaaaggtttgaaatcacaggaacaatgcgcaagcttgccgaccaaagtctcccgccgaa
cttctccagccttgaaaattttagagcctatgtggatggattcgaaccgaacggctacattgagggcaagctgtctcaaatgtccaaagaagtaa
atgctagaattgaaccttttttgaaaacaacaccacgaccacttagacttccgaatgggcctccctgttctcagcggtccaaattcctgctgatg
gatgccttaaaattaagcattgaggacccaagtcatgaaggagagggaataccgctatatgatgcaatcaaatgcatgagaacattctttggatg
gaaggaacccaatgttgttaaaccacacgaaaagggaataaatccaaattatcttctgtcatggaagcaagtactggcagaactgcaggacattg
agaatgaggagaaaattccaaagactaaaaatatgaaaaaaacaagtcagctaaagtgggcacttggtgagaacatggcaccagaaaaggtagac

tttgacgactgtaaagatgtaggtgatttgaagcaatatgatagtgatgaaccagaattgaggtcgcttgcaagttggattcagaatgagttcaa
caaggcatgcgaactgacagattcaagctggatagagcttgatgagattggagaagatgtggctccaattgaacacattgcaagcatgagaagga
attatttcacatcagaggtgtctcactgcagagccacagaatacataatgaagggggtgtacatcaatactgccttacttaatgcatcttgtgca
gcaatggatgatttccaattaattccaatgataagcaagtgtagaactaaggagggaaggcgaaagaccaacttgtatggtttcatcataaaagg
aagatcccacttaaggaatgacaccgacgtggtaaactttgtgagcatggagttttctctcactgacccaagacttgaaccacacaaatgggaga
agtactgtgttcttgagataggagatatgcttctaagaagtgccataggccaggtttcaaggcccatgttcttgtatgtgaggacaaatggaacc
tcaaaaattaaaatgaaatggggaatggagatgaggcgttgtctcctccagtcacttcaacaaattgagagtatgattgaagctgagtcctctgt
caaagagaaagacatgaccaaagagttctttgagaacaaatcagaaacatggcccattggagagtctcccaaaggagtggaggaaagttccattg
ggaaggtctgcaggactttattagcaaagtcggtatttaacagcttgtatgcatctccacaactagaaggattttcagctgaatcaagaaaactg
cttcttatcgttcaggctcttagggacaatctggaacctgggacctttgatcttggggggctatatgaagcaattgaggagtgcctaattaatga
tccctgggttttgcttaatgcttcttggttcaactccttccttacacatgcattgagttag
>gi|GENSCAN_predicted_peptide_25|718_aa
MDTVNRTHQYSEKARWTTNTETGAPQLNPIDGPLPEDNEPSGYAQTDCVLEAMAFLEESHPGIFENSCIETMEVVQQTRVDKLTQGRQTYDWTLN
RNQPAATALANTIEVFRSNGLTANESGRLIDFLKDVMESMKKEEMGITTHFQRKRRVRDNMTKKMITQRTIGKRKQRLNKRSYLIRALTLNTMTK
DAERGKLKRRAIATPGMQIRGFVYFVETLARSICEKLEQSGLPVGGNEKKAKLANVVRKMMTNSQDTELSLTITGDNTKWNENQNPRMFLAMITY
MTRNQPEWFRNVLSIAPIMFSNKMARLGKGYMFESKSMKLRTQIPAEMLASIDLKYFNDSTRKKIEKIRPLLIEGTASLSPGMMMGMFNMLSTVL
GVSILNLGQKRYTKTTYWWDGLQSSDDFALIVNAPNHEGIQAGVDRFYRTCKLHGINMSKKKSYINRTGTFEFTSFFYRYGFVANFSMELPSFGV
SGSNESADMSIGVTVIKNNMINNDLGPATAQMALQLFIKDYRYTYRCHRGDTQIQTRRSFEIKKLWEQTRSKAGLLVSDGGPNLYNIRNLHIPEV
CLKWELMDEDYQGRLCNPLNPFVSHKEIESMNNAVMMPAHGPAKNMEYDAVATTHSWIPKRNRSILNTSQRGVLEDEQMYQRCCNLFEKFFPSSS
YRRPVGISSMVEAMVSRARIDARIDFESGRIKKEEFTEIMKICSTIEELRRQK
>gi|GENSCAN_predicted_CDS_25|2157_bp
atggatactgtcaacaggacacatcagtactcagaaaaggcaagatggacaacaaacaccgaaactggagcaccgcaactcaacccgattgatgg
gccactgccagaagacaatgaaccaagtggttatgcccaaacagattgtgtattggaagcaatggctttccttgaggaatcccatcctggtattt
ttgaaaactcgtgtattgaaacgatggaggttgttcagcaaacacgagtagacaagctgacacaaggccgacagacctatgactggactttaaat
agaaaccagcctgctgcaacagcattggccaacacaatagaagtgttcagatcaaatggcctcacggccaatgagtctggaaggctcatagactt
ccttaaggatgtaatggagtcaatgaaaaaagaagaaatggggatcacaactcattttcagagaaagagacgggtgagagacaatatgactaaga
aaatgataacacagagaacaataggtaaaaggaaacagagattgaacaaaaggagttatctaattagagcattgaccctgaacacaatgaccaaa
gatgctgagagagggaagctaaaacggagagcaattgcaaccccagggatgcaaataagggggtttgtatactttgttgagacactggcaaggag
tatatgtgagaaacttgaacaatcagggttgccagttggaggcaatgagaagaaagcaaagttggcaaatgttgtaaggaagatgatgaccaatt
ctcaggacaccgaactttctttgaccatcactggagataacaccaaatggaacgaaaatcagaatcctcggatgtttttggccatgatcacatat
atgaccagaaatcagcccgaatggttcagaaatgttctaagtattgctccaataatgttctcaaacaaaatggcgagactgggaaaagggtatat
gtttgagagcaagagtatgaaacttagaactcaaatacctgcagaaatgctagcaagcattgatttgaaatatttcaatgattcaacaagaaaga
agattgaaaaaatccgaccgctcttaatagaggggactgcatcattgagccctggaatgatgatgggcatgttcaatatgttaagcactgtatta
ggcgtctccatcctgaatcttggacaaaagagatacaccaagactacttactggtgggatggtcttcaatcctctgacgattttgctctgattgt
gaatgcacccaatcatgaagggattcaagccggagtcgacaggttttatcgaacctgtaagctacatggaatcaatatgagcaagaaaaagtctt
acataaacagaacaggtacatttgaattcacaagttttttctatcgttatgggtttgttgccaatttcagcatggagcttcccagttttggtgtg
tctgggagcaacgagtcagcggacatgagtattggagttactgtcatcaaaaacaatatgataaacaatgatcttggtccagcaacagctcaaat
ggcccttcagttgttcatcaaagattacaggtacacgtaccgatgccatagaggtgacacacaaatacaaacccgaagatcatttgaaataaaga
aactgtgggagcaaacccgttccaaagctggactgctggtctccgacggaggcccaaatttatacaacattagaaatctccacattcctgaagtc
tgcctaaaatgggaattgatggatgaggattaccaggggcgtttatgcaacccactgaacccatttgtcagccataaagaaattgaatcaatgaa
caatgcagtgatgatgccagcacatggtccagccaaaaacatggagtatgatgctgttgcaacaacacactcctggatccccaaaagaaatcgat
ccatcttgaatacaagtcaaagaggagtacttgaagatgaacaaatgtaccaaaggtgctgcaatttatttgaaaaattcttccccagcagttca
tacagaagaccagtcgggatatccagtatggtggaggctatggtttccagagcccgaattgatgcacggattgatttcgaatctggaaggataaa
gaaagaagagttcactgagatcatgaagatctgttccaccattgaagagctcagacggcaaaaatag
>gi|GENSCAN_predicted_peptide_26|230_aa
MDPNTVSSFQVDCFLWHVRKRVADQELGDAPFLDRLRRDQKSLRGRGSTLGLDIETATRAGKQIVERILKEESDEALKMTMASVPASRYLTDMTL
EEMSREWSMLIPKQKVAGPLCIRMDQAIMDKNIILKANFSVIFDRLETLILLRAFTEEGAIVGEISPLPSLPGHTAEDVKNAVGVLIGGLEWNDN
TVRVSETLQRFAWRSSNENGRPPLTPKQKREMAGTIRSEV
>gi|GENSCAN_predicted_CDS_26|693_bp
atggatccaaacactgtgtcaagctttcaggtagattgctttctttggcatgtccgcaaacgagttgcagaccaagaactaggtgatgccccatt
ccttgatcggcttcgccgagatcagaaatccctaagaggaaggggcagcactcttggtctggacatcgagacagccacacgtgctggaaagcaga
tagtggagcggattctgaaagaagaatccgatgaggcacttaaaatgaccatggcctctgtacctgcgtcgcgttacctaaccgacatgactctt
gaggaaatgtcaagggaatggtccatgctcatacccaagcagaaagtggcaggccctctttgtatcagaatggaccaggcgatcatggataaaaa
catcatactgaaagcgaacttcagtgtgatttttgaccggctggagactctaatattgctaagggctttcaccgaagagggagcaattgttggcg
aaatttcaccattgccttctcttccaggacatactgctgaggatgtcaaaaatgcagttggagtcctcatcggaggacttgaatggaatgataac
acagttcgagtctctgaaactctacagagattcgcttggagaagcagtaatgagaatgggagacctccactcactccaaaacagaaacgagaaat
ggcgggaacaattaggtcagaagtttga
>gi|GENSCAN_predicted_peptide_27|498_aa
MASQGTKRSYEQMETDGERQNATEIRASVGKMIGGIGRFYIQMCTELKLSDYEGRLIQNSLTIERMVLSAFDERRNKYLEEHPSAGKDPKKTGGP
IYRRVNGKWMRELILYDKEEIRRIWRQANNGDDATAGLTHMMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGVGTMV
MELVRMIKRGINDRNFWRGENGRKTRIAYERMCNILKGKFQTAAQKAMMDQVRESRDPGNAEFEDLTFLARSALILRGSVAHKSCLPACVYGPAV
ASGYDFEREGYSLVGIDPFRLLQNSQVYSLIRPNENPAHKSQLVWMACHSAAFEDLRVLSFIKGTKVVPRGKLSTRGVQIASNENMETMESSTLE
LRSRYWAIRTRSGGNTNQQRASAGQISIQPTFSVQRNLPFDRTTVMAAFTGNTEGRTSDMRTEIIRMMESARPEDVSFQGRGVFELSDEKAASPI
VPSFDMSNEGSYFFGDNAEEYDN
>gi|GENSCAN_predicted_CDS_27|1497_bp
atggcgtcccaaggcaccaaacggtcttacgaacagatggagactgatggagaacgccagaatgccactgaaatcagagcatccgtcggaaaaat
gattggtggaattggacgattctacatccaaatgtgcacagaacttaaactcagtgattatgagggacggttgatccaaaacagcttaacaatag
agagaatggtgctctctgcttttgacgaaaggagaaataaatacctggaagaacatcccagtgcggggaaggatcctaagaaaactggaggacct
atatacagaagagtaaacggaaagtggatgagagaactcatcctttatgacaaagaagaaataaggcgaatctggcgccaagctaataatggtga
cgatgcaacggctggtctgactcacatgatgatctggcattccaatttgaatgatgcaacttatcagaggacaagggctcttgttcgcaccggaa
tggatcccaggatgtgctctctgatgcaaggttcaactctccctaggaggtctggagccgcaggtgctgcagtcaaaggagttggaacaatggtg
atggaattggtcaggatgatcaaacgtgggatcaatgatcggaacttctggaggggtgagaatggacgaaaaacaagaattgcttatgaaagaat
gtgcaacattctcaaagggaaatttcaaactgctgcacaaaaagcaatgatggatcaagtgagagagagccgggacccagggaatgctgagttcg
aagatctcacttttctagcacggtctgcactcatattgagagggtcggttgctcacaagtcctgcctgcctgcctgtgtgtatggacctgccgta
gccagtgggtacgactttgaaagagagggatactctctagtcggaatagaccctttcagactgcttcaaaacagccaagtgtacagcctaatcag
accaaatgagaatccagcacacaagagtcaactggtgtggatggcatgccattctgccgcatttgaagatctaagagtattgagcttcatcaaag

ggacgaaggtggtcccaagagggaagctttccactagaggagttcaaattgcttccaatgaaaatatggagactatggaatcaagtacacttgaa
ctgagaagcaggtactgggccataaggaccagaagtggaggaaacaccaatcaacagagggcatctgcgggccaaatcagcatacaacctacgtt
ctcagtacagagaaatctcccttttgacagaacaaccgttatggcagcattcactgggaatacagaggggagaacatctgacatgaggaccgaaa
tcataaggatgatggaaagtgcaagaccagaagatgtgtctttccaggggcggggagtcttcgagctctcggacgaaaaggcagcgagcccgatc
gtgccttcctttgacatgagtaatgaaggatcttatttcttcggagacaatgcagaggagtacgacaattaa
>gi|GENSCAN_predicted_peptide_28|711_aa
MKANLLVLLCALAAADADTICIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDSHNGKLCRLKGIAPLQLGKCNIAGWLLGNPECDPLLPVRSWSY
IVETPNSENGICYPGDFIDYEELREQLSSVSSFERFEIFPKESSWPNHNTTKGVTAACSHAGKSSFYRNLLWLTEKEGSYPKLKNSYVNKKGKEV
LVLWGIHHPSNSKDQQNIYQNENAYVSVVTSNYNRRFTPEIAERPKVRDQAGRMNYYWTLLKPGDTIIFEANGNLIAPRYAFALSRGFGSGIITS
NASMHECNTKCQTPLGAINSSLPFQNIHPVTIGECPKYVRSAKLRMVTGLRNIPSIQSRGLFGAIAGFIEGGWTGMIDGWYGYHHQNEQGSGYAA
DQKSTQNAINGITNKVNSVIEKMNIQFTAVGKEFNKLEKRMENLNKKVDDGFLDIWTYNAELLVLLENERTLDFHDSNVKNLYEKVKSQLKNNAK
EIGNGCFEFYHKCDNECMESVRNGTYDYPKYSEESKLNREKGILGFVFTLTVPSERGLQRRRFVQNALNGNGDPNNMDKAVKLYRKLKREITFHG
AKEISLSYSAGALASCMGLIYNRMGAVTTEVAFGLVCATCEQIADSQHRSHRQMVTTTNPLIRHENRMVLASTTAKAMEQMAGSSEQAAEAMEVA
SQARQMVQAMRTIGTHPSSSAGLKNDLLENLQAYQKRMGVQMQRFK
>gi|GENSCAN_predicted_CDS_28|2136_bp
atgaaggcaaacctactggtcctgttatgtgcacttgcagctgcagatgcagacacaatatgtataggctaccatgcgaacaattcaaccgacac
tgttgacacagtgctcgagaagaatgtgacagtgacacactctgttaacctgctcgaagacagccacaacggaaaactatgtagattaaaaggaa
tagccccactacaattggggaaatgtaacatcgccggatggctcttgggaaacccagaatgcgacccactgcttccagtgagatcatggtcctac
attgtagaaacaccaaactctgagaatggaatatgttatccaggagatttcatcgactatgaggagctgagggagcaattgagctcagtgtcatc
attcgaaagattcgaaatatttcccaaagaaagctcatggcccaaccacaacacaaccaaaggagtaacggcagcatgctcccatgcggggaaaa
gcagtttttacagaaatttgctatggctgacggagaaggagggctcatacccaaagctgaaaaattcttatgtgaacaagaaagggaaagaagtc
cttgtactgtggggtattcatcacccgtctaacagtaaggatcaacagaatatctatcagaatgaaaatgcttatgtctctgtagtgacttcaaa
ttataacaggagatttaccccggaaatagcagaaagacccaaagtaagagatcaagctgggaggatgaactattactggaccttgctaaaacccg
gagacacaataatatttgaggcaaatggaaatctaatagcaccaaggtatgctttcgcactgagtagaggctttgggtccggcatcatcacctca
aacgcatcaatgcatgagtgtaacacgaagtgtcaaacacccctgggagctataaacagcagtctccctttccagaatatacacccagtcacaat
aggagagtgcccaaaatacgtcaggagtgccaaattgaggatggttacaggactaaggaacattccgtccattcaatccagaggtctatttggag
ccattgccggttttattgaagggggatggactggaatgatagatggatggtacggttatcatcatcagaatgaacagggatcaggctatgcagcg
gatcaaaaaagcacacaaaatgccattaacgggattacaaacaaggtgaactctgttatcgagaaaatgaacattcaattcacagctgtgggtaa
agaattcaacaaattagaaaaaaggatggaaaatttaaataaaaaagttgatgatggatttctggacatttggacatataatgcagaattgttag
ttctactggaaaatgaaaggactctggatttccatgactcaaatgtgaagaatctgtatgagaaagtaaaaagccaattaaagaataatgccaaa
gaaatcggaaatggatgttttgagttctaccacaagtgtgacaatgaatgcatggaaagtgtaagaaatgggacttatgattatcccaaatattc
agaagagtcaaagttgaacagggaaaaggggattttaggatttgtgttcacgctcaccgtgcccagtgagcgaggactgcagcgtagacgctttg
tccaaaatgcccttaatgggaacggggatccaaataacatggacaaagcagttaaactgtataggaagctcaagagggagataacattccatggg
gccaaagaaatctcactcagttattctgctggtgcacttgccagttgtatgggcctcatatacaacaggatgggggctgtgaccactgaagtggc
atttggcctggtatgtgcaacctgtgaacagattgctgactcccagcatcggtctcataggcaaatggtgacaacaaccaacccactaatcagac
atgagaacagaatggttttagccagcactacagctaaggctatggagcaaatggctggatcgagtgagcaagcagcagaggccatggaggttgct
agtcaggctaggcaaatggtgcaagcgatgagaaccattgggactcatcctagctccagtgctggtctgaaaaatgatcttcttgaaaatttgca
ggcctatcagaaacgaatgggggtgcagatgcaacggttcaagtga
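
As a sanity check on the listing above, each predicted CDS should be three times the peptide length plus one stop codon (for example, 693 bp for the 230-residue peptide 26). The following is a minimal Python sketch of that check and of translating a CDS. It is not part of the GENSCAN output. The function names `translate` and `lengths_agree` are our own, and the standard genetic code is assumed.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate a DNA coding sequence, stopping at the first stop codon.

    Only unambiguous bases (a, c, g, t) are supported in this sketch.
    """
    cds = cds.upper()
    peptide = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def lengths_agree(peptide_header, cds_header):
    """Check that the CDS length equals 3 * (peptide length + 1 stop codon)."""
    aa = int(re.search(r"\|(\d+)_aa$", peptide_header).group(1))
    bp = int(re.search(r"\|(\d+)_bp$", cds_header).group(1))
    return bp == 3 * (aa + 1)


# First eight codons of predicted CDS 26 from the listing above.
print(translate("atggatccaaacactgtgtcaagc"))  # MDPNTVSS
print(lengths_agree(">gi|GENSCAN_predicted_peptide_26|230_aa",
                    ">gi|GENSCAN_predicted_CDS_26|693_bp"))  # True
```

The translated prefix matches the start of predicted peptide 26, which is the agreement one would expect between each peptide and CDS pair.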

Explanation

Gn.Ex : gene number, exon number (for reference)


Type : Init = Initial exon (ATG to 5' splice site)
Intr = Internal exon (3' splice site to 5' splice site)
Term = Terminal exon (3' splice site to stop codon)
Sngl = Single-exon gene (ATG to stop)
Prom = Promoter (TATA box / initiation site)
PlyA = poly-A signal (consensus: AATAAA)
S : DNA strand (+ = input strand; - = opposite strand)
Begin : beginning of exon or signal (numbered on input strand)
End : end point of exon or signal (numbered on input strand)
Len : length of exon or signal (bp)
Fr : reading frame (a forward strand codon ending at x has frame x mod 3)
Ph : net phase of exon (exon length modulo 3)
I/Ac : initiation signal or 3' splice site score (tenth bit units)
Do/T : 5' splice site or termination signal score (tenth bit units)
CodRg : coding region score (tenth bit units)
P : probability of exon (sum over all parses containing exon)
Tscr : exon score (depends on length, I/Ac, Do/T and CodRg scores)
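
The Fr and Ph columns defined above are simple modular arithmetic. A minimal Python sketch follows, assuming 1-based inclusive coordinates on the input strand. The function names and the example coordinates are hypothetical and are not taken from the GENSCAN output.

```python
def exon_length(begin, end):
    """Length in bp of an exon with 1-based inclusive coordinates."""
    return abs(end - begin) + 1


def net_phase(begin, end):
    """Ph column: exon length modulo 3."""
    return exon_length(begin, end) % 3


def forward_frame(codon_end):
    """Fr column: a forward-strand codon ending at position x has frame x mod 3."""
    return codon_end % 3


# Hypothetical single-exon gene spanning positions 1..693 on the + strand.
print(exon_length(1, 693))  # 693
print(net_phase(1, 693))    # 0
print(forward_frame(3))     # 0
```

An exon whose length is a multiple of 3 has net phase 0, so it does not shift the reading frame of the exons that follow it.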

[Figure: graphical view of the GENSCAN output]

Table showing the genome segments of the different strains and the number of base pairs (bp) in each segment

Segment      H1N1   H2N2   H3N2   H9N2   H5N1
Segment 1    2341   2341   2341   2341   2341
Segment 2    2341   2341   2341   2328   2341
Segment 3    2233   2233   2233   2225   2233
Segment 4    1778   1773   1762   1714   1760
Segment 5    1565   1497   1566   1557   1565
Segment 6    1413   1410   1467   1418   1458
Segment 7    1027   1027   1027   1025   1027
Segment 8     890    838    890    890    865
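
The segment lengths tabulated above can be summed to compare total genome sizes and to see which segments differ in length between strains. The following is a minimal Python sketch. The dictionary only restates the table, and the variable names are our own.

```python
# Segment lengths in bp (segments 1..8), restated from the table.
SEGMENT_LENGTHS = {
    "H1N1": [2341, 2341, 2233, 1778, 1565, 1413, 1027, 890],
    "H2N2": [2341, 2341, 2233, 1773, 1497, 1410, 1027, 838],
    "H3N2": [2341, 2341, 2233, 1762, 1566, 1467, 1027, 890],
    "H9N2": [2341, 2328, 2225, 1714, 1557, 1418, 1025, 890],
    "H5N1": [2341, 2341, 2233, 1760, 1565, 1458, 1027, 865],
}

# Total genome size per strain.
genome_size = {strain: sum(lengths) for strain, lengths in SEGMENT_LENGTHS.items()}

# Segments (1-based) whose length is not identical across all five strains.
variable_segments = [
    index + 1
    for index in range(8)
    if len({lengths[index] for lengths in SEGMENT_LENGTHS.values()}) > 1
]

for strain, size in genome_size.items():
    print(f"{strain}: {size} bp")
print("Segments with differing lengths:", variable_segments)
```

With these figures only segment 1 has the same length in all five strains.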

Modelling

We modelled the proteins of the H5N1 strain using the SWISS-MODEL server.

BLAST results for the H5N1 strain:

Columns: query sequence, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score
gi|73921266|ref|YP_308668.1| pdb|1NMB|N 44.51 474 248 9 1 465 1 468 6.00E-111 396
gi|73921266|ref|YP_308668.1| pdb|5NN9| 48.81 379 188 5 91 465 10 386 2.00E-102 367
gi|73921266|ref|YP_308668.1| pdb|1XOG|A 48.81 379 188 5 91 465 9 385 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1L7F|A 48.81 379 188 5 91 465 10 386 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|4NN9| 48.81 379 188 5 91 465 10 386 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1NCC|N 48.81 379 188 5 91 465 11 387 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1NCA|N 48.81 379 188 5 91 465 11 387 5.00E-102 366
gi|73921266|ref|YP_308668.1| pdb|1NMA|N 48.28 379 190 5 91 465 10 386 8.00E-102 365
gi|73921266|ref|YP_308668.1| pdb|1NCD|N 48.28 379 190 5 91 465 11 387 8.00E-102 365
gi|73921266|ref|YP_308668.1| pdb|1L7H|A 48.55 379 189 5 91 465 10 386 1.00E-101 365
gi|73921266|ref|YP_308668.1| pdb|1W20|D 48.41 378 188 5 91 463 10 385 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|1NCB|N 48.55 379 189 5 91 465 11 387 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|6NN9| 48.55 379 189 5 91 465 10 386 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|3NN9| 48.55 379 189 5 91 465 10 386 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|1INY| 48.55 379 189 5 91 465 10 386 2.00E-101 364
gi|73921266|ref|YP_308668.1| pdb|1L7G|A 48.55 379 189 5 91 465 10 386 3.00E-101 363
gi|73921266|ref|YP_308668.1| pdb|2AEQ|A 46.7 379 189 7 92 463 18 390 4.00E-95 343
gi|73921266|ref|YP_308668.1| pdb|2BAT| 45.93 381 193 7 92 465 11 385 4.00E-95 343
gi|73921266|ref|YP_308668.1| pdb|1INX| 45.93 381 193 7 92 465 11 385 6.00E-95 343
gi|73921266|ref|YP_308668.1| pdb|1VCJ|A 36.36 352 203 10 116 455 37 379 7.00E-59 223
gi|73921266|ref|YP_308668.1| pdb|1B9V|A 36.36 352 203 10 116 455 38 380 7.00E-59 223
gi|73921266|ref|YP_308668.1| pdb|1INF| 36.36 352 203 10 116 455 38 380 7.00E-59 223
gi|73921266|ref|YP_308668.1| pdb|1A4Q|B 35.9 351 206 9 116 455 38 380 3.00E-58 221
gi|73921266|ref|YP_308668.1| pdb|2AZD|B 50 22 11 0 294 315 159 180 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|1QM5|B 50 22 11 0 294 315 159 180 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|2ECP|B 50 22 11 0 294 315 159 180 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|1AHP|B 50 22 11 0 294 315 160 181 0.22 32.3
gi|73921266|ref|YP_308668.1| pdb|1U8C|B 25 120 70 4 226 345 496 595 0.28 32
gi|73921266|ref|YP_308668.1| pdb|1SSK|A 31.75 63 42 1 146 207 76 138 1.1 30
gi|73921266|ref|YP_308668.1| pdb|1EGI|B 41.67 24 14 0 422 445 72 95 3.1 28.5
gi|73921266|ref|YP_308668.1| pdb|1EGZ|C 30.3 66 35 3 354 410 9 72 5.3 27.7
gi|73921266|ref|YP_308668.1| pdb|1WB0|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1HKM|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1HKK|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1LQ0|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73921266|ref|YP_308668.1| pdb|1GUV|A 28.75 80 51 2 328 407 263 336 9.1 26.9
gi|73852953|ref|YP_308667.1| pdb|1I7A|D 25.47 106 73 2 254 357 3 104 0.88 30.4
gi|73852953|ref|YP_308667.1| pdb|1TBG|H 32.73 55 35 1 41 95 6 58 5.7 27.7
gi|73852953|ref|YP_308667.1| pdb|1B9Y|B 32.73 55 35 1 41 95 6 58 5.7 27.7
gi|73852953|ref|YP_308667.1| pdb|1A0R|G 32.73 55 35 1 41 95 5 57 5.7 27.7
gi|73852953|ref|YP_308667.1| pdb|1GOT|G 32.73 55 35 1 41 95 13 65 5.7 27.7
gi|73852947|ref|YP_308664.1| pdb|1W1W|D 21.88 128 85 4 106 231 173 287 0.83 31.2
gi|73852947|ref|YP_308664.1| pdb|1GTM|C 20.65 92 72 1 149 240 274 364 4.1 28.9
gi|73852947|ref|YP_308664.1| pdb|1EUZ|F 20.21 94 74 1 149 242 274 366 5.3 28.5
gi|73852947|ref|YP_308664.1| pdb|1J0W|B 28.26 46 33 0 503 548 1 46 5.3 28.5
gi|73852947|ref|YP_308664.1| pdb|1BVU|F 21.74 92 71 1 149 240 273 363 7 28.1
gi|73852947|ref|YP_308664.1| pdb|1BBW|A 22.78 79 59 1 166 242 33 111 9.1 27.7
gi|73852947|ref|YP_308664.1| pdb|1KRT| 22.78 79 59 1 166 242 5 83 9.1 27.7
gi|73852957|ref|YP_308671.1| pdb|1EA3|B 95.12 164 8 0 1 164 1 164 3.00E-87 316
gi|73852957|ref|YP_308671.1| pdb|1AA7|B 94.94 158 8 0 1 158 1 158 3.00E-83 303
gi|73852957|ref|YP_308671.1| pdb|2CRL|A 28.24 85 52 3 118 199 2 80 4.1 26.9
gi|73852957|ref|YP_308671.1| pdb|1QGE|D 36.36 44 27 1 178 221 70 112 9.1 25.8
Modelling results

Template selection:
For each target we chose the most appropriate template from the list of PDB BLAST hits, according to its
E-value, bit score, % identity and alignment length.
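
The selection criteria named above can be sketched as a ranking over tabular BLAST hits. The Python example below is illustrative only. It assumes one particular way of combining the criteria (lowest E-value first, ties broken by higher bit score, then higher % identity, then longer alignment), which the report does not specify, and the names `Hit`, `parse_hit`, `rank_key` and `best_templates` are hypothetical. The sample rows are copied from the BLAST results in this report.

```python
from collections import namedtuple

Hit = namedtuple(
    "Hit",
    "query subject identity length mismatches gap_opens "
    "q_start q_end s_start s_end evalue bitscore",
)


def parse_hit(line):
    """Parse one whitespace-separated row of 12-column tabular BLAST output."""
    f = line.split()
    return Hit(f[0], f[1], float(f[2]), *map(int, f[3:10]), float(f[10]), float(f[11]))


def rank_key(hit):
    """Assumed ordering: low E-value, then high bit score, identity and length."""
    return (hit.evalue, -hit.bitscore, -hit.identity, -hit.length)


def best_templates(lines):
    """Return the best-ranked hit for each query sequence."""
    best = {}
    for line in lines:
        if not line.strip():
            continue
        hit = parse_hit(line)
        current = best.get(hit.query)
        if current is None or rank_key(hit) < rank_key(current):
            best[hit.query] = hit
    return best


SAMPLE_ROWS = """\
gi|73921266|ref|YP_308668.1| pdb|1NMB|N 44.51 474 248 9 1 465 1 468 6.00E-111 396
gi|73921266|ref|YP_308668.1| pdb|5NN9| 48.81 379 188 5 91 465 10 386 2.00E-102 367
gi|73852957|ref|YP_308671.1| pdb|1EA3|B 95.12 164 8 0 1 164 1 164 3.00E-87 316
gi|73852957|ref|YP_308671.1| pdb|1AA7|B 94.94 158 8 0 1 158 1 158 3.00E-83 303
"""

for query, hit in best_templates(SAMPLE_ROWS.splitlines()).items():
    print(query, hit.subject, hit.evalue, hit.bitscore, hit.identity)
```

A purely numerical ranking like this is only a starting point. The first sample query shows why: the hit with the lowest E-value (1NMB) has a lower % identity than the next hit (5NN9), so the criteria can disagree and the final choice needs judgement.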

H5N1

Target: Neuraminidase
Template: PDB id=1w21
E-value=1e-96
Bit score=348
% identity=47.15%

Target: HA1
Template: PDB id=1jsmA
E-value=1e-96
Bit score=645
% identity=94.17%

Target: Non-structural protein
Template: PDB id=1AIL
E-value=4e-23
Bit score=103
% identity=66.67%

Target: Matrix protein
Template: PDB id=1ea3
E-value=3e-87
Bit score=316
% identity=94.9%

Structure of the matrix protein of the H5N1 strain:

Docking:

We found that a het group, NAG (N-acetyl-D-glucosamine), is present as an inhibitor of the neuraminidase
protein, and we docked this inhibitor with the protein. Docking gives an energy value for each run; we take the
best score to be the complex with the lowest energy.

Details of Neuraminidase protein:

PDB id:

Name:

Title:

Structure:

Source:

UniProt:

Enzyme class:

Reaction:

Functions:
Cellular component: membrane
Biological process: carbohydrate metabolism
Biochemical function: exo-alpha-sialidase activity

Resolution: 2.00 Å

R-factor: 0.152

Structure of NAG:
Het group: NAG

Chemical Formula : C8H15NO6
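
The molar mass of the ligand follows directly from this formula. The following is a minimal Python sketch, assuming standard atomic masses rounded to three decimals. The function `molar_mass` is our own helper and handles only simple formulas without parentheses.

```python
import re

# Standard atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Sum atomic masses for a simple formula such as 'C8H15NO6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


print(f"{molar_mass('C8H15NO6'):.2f} g/mol")  # 221.21 g/mol
```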

LIGPLOT of ligand's interactions with protein

NAG 200A to MAN 200G

The following results were obtained after docking:

Calculations:
Receptor: 7NN9

Ligand: 1AOH
E-START: 8826.22 kJ/mol
E-END: -609 kJ/mol
Total time taken: 22 min 24 s

Ligand: 1A3K
E-START: 3.6 kJ/mol
E-END: -240 kJ/mol
Total time taken: 17 min 27 s

Ligand: 1A3K
E-START: 3.96 kJ/mol
E-END: -239.49 kJ/mol
Total time taken: 17 min 27 s
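
The energies reported above lend themselves to a small comparison: the change from E-START to E-END for each run, and a ranking of the runs by final energy. The Python sketch below is illustrative arithmetic on the reported numbers only. The data structure and function names are hypothetical, it assumes that the best score means the lowest final energy, and it neither reproduces the docking program nor indicates which ligand was finally adopted.

```python
# Docking runs against receptor 7NN9, restated from the results (kJ/mol).
runs = [
    {"ligand": "1AOH", "e_start": 8826.22, "e_end": -609.0},
    {"ligand": "1A3K", "e_start": 3.6, "e_end": -240.0},
    {"ligand": "1A3K", "e_start": 3.96, "e_end": -239.49},
]


def energy_change(run):
    """Energy change over the run: E-END minus E-START."""
    return run["e_end"] - run["e_start"]


# Rank runs from lowest (most favourable) to highest final energy.
ranked = sorted(runs, key=lambda run: run["e_end"])

for run in ranked:
    print(f"{run['ligand']}: E-END = {run['e_end']:.2f} kJ/mol, "
          f"change = {energy_change(run):.2f} kJ/mol")
```

Final energies are only comparable when the runs use the same receptor preparation and scoring settings, which is assumed here.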

Graphical view of docking of Neuraminidase protein (7NN9) with NAG (1A3K)

Conclusion

After analysing the sequences of different strains of influenza A virus, we conclude that although the strains
are closely related, they show distinctly different pathogenic behaviour, which plays an important role in
their survival in different species. It is worth taking a closer look at the matter at the gene level, and a
phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

Based on the current analysis, it can be said that the different strains diverged at different levels.

We noticed that the same genes are present in all the strains, which suggests that they evolved together.
Influenza viruses change through the well-known processes of antigenic drift and antigenic shift. Since we
compared four other strains with H5N1, the results suggest that these strains were related to one another in
the past and that one strain may have given rise to another (for example, H5N1 may have evolved from H1N1
or from another strain, or vice versa).

Studies on the ecology of influenza viruses have led to the hypothesis that all mammalian influenza viruses
derive from the avian influenza reservoir. Once the ongoing avian influenza gene sequencing project is
finished, we hope it will be possible to draw firm conclusions about the true picture of evolution and to
identify the genes responsible for pathogenesis. A complete inference can only be drawn from a
comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the H5N1 strain, we carried out homology
modelling. The structures submitted so far were determined by X-ray crystallography or NMR techniques. We
take a further step by presenting theoretical models built with available online modelling tools.

Our study indicates that the neuraminidase protein, which is encoded by the NA gene, is one of the causes of
the pathogenicity of influenza A virus. We therefore docked this protein with an appropriate ligand in order
to inhibit its activity; drugs can be developed on this basis.

Future prospects

The work presented in this report may serve as a stepping stone towards such discoveries. It is a small
finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships
between the many different kinds of life on earth. This includes methods for collecting and analysing
data, as well as the interpretation of the results as new biological information.

With the aid of sequences it should be possible to find closely related organisms. Experience shows
that closely related organisms have similar sequences, while more distantly related organisms have more
dissimilar sequences. One objective is to reconstruct the evolutionary relationships between species.
Another objective is to estimate the time of divergence between two organisms since they last shared a
common ancestor.
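
The idea that related organisms have similar sequences can be made concrete with the simplest distance measure, the proportion of differing sites (p-distance) between two aligned sequences. The sketch below is an illustration only: the three sequences are invented toy data, not sequences analysed in this report, and the function name `p_distance` is chosen here for convenience.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Invented toy alignment (hypothetical data for illustration).
toy = {
    "strain_A": "ATGGCTAACC",
    "strain_B": "ATGGCTAACT",
    "strain_C": "ATGACTTACG",
}

names = sorted(toy)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, p_distance(toy[first], toy[second]))
```

In this toy data, strain_A and strain_B differ at one site in ten, while strain_C differs from both at three sites, so a distance-based method would group A and B first.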

The purpose of the modelling is to help drug developers and biotechnologists develop drugs more
efficiently and effectively in future by analysing the modelled protein structures.

As new drug targets are identified, new vistas will open for further drug development. The findings of
our docking study may be useful in the search for a cure for the infectious disease bird flu, and may
also open new avenues for finding other possible drug targets in influenza A virus.
The docking results can be used to design new lead compounds and hence can aid the drug discovery
process.

Finally, a similar process can be applied to other pathogens in order to identify possible therapeutic
sites in them. A similar method can also be applied to other infectious diseases, and so we can
look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish
a good phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings
will go a long way and will prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY AND REFERENCES

References

• Gog, J. R., Rimmelzwaan, G. F., Osterhaus, A. D. M. E., Grenfell, B. T. (2003). Population dynamics of
rapid fixation in cytotoxic T lymphocyte escape mutants of influenza A. Proc. Natl. Acad. Sci. U. S. A.
100: 11143-11147
• Nakagawa, N., Nukuzuma, S., Haratome, S., Go, S., Nakagawa, T., Hayashi, K. (2002). Emergence of an
Influenza B Virus with Antigenic Change. J. Clin. Microbiol. 40: 3068-3070
• Tumpey, T. M., Suarez, D. L., Perkins, L. E. L., Senne, D. A., Lee, J.-g., Lee, Y.-J., Mo, I.-P.,
Sung, H.-W., Swayne, D. E. (2002). Characterization of a Highly Pathogenic H5N1 Avian Influenza A Virus
Isolated from Duck Meat. J. Virol. 76: 6344-6355
• Benton, K. A., Misplon, J. A., Lo, C.-Y., Brutkiewicz, R. R., Prasad, S. A., Epstein, S. L. (2001).
Heterosubtypic Immunity to Influenza A Virus in Mice Lacking IgA, All Ig, NKT Cells, or γδ T Cells.
J Immunol 166: 7437-7445
• Lindstrom, S. E., Hiromoto, Y., Nishimura, H., Saito, T., Nerome, R., Nerome, K. (1999). Comparative
Analysis of Evolutionary Mechanisms of the Hemagglutinin and Three Internal Protein Genes of
Influenza B Virus: Multiple Cocirculating Lineages and Frequent Reassortment of the NP, M, and NS
Genes. J. Virol. 73: 4413-4426
• Voeten, J. T. M., Bestebroer, T. M., Nieuwkoop, N. J., Fouchier, R. A. M., Osterhaus, A. D. M. E.,
Rimmelzwaan, G. F. (2000). Antigenic Drift in the Influenza A Virus (H3N2) Nucleoprotein and Escape
from Recognition by Cytotoxic T Lymphocytes. J. Virol. 74: 6800-6807
• Cooper, L. A., Subbarao, K. (2000). A Simple Restriction Fragment Length Polymorphism-Based
Strategy That Can Distinguish the Internal Genes of Human H1N1, H3N2, and H5N1 Influenza A
Viruses. J. Clin. Microbiol. 38: 2579-2583
• Karasin, A. I., Olsen, C. W., Anderson, G. A. (2000). Genetic Characterization of an H1N2 Influenza
Virus Isolated from a Pig in Indiana. J. Clin. Microbiol. 38: 2453-2456
• Naffakh, N., Massin, P., Escriou, N., Crescenzo-Chaigne, B., van der Werf, S. (2000). Genetic analysis
of the compatibility between polymerase proteins from human and avian strains of influenza A viruses. J
Gen Virol 81: 1283-1291
• Hiromoto, Y., Yamazaki, Y., Fukushima, T., Saito, T., Lindstrom, S. E., Omoe, K., Nerome, R., Lim,
W., Sugita, S., Nerome, K. (2000). Evolutionary characterization of the six internal genes of H5N1
human influenza A virus. J Gen Virol 81: 1293-1303
• Hiromoto, Y., Saito, T., Lindstrom, S. E., Li, Y., Nerome, R., Sugita, S., Shinjoh, M., Nerome, K.
(2000). Phylogenetic analysis of the three polymerase genes (PB1, PB2 and PA) of influenza B virus. J
Gen Virol 81: 929-937
• Zhou, N. N., Senne, D. A., Landgraf, J. S., Swenson, S. L., Erickson, G., Rossow, K., Liu, L.,
Yoon, K.-j., Krauss, S., Webster, R. G. (1999). Genetic Reassortment of Avian, Swine, and Human
Influenza A Viruses in American Pigs. J. Virol. 73: 8851-8856
• Alexander DJ, Brown IH. “Recent zoonoses caused by influenza A viruses” Rev Sci Tech 2000; 19:197-225.
• Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb
Miller, and David J. Lipman (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res. 25:3389-3402
• Nadia Naffakh, Pascale Massin, Nicolas Escriou, Bernadette Crescenzo-Chaigne and Sylvie van der Werf.
Genetic analysis of the compatibility between polymerase proteins from human and avian strains of
influenza A viruses (http://jgv.sgmjournals.org/cgi/content/abstract/81/5/1283)

• Edward C. Holmes, Elodie Ghedin, Naomi Miller, Jill Taylor, Yiming Bao, Kirsten St. George, Bryan T.
Grenfell, Steven L. Salzberg, Claire M. Fraser, David J. Lipman, Jeffery K. Taubenberger. Whole-Genome
Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among
Recent H3N2 Viruses.

• Luke T. Daum, Michael W. Shaw, Alexander I. Klimov, Linda C. Canas, Elizabeth A. Macias, Debra
Niemeyer, James P. Chambers, Robert Renthal, Sanjaya K. Shrestha, Ramesh P. Acharya, Shankar P.
Huzdar, Nirmal Rimal, Khin S. Myint, and Philip Gould. Influenza A (H3N2) Outbreak, Nepal
(http://www.cdc.gov/ncidod/eid/vol11no08/05-0302.htm)

• Felsenstein J. (1989). PHYLIP: Phylogeny inference package (version 3.2). Cladistics 5: 164-166.

• Higgins DG and Sharp PM. (1988). CLUSTAL: A package for performing multiple sequence alignment
on a microcomputer. Gene 73: 237-244.
• Higgins DG, Thompson JD, and Gibson TJ. (1996). Using CLUSTAL for multiple sequence alignment.
Methods Enzymol. 266: 383-402.
• Mount DW. (2001). Bioinformatics: Sequence and genome analysis. Cold Spring Harbor Laboratory
Press, 564 pp.
• Saitou N and Nei M. (1987). The neighbor-joining method: A new method for reconstructing
phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
• Hinshaw VS, Webster RG. The natural history of influenza A viruses. In: Beare AS, editor. Basic and
applied influenza research. Boca Raton (FL): CRC Press; 1982. p. 79-104.
• Scholtissek C, Naylor E. Fish farming and influenza pandemics. Nature 1988;331:215.
• Bean WJ, Kawaoka Y, Wood JM, Pearson JE, Webster RG. Characterization of virulent and avirulent
• Fouchier RAM, Munster V, Wallensten A, et al, 2005. Characterization of a novel influenza A virus
hemagglutinin subtype (H16) obtained from black-headed gulls. J Virol vol 79, issue 5, pp2814-22.
• Gambaryan A, Tuzikov A, Pazynina G, Bovin N, Balish A, Klimov A, 2005. Evolution of the receptor
binding phenotype of influenza A (H5) viruses in Virology (electronic publication ahead of print
version).

• Hatta M, Gao P, Halfmann P, Kawaoka Y, 2001. Molecular Basis for High Virulence of Hong Kong
H5N1 Influenza A Viruses in Science vol 293, pp1840-1842.

• Nelson DL and Cox MM, 2005. Lehninger's Principles of Biochemistry, 4th edition, WH Freeman, New
York, NY.

• Suzuki, Y, 2005. Sialobiology of Influenza: Molecular Mechanism of Host Range Variation of Influenza
Viruses in Biological and Pharmaceutical Bulletin, vol 28, pp399-408.

• Senne DA, Panigrahy B, Kawaoka Y, Pearson JE, Suss J, Lipkind M, Kida H, Webster RG, 1996. Survey
of the hemagglutinin (HA) cleavage site sequence of H5 and H7 avian influenza viruses: amino acid
sequence at the HA cleavage site as a marker of pathogenicity potential in Avian Diseases vol 40,
pp425-437.

• Weis WI, Brünger AT, Skehel JJ, et al, 1990. Refinement of the influenza virus hemagglutinin by
simulated annealing. J Mol Biol vol 212, pp737-761.

• White JM, Hoffman LR, Arevalo JH, et al, 1997. Attachment and entry of influenza virus into host cells.
Pivotal roles of hemagglutinin. In Structural Biology of Viruses. Chiu W, Burnett RM, and Garcea RL,
editors. Oxford University Press, NY. pp80-104.

Website

1. http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/11308.html
2. http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html
3. http://www.cdc.gov/ncidod/eid/vol4no3/webster.htm
4. http://www.influenzacentre.org/fluinfo.htm
5. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genome&cmd=search&term=influenza+A+virus
6. http://www.ncbi.nih.gov/genomes/VIRUSES
7. http://www.nhsdirect.nhs.uk
8. http://www.influenzareport.com/ir/ai.htm
9. http://www.agnr.umd.edu/avianflu/
10. http://www.cdc.gov/flu/about/fluviruses.htm

11. http://www.cdc.gov/flu/avian/gen-info/flu-viruses.htm

12. http://bioinformatics.ubc.ca/resources/tools/?name=clustalx

13. http://bips.u-strasbg.fr/fr/Documentation/ClustalX/

14. http://pbil.univ-lyon1.fr/software/njplot.html

15. http://www.cdc.gov/ncidod/eid/vol4no3/webster.htm#ref6

16. http://www.en.wikipidia.org//wiki

17. http://www.who.int/csr/don/2004_01_15/en/

18. http://www.mayoclinic.com/health/bird-flu/DS00566

19. http://www.pandemicflu.state.pa.us/pandemicflu/cwp/view.asp?a=501&q=151742

20. http://micro.magnet.fsu.edu/cells/viruses/influenzavirus.html

21. http://www.cdc.gov/flu/about/fluviruses.htm

22. http://en.wikipedia.org/wiki/h5n1_genetic_structure

23. http://www.cdc.gov/flu/avian/gen-info/flu-viruses.htm

24. http://www3.niaid.nih.gov/news/focuson/flu/illustrations/antigenic/antigenicdrift.htm

25. http://www3.niaid.nih.gov/news/focuson/flu/illustrations/antigenic/antigenicshift.htm

26. http://www.cdc.gov/flu/avian/gen-info/flu-viruses.htm

27. http://en.wikipedia.org/wiki

28. http://pathmicro.med.sc.edu/mhunt/flu.htm

29. http://en.wikipedia.org/wiki/H5N1#Genetic_structure_and_related_subtypes

30. http://www.csd.abdn.ac.uk/hex/

31. http://www.ebi.ac.uk/thornton-srv/databases/pdbsum

32. http://www.ebi.ac.uk/thornton-srv/databases/CSA

33. http://en.wikipedia.org/wiki/Neuraminidase

34. en.wikipedia.org/wiki/Neuraminidase_inhibitor

35. www.qdots.com/live/render/content.asp

Books
1) “BIOINFORMATICS AND FUNCTIONAL GENOMICS”
Author: Jonathan Pevsner
2) “BIOINFORMATICS: SEQUENCE AND GENOME ANALYSIS”
Author: David W. Mount
3) “BIOINFORMATICS: METHODS AND APPLICATIONS: GENOMICS, PROTEOMICS AND DRUG DISCOVERY”
Author: S. C. Rastogi, Namita Mendiratta, Parag Rastogi

Abbreviation

• CSA: Catalytic Site Atlas

• EMBOSS: European Molecular Biology Open Software Suite

• NCBI: National Center for Biotechnology Information

• NDB: Nucleic Acid Database

• ORF: Open Reading Frame

• OTU: Operational Taxonomic Unit

• PDB: Protein Data Bank

• Phylip: Phylogeny Inference Package

Appendix

PDBsum:- A database of the known 3D structures of proteins and nucleic acids. PDBsum is a pictorial
database providing an at-a-glance overview of every macromolecular structure deposited in the Protein
Data Bank (PDB). It provides schematic diagrams of the molecules in each structure and of the
interactions between them. Entries are accessed by their PDB code
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)

Jena Library:- The Jena Library of Biological Macromolecules (JenaLib) is aimed at a better
dissemination of information on three-dimensional biopolymer structures with an emphasis on
visualization and analysis.
It provides access to all structure entries deposited at the Protein Data Bank (PDB) or at the Nucleic
Acid Database (NDB). ( http://www.fli-leibniz.de/IMAGE.html)

CSA (Catalytic Site Atlas):- The Catalytic Site Atlas (CSA) is a database documenting enzyme active
sites and catalytic residues in enzymes of 3D structure.
The Catalytic Site Atlas (CSA) provides catalytic residue annotation for enzymes in the Protein Data
Bank.
The CSA contains 2 types of entry:
1. Original hand-annotated entries, derived from the primary literature. References for these
entries are given.

2. Homologous entries, found by PSI-BLAST alignment (using an e-value cut-off of 0.00005) to
one of the original entries. The equivalent residues, which align in sequence to the catalytic
residues found in the original entry are documented.
CSA Version 2.1.7 ( http://www.ebi.ac.uk/thornton-srv/databases/CSA)

Swiss model

Swiss-Model is an automated homology modelling server developed within the Swiss Institute of Bioinformatics in a collaboration
between Glaxo and SBG. It makes it easy to submit a target sequence and get back an automatically generated homology model, provided
that an empirical structure with >30% sequence identity exists to use as a template. These automated models may be useful, but will
sometimes contain errors that could be avoided if an expert manually adjusted the sequence alignment.
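
The >30% identity rule of thumb can be checked against the template hits reported in the modelling results of this report. The sketch below is illustrative only: the tuple layout and the name `MIN_IDENTITY` are assumptions made here, and the check is not part of the Swiss-Model server itself.

```python
# Identity threshold quoted in the Swiss-Model description above (percent).
MIN_IDENTITY = 30.0

# (target, template PDB id, % identity) as reported in the results section.
hits = [
    ("HA1", "1jsmA", 94.17),
    ("Non structural protein", "1AIL", 66.67),
    ("Matrix protein", "1ea3", 94.9),
]

# Keep only templates similar enough to be worth modelling against.
usable = [(target, template) for target, template, identity in hits
          if identity > MIN_IDENTITY]

for target, template in usable:
    print(f"{target}: template {template} passes the identity threshold")
```

All three reported templates lie well above the threshold, which is consistent with their use for homology modelling here.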

Swiss-PdbViewer: Swiss-PdbViewer is a free program to display, analyse and manipulate PDB protein
structures. It can load and display several molecules simultaneously, each in its own layer. Each
molecule is composed of groups (i.e. amino acids, nucleotides, substrates...), and each group is
composed of atoms whose coordinates are taken directly from a PDB file.
Besides features such as protein superimposition, H-bond detection and amino acid mutation, the program
is tightly linked to Swiss-Model, an automated homology modelling server running at the Geneva
Biomedical Research Center. This allows threading a protein primary sequence onto a 3D template and
analysing homology. The display options of the program include spacefill, ball & stick, stick and
ribbon representations, all of which can be applied simultaneously within one structure model.
Swiss-PdbViewer Version 3.7 (http://www.expasy.ch/spdbv/text/main.htm)

Hex: - Hex is an interactive molecular graphics program for calculating and displaying feasible docking
modes of pairs of protein and DNA molecules. Hex can also calculate small-ligand/protein docking
(provided the ligand is rigid), and it can superpose pairs of molecules using only knowledge of their 3D
shapes.
In Hex's docking calculations, each molecule is modelled using 3D parametric functions which are used
to encode both surface shape and electrostatic charge and potential distributions.
Hex Version 4.5 (http://www.csd.abdn.ac.uk/hex/)

PHYLIP: (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies
(evolutionary trees). Methods that are available in the package include parsimony, distance matrix, and
likelihood methods, including bootstrapping and consensus trees. Data types that can be handled
include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and
discrete characters.

Some sequence analysis programs such as the ClustalW alignment program can write data files in the
PHYLIP format. Most of the programs look for the data in a file called "infile" -- if they do not find this
file they then ask the user to type in the file name of the data file.

Output is written onto special files with names like "outfile" and "outtree". Trees written onto "outtree"
are in the Newick format, an informal standard agreed to in 1986 by authors of a number of major
phylogeny packages.

http://evolution.genetics.washington.edu/phylip
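
To illustrate what an "outtree" file contains, the sketch below pulls the leaf (taxon) names out of a Newick string. It is a minimal sketch, not a full Newick parser: it assumes unquoted labels, and the tree string is an invented example with made-up branch lengths rather than a tree produced in this work.

```python
import re


def leaf_names(newick: str) -> list:
    """Return leaf labels of a simple Newick tree, in order of appearance.

    A leaf label directly follows '(' or ',' and ends at ':', ',', ')' or ';'.
    Quoted labels are not handled in this sketch.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Invented example tree with five taxa A-E (hypothetical data).
example_tree = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.1):0.02,E:0.4);"
print(leaf_names(example_tree))
```

Branch lengths and internal nodes are skipped because they follow ':' or ')' rather than '(' or ','.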

Get ORF: Get ORF (getorf) is a freely available program in the EMBOSS package.

It finds and outputs the sequences of open reading frames (ORFs).

The ORFs can be defined as regions of a specified minimum size between STOP codons or between
START and STOP codons.The ORFs can be output as the nucleotide sequence or as the translation.

The program can also output the region around the START or the initial STOP codon or the ending
STOP codons of an ORF for those doing analysis of the properties of these regions.

The START and STOP codons are defined in the Genetic Code tables. A suitable Genetic Code table
can be selected for the organism you are investigating.
(http://www.3rog.org/general/software/packages/emboss/getorf.html)
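
The START-to-STOP definition of an ORF described above can be sketched as follows. This is not the EMBOSS getorf implementation: it scans only the three forward reading frames, assumes the standard genetic code (ATG start; TAA, TAG, TGA stops), and the function name `find_orfs` and the test sequence are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
START_CODON = "ATG"


def find_orfs(dna: str, min_len: int = 30) -> list:
    """Find START..STOP ORFs in the three forward frames.

    Returns (start, end, sequence) tuples sorted by start position, where
    'end' is exclusive and the sequence includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Invented 25-base test sequence (hypothetical data).
test_seq = "AAATGGCTTGGTAACCATGCCCTGA"
for start, end, orf in find_orfs(test_seq, min_len=9):
    print(start, end, orf)
```

Raising `min_len` filters out short ORFs, which mirrors the "specified minimum size" option mentioned in the description.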

ClustalW: ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It
produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates
the best match for the selected sequences and lines them up so that the identities, similarities and
differences can be seen. Evolutionary relationships can be viewed as cladograms or phylograms.
(http://www.ebi.ac.uk/clustalw/)

GENSCAN: GENSCAN is a general-purpose gene identification program which analyzes
genomic DNA sequences from a variety of organisms including human, other vertebrates,
invertebrates and plants.

The GENSCAN server provides access to the program for predicting the locations and exon-intron
structures of genes in genomic sequences from a variety of organisms. This server can accept sequences
up to 1 million base pairs (1 Mbp) in length.
http://genes.mit.edu/GENSCAN.html
bioinformatics.ubc.ca/resources/tools/index.php?name=genscan
