Core genomes and signature genes that define Streptococcus pyogenes

The signature genes tool developed for the SEED and implemented at the National Microbial
Pathogen Data Resource (NMPDR), www.nmpdr.org, was used to compare the translated genomes of all
completely sequenced strains of Streptococcus pyogenes (group A streptococcus or GAS) to define a core
genome for this human pathogen. The tool allows the user to select a reference genome to compare with
any number of genomes selected in a comparison set. The commonality factor is set to 80% by default but
may be reset by the user. For example, the 80% common core of GAS with respect to the strain with the
largest genome (MGAS 10750) contains 1,472 proteins that have bidirectional, best BlastP hits (BBH), at
an E-value of 1 x 10^-10 or less, in 10 of the 12 available genomes. Increasing the stringency of the analysis
to 100% reduces the number of core proteins to 1,359. In addition to determining the proteins in common
to a set of genomes, we used the signature genes tool to define a signature set of proteins that
distinguishes the strains having the same M-type, e.g. M1 and M12. The information generated by this
genome comparison could be used to design a microarray for the simultaneous analysis of the core GAS
genome as well as signatures for each sequenced strain or M-type. The bioinformatics analysis reveals
interesting consistencies and inconsistencies which generate hypotheses for testing on microarrays.
Because protein functions in NMPDR are organized in subsystems, it is possible to infer functional
differences imparted by gene signatures. Subsystems annotation is used for metabolic reconstruction,
analysis of central machinery and signaling pathways, finding missing genes, integrating regulatory
networks, detection of horizontally transferred genes, and prediction of the functions of hypothetical
proteins. These genome annotation and comparison tools provide unprecedented information about GAS
biology and pathogenesis, and can eventually be applied to all sequenced bacterial pathogens.
Signature Genes Tool: compare and contrast

Compare: find proteins in common to a set of genomes
Select a reference, or given, genome
Select one or more genomes to compare it with in the inclusion set (set 1)
Select a commonality factor, set to 80% by default
Tool returns a table of proteins shared by all or most of the genomes in set 1,
with a score indicating the proportion of set 1 that contains each protein
Results include access to the comparative analysis environment for every
protein found, including subsystems if desired
Streptococcus pyogenes core genomes:
100% S. pyogenes core, 1,359 proteins
80% S. pyogenes core, 1,472 proteins

M1 core: strains SF370 and MGAS 5005, 1,580 proteins
M3 core: strains MGAS 315 and SSI-1, 1,820 proteins
M12 core: strains MGAS 2096 and MGAS 9429, 1,668 proteins
Invasive core: strains MGAS 5005, MGAS 315 and SSI-1, 1,589 proteins
Virulence-related core: invasive strains filtered with keyword virul*, 78 proteins
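
The compare step can be pictured as a presence count: for each protein of the reference genome, count how many genomes in set 1 hold a bidirectional best hit, and keep the protein if that fraction meets the commonality factor. The following is a minimal sketch under stated assumptions, not the NMPDR implementation; the function name `core_proteins`, the `bbh` input format, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def core_proteins(
    bbh: Dict[str, Set[str]],
    inclusion_set: List[str],
    commonality: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (protein, score) pairs that meet the commonality factor.

    bbh maps each reference protein id to the set of genomes in which it
    has a bidirectional best hit (hypothetical input format).
    The score is the fraction of the inclusion set containing the protein.
    """
    if not inclusion_set:
        raise ValueError("inclusion set must contain at least one genome")
    results = []
    for protein, genomes in sorted(bbh.items()):
        hits = sum(1 for g in inclusion_set if g in genomes)
        score = hits / len(inclusion_set)
        if score >= commonality:
            results.append((protein, score))
    return results


# Toy data (made up): three reference proteins, five comparison genomes.
toy_bbh = {
    "protA": {"g1", "g2", "g3", "g4", "g5"},
    "protB": {"g1", "g2", "g3", "g4"},
    "protC": {"g1", "g2"},
}
toy_set = ["g1", "g2", "g3", "g4", "g5"]

core_80 = core_proteins(toy_bbh, toy_set, commonality=0.8)
core_100 = core_proteins(toy_bbh, toy_set, commonality=1.0)
```

Raising the commonality factor from 0.8 to 1.0 can only shrink the result, which mirrors the drop from 1,472 to 1,359 proteins reported for the S. pyogenes cores.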
Contrast: find proteins that distinguish a set of genomes
Select a reference, or given, genome
Select one or more genomes to compare it with in the inclusion set (set 1)
Select one or more genomes to contrast with in the exclusion set (set 2)
Tool finds genes shared by all or most of the genomes in set 1 but absent
from all or most of the genomes in set 2, with a score indicating how well
each gene matches the search parameters
Streptococcus pyogenes signatures:

M1 signature: strains SF370 and MGAS 5005 only, 11 proteins with a perfect score
M3 signature: strains MGAS 315 and SSI-1 only, 44 proteins with a perfect score
M12 signature: strains MGAS 2096 and MGAS 9429 only, 23 proteins with a perfect score
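
The contrast step can be sketched in the same way as a presence and absence count. The scoring rule below (fraction of set 1 that has the protein multiplied by the fraction of set 2 that lacks it) is an assumption chosen for illustration; the poster does not state how the tool computes its score. The function name `signature_scores` and the toy genome labels are hypothetical.

```python
from typing import Dict, List, Set


def signature_scores(
    bbh: Dict[str, Set[str]],
    inclusion_set: List[str],
    exclusion_set: List[str],
) -> Dict[str, float]:
    """Score each reference protein as a marker of the inclusion set.

    Assumed rule: (fraction of set 1 with the protein) times
    (fraction of set 2 without it). A score of 1.0 is a "perfect" signature:
    present in every set 1 genome and absent from every set 2 genome.
    """
    if not inclusion_set or not exclusion_set:
        raise ValueError("both sets must contain at least one genome")
    scores = {}
    for protein, genomes in bbh.items():
        present = sum(1 for g in inclusion_set if g in genomes)
        absent = sum(1 for g in exclusion_set if g not in genomes)
        scores[protein] = (present / len(inclusion_set)) * (absent / len(exclusion_set))
    return scores


# Toy data (made up): two "M1-like" genomes contrasted with two "M3-like" genomes.
toy_hits = {
    "sigA": {"m1a", "m1b"},                   # only in set 1
    "shared": {"m1a", "m1b", "m3a", "m3b"},   # everywhere, so not a signature
    "partial": {"m1a", "m3a"},                # half of each set
}
scores = signature_scores(toy_hits, ["m1a", "m1b"], ["m3a", "m3b"])
perfect = sorted(p for p, s in scores.items() if s == 1.0)
```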
Use of core genomes:

Define genes to be spotted on microarrays
Define minimal set of roles for mathematical modelling of metabolism
Use of protein signatures:

Discover mechanism for phenotype expression
Possible target for diagnostics or therapeutics
Pathogen-specific gateways to data
Search box restricted to organisms of interest

User forums with inquiry labs for teaching and exploration
Pathogen information describes genotypes, phenotypes,
serotypes, taxonomy, physiology, epidemiology
Literature aggregator and links to open access journals
Pathogens in the news
Resource links including strain collections
Virtual proteome
Annotation status tables:
Immediate access to genes whose functions are
known with some degree of certainty
Named genes in subsystems
Named genes not in subsystems
Hypothetical genes in subsystems
Gateway to genes about which nothing is known
Hypothetical genes not in subsystems
List of genes with links to NMPDR analysis tools
Exploration in comparative framework first step to
formulating working hypotheses about functions
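
The four categories of the annotation status table follow from two yes/no properties of each gene: whether it has a named function and whether it sits in a subsystem. A minimal sketch, assuming a gene counts as hypothetical when its function text contains the word "hypothetical"; that rule, the `Gene` record, and the toy genes are illustrative assumptions rather than NMPDR's own definitions.

```python
from collections import Counter
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    function: str       # assigned functional role, free text
    in_subsystem: bool  # True if the gene is placed in at least one subsystem


def annotation_status(gene: Gene) -> str:
    """Assign a gene to one of the four status-table categories."""
    # Assumption: a function containing "hypothetical" marks an unnamed gene.
    kind = "Hypothetical" if "hypothetical" in gene.function.lower() else "Named"
    place = "in subsystems" if gene.in_subsystem else "not in subsystems"
    return f"{kind} genes {place}"


# Toy genes (made up) covering all four categories.
toy_genes = [
    Gene("gene1", "Streptolysin S biosynthesis protein D", True),
    Gene("gene2", "Hypothetical protein", False),
    Gene("gene3", "Hypothetical protein", True),
    Gene("gene4", "Transcriptional regulator", False),
    Gene("gene5", "hypothetical protein", False),
]
status_table = Counter(annotation_status(g) for g in toy_genes)
```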
Subsystems approach to genome annotation

Subsystems annotation provides researchers with corrected
functional annotations in a structured biological context
Consistency across genomes achieved by vertical annotation of
functions rather than horizontal focus on single genomes
More than 500 distinct subsystems have been developed
Metabolic pathways
Complex structures
Genotype phenotype associations
Subsystems integrate genomic and functional contexts of genes in
metabolic reconstructions or populated subsystem spreadsheets
Metabolic reconstructions summarize all subsystems in a given
genome
Populated subsystems compare all genomes in a given
subsystem
Example: Streptococcal virulome

Open the subsystem (SS) from the subsystem search tree or from a protein page, e.g. SLO
Clustered genes within a genome share a color
Closely related roles are merged in a single column marked with *
Mouse over column headers for full names of functional roles
Defined in a few genomes and extended to others via homology
Exploration of physical, genomic context

Protein context graphic and table
Focus protein highlighted green
Genes within about 6 kbp upstream and downstream shown
Genes with conserved proximity shown as blue arrows with functional
coupling scores, fc-sc
Pins
Color-matched orthologs allow comparative analysis of functional
clustering and chromosomal rearrangements
Redraw the display to show genomes selected from commentary table
Compare regions
Subset of PINS opens with the 5 highest-scoring genomes
Size and number of compared regions may be reset by user
CL
Finds clusters containing the focus protein in other genomes
Useful for genes without functional coupling scores, fc-sc
fc-sc
Measures conservation of gene proximity and phylogenetic distance
Returns table listing pairs of proximal orthologs
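
The protein context display shows the genes within about 6 kbp on either side of the focus protein. Selecting that neighborhood can be sketched as a simple interval overlap test. This is an illustration only: the `Feature` record, the function name `genes_in_context`, and the coordinates are hypothetical, and the sketch assumes a single linear contig with start <= end for every gene.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    gene_id: str
    start: int  # leftmost coordinate in bp
    end: int    # rightmost coordinate in bp


def genes_in_context(features: List[Feature], focus_id: str, flank_bp: int = 6000) -> List[str]:
    """Return ids of genes overlapping the window around the focus gene."""
    focus = next((f for f in features if f.gene_id == focus_id), None)
    if focus is None:
        raise KeyError(focus_id)
    lo, hi = focus.start - flank_bp, focus.end + flank_bp
    # A gene is shown if any part of it overlaps the window [lo, hi].
    return [
        f.gene_id
        for f in sorted(features, key=lambda f: f.start)
        if f.end >= lo and f.start <= hi
    ]


# Toy coordinates (made up).
toy_features = [
    Feature("g1", 1000, 2000),
    Feature("g2", 9000, 10000),
    Feature("focus", 12000, 13000),
    Feature("g3", 18500, 19500),
    Feature("g4", 25000, 26000),
]
context = genes_in_context(toy_features, "focus")
```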
Protein context
Focus gene is green
Proximity of blue genes conserved in at least four other species (not strains)
Click to show functional coupling scores, fc-sc
Click on score to see table of paired homologs in other genomes
Orthologous BBH
Bidirectional Best Hits: precomputed, reciprocal BlastP results
Compare functional annotations
Select and align
Click any gene id to refocus display
Link out via alias ids
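
A bidirectional best hit is a pair of proteins, one from each genome, that are each other's top-scoring BlastP match. The sketch below shows the reciprocal check on already-parsed hit lists. The tuple layout (query, subject, E-value, bit score), the function names, and the toy hits are assumptions for illustration; the E-value cutoff of 1e-10 is the one quoted in the abstract, and ranking by bit score is a common convention rather than something the poster specifies.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, evalue, bitscore)


def best_hits(hits: List[Hit], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Map each query to its highest-scoring subject below the E-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy hits (made up) for genome A against genome B and the reverse search.
a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-20, 90.0),
    ("a2", "b2", 1e-30, 120.0),
    ("a3", "b3", 1e-5, 40.0),   # fails the E-value cutoff
]
b_vs_a = [
    ("b1", "a1", 1e-50, 200.0),
    ("b2", "a1", 1e-20, 90.0),
    ("b2", "a2", 1e-30, 120.0),
    ("b3", "a3", 1e-5, 40.0),   # fails the E-value cutoff
]
bbh_pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```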
Pins locates SLS precursor in strain M5

Sag operon performs biosynthesis (genes 1,2,3,8) and transport (5-7) of the
bacteriocin-like toxin streptolysin S (SLS)
The tiny sagA ORF was missed in the first annotation of the M5 genome,
although the strain is known to secrete streptolysin S
Comparison of streptolysin S biosynthesis protein D (SagD, shown as gene 1)
among strep genomes points to the location of sagA in M5
Confirmed with BLASTX
Clostridium and Listeria don't express the toxin; what is the purpose of the
biosynthesis and transport proteins?
Streptolysin S subsystem locates orthologous
components in Listeria and Clostridium botulinum
Hypotheses generated by vertical annotation of functions rather
than horizontal focus on single genomes
Variant codes
1. 1111: All elements of the Sag operon as in S. pyogenes are present, i.e.
SagA / SagB,C,D,E / SagF / SagG,H,I (ABC transporter)
2. 0111: All elements but SagA (missed gene call; CDS exists)
3. 0102: Missing SagA and SagF homologs but having an ABC transporter for
which closest homologs are teichoic acid or Bacillus multidrug export cassettes
4. 0100: Only SagB,C,D,E; biosynthetic pathway with no precursor or transport
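
Read digit by digit, the four variant codes (1111, 0111, 0102, 0100) appear to record the four component groups in order: SagA precursor, SagB,C,D,E biosynthesis, SagF, and transport, with 2 in the last position marking an ABC transporter of a different family. That reading is inferred from the list above and is an assumption, as are the function name and argument names in this minimal sketch.

```python
from typing import Optional


def sag_variant_code(
    has_sagA: bool,
    has_biosynthesis: bool,        # SagB,C,D,E homologs present
    has_sagF: bool,
    transporter: Optional[str],    # "sag", "other", or None
) -> str:
    """Build a four-digit variant code under the assumed digit meanings.

    transporter: "sag" for SagG,H,I homologs, "other" for an ABC transporter
    of a different family, None when no transporter is found.
    """
    transport_digit = {"sag": "1", "other": "2", None: "0"}
    if transporter not in transport_digit:
        raise ValueError(f"unknown transporter class: {transporter!r}")
    return (
        f"{int(has_sagA)}{int(has_biosynthesis)}{int(has_sagF)}"
        f"{transport_digit[transporter]}"
    )


full_operon = sag_variant_code(True, True, True, "sag")
missed_sagA = sag_variant_code(False, True, True, "sag")
alt_transport = sag_variant_code(False, True, False, "other")
biosynthesis_only = sag_variant_code(False, True, False, None)
```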
Exploration of functional, biological context

Populated Subsystem Spreadsheet
Columns represent functional roles, mouse over header for definition

Genomes (rows) shown may be expanded and sorted by several criteria
Cells populated with specific, annotated genes linked to context pages
Functional variants defined by the annotated roles
Variant code -1 indicates subsystem is not functional
Codes defined in annotator's notes at bottom of page
Diagram of subsystem often provided
Protein families
FigFams taken from single column of functional roles
Pfam, TigrFams, COG, and others presented for comparison
Essential genes on genomic scale
Genome-scale essentiality data from studies of 10 species
Click on the red shaded portions of the bars to explore the genes
Candidate drug or therapeutic targets
Experimental verification of essential or virulent function
Close orthologs of proteins with experimentally determined structure
