
SNPPEB 1.1
Software Design Document
(current document version 1.101)

Document update history:
Version 1.0, created by Tony on Aug 4, 2004
  Description: First draft for general idea
Version 1.01, modified by Tony on September 29, 2004
  Draw flowchart to show design, and also address the issues in SRS 1.01
Version 1.011, modified by Tony on Oct 1, 2004
  Still addresses SRS 1.01
  More detailed module design in section 5
Version 1.012, modified by Tony on Oct 6, 2004
  Still addresses SRS 1.01
  Modify module design in section 5 from version 1.011
Version 1.1, modified by Tony on Jan 31, 2005
  Addresses SRS 1.1
  Major changes:
  1. Provide service to the new Beckman GenomeLab SNPstream Genotyping System
  2. Set up local databases instead of using XML files from NCBI
Version 1.101, modified by Tony on Mar 10, 2005
  Still addresses SRS 1.1
  Redefine DB design

1. Description
The requirements in the SRS will be fully addressed in this software design document, or an alternative solution will be given. We will use reference sequence data from NCBI, in FASTA and XML files, to set up our local databases. Also, "Primer3" (http://frodo.wi.mit.edu/primer3/primer3_code.html) will be integrated into our application, within the usage conditions stated in its copyright document.

2. Function Design
This version of the design document gives a primary design that addresses the issues in SRS 1.1, together with a design flowchart.
a. Input and criteria
Input:
  List of SNP ids
  Two locations in a chromosome (Two STS markers?)
Criteria:
  Orientation: original, all forward, all reverse
  SNP types: (any combination) 6 types
  Exclude coding SNPs?
  Flanking sequence length?
  Number of SNPs to be separated
Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/
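The orientation criterion can be illustrated with a minimal Python sketch. It assumes that "all forward" means the plus strand of the mapping contig and "all reverse" the minus strand, and that each SNP carries a '+' or '-' orientation; the function names and the mode strings are hypothetical, not part of the design.

```python
# Complement table for upper- and lower-case bases (lower case marks repeats).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_orientation(seq, strand, mode):
    """Orient a sequence according to the user's orientation criterion.

    strand: '+' or '-', orientation of the sequence relative to the contig.
    mode:   'original', 'forward' or 'reverse' (hypothetical option names).
    """
    if mode == "original":
        return seq
    if mode not in ("forward", "reverse"):
        raise ValueError("unknown orientation mode: %r" % mode)
    if strand not in ("+", "-"):
        raise ValueError("orientation must be '+' or '-' to re-orient")
    wanted = "+" if mode == "forward" else "-"
    # Flip only the sequences that are not already on the requested strand.
    return seq if strand == wanted else reverse_complement(seq)
```

A sequence whose orientation is uncertain is rejected here rather than guessed; whether such SNPs should be skipped or shown as-is is a decision the design leaves open.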

b. Query local databases
Database name: snppeb
i. ER:

ii. Table and column definition:


Table genome_contig:
accession: accession.version format, example: NT_077402.1
ctg_id: internal ID, example: CONTIG:77451
tax_id: species id; 9606 is Homo sapiens
ctg_length: length of contig
chr: chromosome; "Un" means not placed on any chromosome
chr_from: chromosome coordinate, reported in 1-based coordinates (starts from 1); 0 means not localized or placed on any chromosome
chr_to: chromosome coordinate, reported in 1-based coordinates; 0 means not localized or placed on any chromosome
orient: +, -, or 0, where 0 indicates uncertainty in orientation
assembly: this value is used to associate contigs with a particular assembly (e.g., reference assembly vs alternate assemblies provided by other groups or representing other haplotypes)

Table genome_contig_set:
accession: accession.version format, example: NT_077402.1
segment_id: associated with ctg_from and ctg_to. For a contig position m with ctg_from <= m <= ctg_to, segment_id = int((m - 1)/200 + 1)
#ctg_from: contig coordinate, reported in 1-based coordinates (starts from 1). Not stored in the DB; can be calculated from segment_id: ctg_from = 200 * (segment_id - 1)
#ctg_to: contig coordinate, reported in 1-based coordinates. Not stored in the DB; can be calculated from segment_id and seq: ctg_to = 200 * (segment_id - 1) + seq.length - 1
seq: sequence segment from contig; lower case means repetitive
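The segment arithmetic can be sketched in Python as follows. The sketch takes the segment_id formula as given and assumes that the first base of segment s is position 200 * (s - 1) + 1, which keeps the bounds 1-based and consistent with that formula; adjust the offset if the stored bounds are meant to be 0-based. The `segments` dictionary (segment_id mapped to seq) is a hypothetical stand-in for rows read from genome_contig_set.

```python
SEGMENT_SIZE = 200  # bases per stored segment, taken from the formulas above


def segment_id_for(m):
    """Segment holding 1-based contig position m: int((m - 1)/200 + 1)."""
    if m < 1:
        raise ValueError("contig positions are 1-based")
    return (m - 1) // SEGMENT_SIZE + 1


def segment_bounds(segment_id, seq_length):
    """1-based inclusive (first, last) positions of a segment (assumed offset)."""
    first = SEGMENT_SIZE * (segment_id - 1) + 1
    return first, first + seq_length - 1


def segments_covering(start, end):
    """Segment ids needed to read contig positions start..end inclusive."""
    return range(segment_id_for(start), segment_id_for(end) + 1)


def extract(segments, start, end):
    """Read positions start..end from a {segment_id: seq} mapping."""
    joined = "".join(segments[s] for s in segments_covering(start, end))
    first_of_block, _ = segment_bounds(segment_id_for(start), 0)
    offset = start - first_of_block
    return joined[offset:offset + (end - start + 1)]
```

Storing only segment_id keeps the table narrow: a query for a region needs just the ids returned by `segments_covering`, and the exact slice is cut out in application code.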

Table snp_info:
id: rs#
tax_id: species id, 9606 for human
build_create: build in which this SNP was created
build_update: last build in which this SNP was updated
allele_1, allele_2: nucleotides at the SNP site; 1 and 2 are in alphabetical order (example: A C, not C A)
allele_1_frq: average frequency of allele_1
allele_2_frq: average frequency of allele_2
frq_count: number of all chromosomes contributing to the frequency calculation
validated_pop: T|F, at least one ss in cluster was validated by independent assay
validated_frq: T|F, at least one subsnp in cluster has frequency data submitted
validated_clu: T|F, cluster has 2+ submissions, with 1+ submission assayed with a non-computational method
validated_2h2: T|F, all alleles have been observed in 2+ chromosomes
validated_hap: T|F, validated by HapMap project
ctg_accession: mapping contig in accession.version format, example: NT_077402.1
ctg_chr: chromosome of mapping contig
ctg_loc: SNP location mapped to contig
chr_loc: SNP location mapped to chromosome
ctg_ori: orientation of SNP and flanking sequence relative to contig
ctg_fxn: functional relationship of SNP to genes at contig location: locus-region | coding | coding-synon | coding-nonsynon | mrna-utr | intron | splice-site | reference | exception
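Two of the snp_info conventions lend themselves to a short Python sketch: the alphabetical ordering of allele_1 and allele_2, and the "Exclude coding SNPs?" criterion applied through ctg_fxn. Which function classes count as coding is an assumption made here (coding, coding-synon, coding-nonsynon); the dictionary rows and function names are hypothetical.

```python
# Assumption: these ctg_fxn values are the ones treated as "coding".
CODING_CLASSES = {"coding", "coding-synon", "coding-nonsynon"}


def ordered_alleles(a, b):
    """Return the two alleles in alphabetical order, e.g. ('A', 'C')."""
    first, second = sorted((a.upper(), b.upper()))
    return first, second


def is_coding(ctg_fxn):
    """True when the functional class marks a coding SNP."""
    return ctg_fxn.strip().lower() in CODING_CLASSES


def exclude_coding(snps):
    """Keep only rows (dicts with a 'ctg_fxn' key) that are not coding."""
    return [snp for snp in snps if not is_coding(snp["ctg_fxn"])]
```

For example, `exclude_coding` applied to rows with classes intron, coding-synon and mrna-utr keeps the first and the last.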

Table snp_flanking:
id: rs#
side: 5|3, the 5' or 3' side
fragment: index number of the fragment within a flanking sequence, in order; the 5' side starts from the far end and runs to the SNP site, the 3' side starts from the site immediately neighboring the SNP
seq: fragment of flanking sequence
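Rebuilding a flanking sequence from its fragments might look like the Python sketch below. It reads the ordering rule as meaning that, on both sides, concatenating fragments by increasing index gives the sequence in 5' to 3' direction; that reading, the tuple layout (id, side, fragment, seq) and the function names are assumptions for illustration.

```python
def assemble_flank(rows, rs_id, side):
    """Join the fragments of one side (5 or 3) of one SNP in index order."""
    parts = sorted(
        (fragment, seq)
        for (row_id, row_side, fragment, seq) in rows
        if row_id == rs_id and row_side == side
    )
    return "".join(seq for _, seq in parts)


def trimmed_flanks(rows, rs_id, length):
    """Return (5' flank, 3' flank), each cut to `length` bases next to the SNP."""
    five = assemble_flank(rows, rs_id, 5)
    three = assemble_flank(rows, rs_id, 3)
    # The 5' flank ends at the SNP, so keep its tail; the 3' flank starts
    # right after the SNP, so keep its head.
    return five[max(0, len(five) - length):], three[:length]
```

Trimming on the SNP-adjacent end of each side is what the "Flanking sequence length?" criterion would need if the stored flanks are longer than the requested length.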

c. Information retrieval
SNP information displayed:
  Checkbox for further primer design
  SNP id
  Allele
  Allele frequencies
  Flanking sequences, length and orientation
  Verification information
  Function class (coding nonsynon, coding synon, ...)
  Location info (chr, contig, ...)
Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/

d. Primer design
i. Generate text file for autoprimer.com
ii. Primer in batch (call primer3)
  Parameter setup: This page will be similar to the Primer3 web application and will be used to set the parameters for running Primer3. Default values will follow the suggestions of our lab specialists.
  Display result: This page will display the primers for a list of SNPs. The format will be customized by our lab specialists.
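For the batch step, a minimal Python sketch of building Primer3 input records is shown here. It assumes the Boulder-IO tag names documented for Primer3 2.x (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET, PRIMER_FIRST_BASE_INDEX); older releases use different tag names, so these should be checked against the installed version. Marking the SNP site with N and the function names are illustrative choices, not part of the design.

```python
def primer3_record(rs_id, five_flank, three_flank):
    """Build one Boulder-IO record with the SNP site as the target."""
    template = five_flank + "N" + three_flank  # N stands in for the SNP base
    lines = [
        "SEQUENCE_ID=rs%s" % rs_id,
        "SEQUENCE_TEMPLATE=%s" % template,
        # Target is "start,length"; start is 1-based because of the tag below.
        "SEQUENCE_TARGET=%d,1" % (len(five_flank) + 1),
        "PRIMER_FIRST_BASE_INDEX=1",
        "=",  # a line holding only '=' ends the record
    ]
    return "\n".join(lines) + "\n"


def batch_input(snps):
    """Concatenate records for an iterable of (rs_id, five_flank, three_flank)."""
    return "".join(primer3_record(*snp) for snp in snps)
```

The resulting text could be written to a file or passed to the Primer3 command-line program on standard input; the parameters chosen on the setup page would be added to each record as further TAG=VALUE lines.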

3. Flowchart

4. System Requirements and Running Environment


Programming tool: Java, PHP, Perl, CGI, BioPerl, XML::Twig
Primer design software: Primer3
Running environment: Red Hat Enterprise Linux WS 3, Dell Precision 670 workstation
Database server: MySQL
Server: bioinfo.vipbg.vcu.edu/SNPPEB
Client: IE, Mozilla, or Netscape browser and internet connection
