12.2
BioPerl
The BioPerl project is an international association of developers of open source Perl tools for
bioinformatics, genomics and life science research.

Things you can do with BioPerl:


• Read and write sequence files in different formats, including Fasta, GenBank, EMBL,
SwissProt and more…
• Extract gene annotation from GenBank, EMBL and SwissProt files
• Read and analyse BLAST results
• Read and process phylogenetic trees and multiple sequence alignments
• Analyse SNP data
• And more…
12.3
BioPerl
BioPerl modules are named Bio::XXX (for example Bio::SeqIO).
The BioPerl wiki, http://bio.perl.org/, has documentation and examples showing how to
use these modules, and it is the best way to learn BioPerl. We recommend beginning with the "HOWTOs":
http://www.bioperl.org/wiki/HOWTOs
For a more in-depth look at the individual BioPerl modules, see the API documentation:
http://doc.bioperl.org/releases/bioperl-1.5.2/
12.4
BioPerl: the SeqIO module
The Bio::SeqIO module reads and writes sequence files in many formats. For example, this converts an EMBL file to Fasta:
use strict;
use warnings;
use Bio::SeqIO;
my $in = Bio::SeqIO->new("-file"   => "<seq1.embl",
                         "-format" => "EMBL");
my $out = Bio::SeqIO->new("-file"   => ">seq2.fasta",
                          "-format" => "Fasta");
while ( my $seqObj = $in->next_seq() ) {
    $out->write_seq($seqObj);
}
A list of all the sequence formats BioPerl can read is in:
http://www.bioperl.org/wiki/HOWTO:SeqIO#Formats
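The same read/write loop can be wrapped in a small reusable subroutine. This is a minimal sketch, not part of BioPerl: the name convert_seq_file and its argument order are our own (hypothetical) choices, and we assume the format names are the ones listed on the SeqIO HOWTO page above (e.g. "fasta", "genbank", "embl").

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not a BioPerl function): convert a sequence file
# from one format to another and return the number of sequences written.
sub convert_seq_file {
    my ($infile, $informat, $outfile, $outformat) = @_;
    my $in  = Bio::SeqIO->new(-file => "<$infile",  -format => $informat);
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => $outformat);
    my $count = 0;
    while (my $seqObj = $in->next_seq()) {
        $out->write_seq($seqObj);
        $count++;
    }
    $out->close();   # make sure everything is written to disk
    return $count;
}

# Example: perl convert.pl seq1.embl embl seq2.fasta fasta
convert_seq_file(@ARGV) if @ARGV == 4;
```

Because the formats are plain arguments, the same subroutine handles any pair of formats SeqIO can read and write.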
12.5
BioPerl: the Seq module
use strict;
use warnings;
use Bio::SeqIO;
my $in = Bio::SeqIO->new("-file"   => "<seq.fasta",
                         "-format" => "Fasta");
while ( my $seqObj = $in->next_seq() ) {
    print "ID:".$seqObj->id()."\n";             # 1st word in header
    print "Desc:".$seqObj->desc()."\n";         # rest of header
    print "Length:".$seqObj->length()."\n";     # seq length
    print "Sequence: ".$seqObj->seq()."\n";     # seq string
}

The Bio::SeqIO method next_seq() returns a Bio::Seq object. This module provides methods such as
id() (the first word of the header line, up to the first space), desc() (the rest of the header line),
length() (the sequence length) and seq() (the sequence itself, as a string). You can read more about it in:
http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object
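To try these methods without a file, a Bio::Seq object can also be built directly with Bio::Seq->new. This is a minimal sketch: the ID, description and sequence are made-up values, and subseq(), revcom() and translate() are further Bio::Seq methods that the slide does not mention.

```perl
use strict;
use warnings;
use Bio::Seq;

# Build a Bio::Seq object in code instead of reading it with Bio::SeqIO.
# All values below are made up for illustration.
my $seqObj = Bio::Seq->new(
    -display_id => "myseq",
    -desc       => "a made-up example",
    -seq        => "ATGGCCTAA",
);

print "ID: ",       $seqObj->id(),     "\n";   # myseq
print "Desc: ",     $seqObj->desc(),   "\n";   # a made-up example
print "Length: ",   $seqObj->length(), "\n";   # 9
print "Sequence: ", $seqObj->seq(),    "\n";   # ATGGCCTAA

# subseq() returns a string (positions are 1-based and inclusive);
# revcom() and translate() return new sequence objects.
print "First codon: ",        $seqObj->subseq(1, 3),  "\n";   # ATG
print "Reverse complement: ", $seqObj->revcom->seq,   "\n";   # TTAGGCCAT
print "Protein: ",            $seqObj->translate->seq, "\n";
```

The same calls work on any object returned by next_seq().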
12.6
BioPerl: Parsing a GenBank file
Bio::SeqIO can read and parse the adenovirus genome file (or any GenBank file) for us.
Below is part of the feature table of a GenBank record (the NDP gene), followed by
what the parsing code in 12.7 prints for it.

GenBank feature table:
     gene            1..1846
                     /gene="NDP"
                     /note="ND"
                     /db_xref="LocusID:4693"
                     /db_xref="MIM:310600"
     CDS             409..810
                     /gene="NDP"
                     /note="Norrie disease (norrin)"
                     /codon_start=1
                     /product="Norrie disease protein"
                     /protein_id="NP_000257.1"
                     /db_xref="GI:4557789"
                     /db_xref="LocusID:4693"
                     /db_xref="MIM:310600"
                     /translation="MRKHVLAASFSMLSLL...
                     SHPLYKCSSKMVLLARCEGHCSQAS...
                     PLVSFSTVLKQPFRSSCHCCRPQTS
                     LTATYRYILSCHCEEC"

Program output:
primary tag: gene
 tag: gene
 value: NDP
 tag: note
 value: ND
 tag: db_xref
 value: LocusID:4693
 value: MIM:310600
primary tag: CDS
 tag: gene
 value: NDP
 tag: note
 value: Norrie disease (norrin)
...
12.7
BioPerl: Parsing a GenBank file
Bio::SeqIO reads the adenovirus genome file (or any GenBank file) for us; next_seq() returns
a Bio::Seq object, and we can loop over its features and their tags:
use strict;
use warnings;
use Bio::SeqIO;
my $inputfilename = $ARGV[0];   # path to the GenBank file
my $in = Bio::SeqIO->new("-file"   => $inputfilename,
                         "-format" => "GenBank");
my $seqObj = $in->next_seq();
foreach my $featObj ($seqObj->get_SeqFeatures) {
    print "primary tag: ", $featObj->primary_tag, "\n";
    foreach my $tag ($featObj->get_all_tags) {
        print " tag: ", $tag, "\n";
        foreach my $value ($featObj->get_tag_values($tag)) {
            print " value: ", $value, "\n";
        }
    }
}

Output for the first feature:
primary tag: gene
 tag: gene
 value: NDP
 tag: note
 value: ND
 tag: db_xref
 value: LocusID:4693
 value: MIM:310600
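Often we only want some of the features, for example the coding sequences. The sketch below keeps only CDS features and extracts the DNA of each one. cds_summary is a hypothetical helper, the file name in the usage comment is only an example, and we assume spliced_seq() (from the BioPerl feature interface) to obtain the CDS sequence, so that split locations such as join(...) are joined into one sequence.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: summarise only the CDS features of a Bio::Seq
# object, e.g. one returned by next_seq() on a GenBank file.
sub cds_summary {
    my ($seqObj) = @_;
    my @rows;
    foreach my $featObj ($seqObj->get_SeqFeatures) {
        next unless $featObj->primary_tag eq "CDS";
        # not every CDS has a /gene qualifier, so check before reading it
        my $gene = $featObj->has_tag("gene")
                 ? ($featObj->get_tag_values("gene"))[0]
                 : "unknown";
        # spliced_seq() joins the pieces of a split location
        my $dna = $featObj->spliced_seq->seq;
        push @rows, [ $gene, $featObj->start, $featObj->end, $dna ];
    }
    return @rows;
}

# Example: perl cds.pl adenovirus.gb   (the file name is just an example)
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => "GenBank");
    my $seqObj = $in->next_seq();
    foreach my $row (cds_summary($seqObj)) {
        my ($gene, $start, $end, $dna) = @$row;
        print "$gene\t$start..$end\t", length($dna), " bp\n";
    }
}
```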
12.8
BioPerl: downloading files from the web
The Bio::DB::GenBank module allows us to download a specific record from
the NCBI website:
use Bio::DB::GenBank;
my $gb = Bio::DB::GenBank->new;
my $seqObj = $gb->get_Seq_by_acc("J00522");   # returns a Bio::Seq object
# or ... request the sequence in Fasta format
$gb = Bio::DB::GenBank->new("-format" => "Fasta");
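A downloaded record can be saved with Bio::SeqIO, just like a sequence read from a file. This is a minimal sketch assuming network access to NCBI; save_accessions_as_fasta is a hypothetical helper, and we assume get_Seq_by_acc() may throw an exception (unknown accession, network error), so each call is wrapped in eval.

```perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Hypothetical helper: download each accession from GenBank and write
# all retrieved records to one Fasta file. Needs network access.
sub save_accessions_as_fasta {
    my ($outfile, @accessions) = @_;
    my $gb  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => "Fasta");
    my $count = 0;
    foreach my $acc (@accessions) {
        # assumed: get_Seq_by_acc() may die on failure, so trap it
        my $seqObj = eval { $gb->get_Seq_by_acc($acc) };
        unless ($seqObj) {
            warn "Could not retrieve $acc\n";
            next;
        }
        $out->write_seq($seqObj);
        $count++;
    }
    $out->close();
    return $count;
}

# Example: perl fetch.pl J00522.fasta J00522
save_accessions_as_fasta(@ARGV) if @ARGV >= 2;
```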
12.9
BioPerl: reading BLAST output
First we need to have the BLAST results in a text file BioPerl can read.
Here is one way to achieve this: on the NCBI BLAST results page, click "Download"
and choose "Text".
12.12
BioPerl: reading BLAST output
The Bio::SearchIO module can read and parse BLAST output:
use strict;
use warnings;
use Bio::SearchIO;
my $blast_report = Bio::SearchIO->new("-format" => "blast",
                                      "-file"   => "mice.blast");
while (my $result = $blast_report->next_result) {
    print "Checking query ", $result->query_name, "\n";
    while (my $hit = $result->next_hit()) {
        print "Checking hit ", $hit->name(), "\n";
        # first HSP (aligned region) reported for this hit
        my $hsp = $hit->next_hsp();
        # start and end of the aligned region on the hit sequence
        print " hit region: ", $hsp->hit->start(), "-", $hsp->hit->end(), "\n";
    }
}
(See the blast example in lesson 1)
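The loop above looks only at the first HSP of each hit. Below is a sketch that checks all HSPs and keeps those with an e-value at or below a cut-off. significant_hsps and its optional format argument are hypothetical; the default "blast" reads a plain-text report as on this slide, and passing "blasttable" instead assumes tabular BLAST output (12 tab-separated columns).

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: return [query, hit, evalue, hit start, hit end]
# for every HSP whose e-value is at most $max_evalue.
sub significant_hsps {
    my ($file, $max_evalue, $format) = @_;
    $format ||= "blast";   # plain-text BLAST report by default
    my $report = Bio::SearchIO->new(-format => $format, -file => $file);
    my @rows;
    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            # loop over all HSPs of the hit, not only the first one
            while (my $hsp = $hit->next_hsp) {
                next if $hsp->evalue > $max_evalue;
                push @rows, [ $result->query_name, $hit->name, $hsp->evalue,
                              $hsp->hit->start, $hsp->hit->end ];
            }
        }
    }
    return @rows;
}

# Example: perl filter.pl mice.blast 1e-10
if (@ARGV == 2) {
    foreach my $row (significant_hsps(@ARGV)) {
        print join("\t", @$row), "\n";
    }
}
```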
