
GenomeData[ ] - Import FASTA file


Many projects require importing files and then manipulating their contents as text or strings. Mathematica provides a powerful set of
string-manipulation functions that work much like those for other data objects.

Import the MT (mitochondrial) chromosome of the human genome from a FASTA format file.

HSGenome = Import[
  "ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_MT/hs_ref_chrMT.fa.gz", "FASTA"];
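
The results shown later on this page are wrapped in a list, which is consistent with the "FASTA" importer returning a list of sequence strings, one per record. The sketch below illustrates this on a small made-up record using ImportString, so it needs no network access. The names toyFasta, toySeqs and toySeq, and the record itself, are hypothetical and not part of the original notebook.

```mathematica
(* Hypothetical FASTA record (one header line, two sequence lines),
   used only for illustration *)
toyFasta = ">toy1 made-up record\nGATCACAGG\nTCTATCACC\n";

(* Parse the text with the same "FASTA" format used for the genome file *)
toySeqs = ImportString[toyFasta, "FASTA"];

(* Inspect the structure: expected to be a list with one string per record *)
Head[toySeqs]
Length[toySeqs]

(* Extract the single sequence as a plain string *)
toySeq = First[toySeqs]
```

If the imported genome has the same structure, First[HSGenome] would give the sequence as a plain string for functions that require one.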

Import local copy

HSGenome

Count the number of characters in the file and then display the first 200.

StringCount[HSGenome, LetterCharacter]

{16571}
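
As a small check on what this count means, the following sketch uses a made-up seven-letter string (toy is a hypothetical name, not from the notebook). For a sequence made only of letters, counting LetterCharacter matches agrees with StringLength, and Tally on the individual characters gives a per-base breakdown.

```mathematica
toy = "GATTACA";  (* made-up sequence, not taken from the genome file *)

StringCount[toy, LetterCharacter]    (* 7 *)
StringLength[toy]                    (* 7 *)

(* String functions thread over a list of strings, which is consistent
   with the list-wrapped result shown for the imported genome *)
StringCount[{toy}, LetterCharacter]  (* {7} *)

(* Per-base counts, in order of first appearance *)
Tally[Characters[toy]]               (* {{"G", 1}, {"A", 3}, {"T", 2}, {"C", 1}} *)
```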

StringTake[HSGenome, 200]

{GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTGTGCACGCGAT
AGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTAT
CGCACCTACGTTCAATATTACAGGCGAACATACCTACTA}
Genome Pattern Matching.nb

Pattern Matching


Mathematica’s pattern matching features make it easy to search strings for any pattern of interest.

pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

For example, this searches the full genome for all occurrences of the pattern.

StringCases[HSGenome, pattern]

( CCAAACC CCAAACC CCAAACC GGTTTGG AACCCAA CCAAACC CCAAACC AAGGGAA TTAAATT AATTTAA AATTTAA …
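
Because the pattern reuses the names x and y, every x must stand for the same character and every y for the same character, so it matches seven-character runs of the form xxyyyxx. A minimal sketch on made-up strings (toy is a hypothetical name) shows this, and also that x and y are not required to differ:

```mathematica
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

toy = "TTCCAAACCGG";  (* made-up sequence *)

(* Only CCAAACC has the form xxyyyxx *)
StringCases[toy, pattern]        (* {"CCAAACC"} *)

(* A delayed rule can return the characters bound to x and y *)
StringCases[toy,
  x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_ :> {x, y}]   (* {{"C", "A"}} *)

(* x and y may match the same character *)
StringCases["CCCCCCC", pattern]  (* {"CCCCCCC"} *)
```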

Here are the positions in the sequence at which the pattern occurs.

StringPosition[HSGenome, pattern]

( {298, 304} {303, 309} {304, 310} {350, 356} {554, 560} {654, 660} {915, 921} {1686, 1692} {1722, 1728} {1791, 1797} {2050, 2056} …
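
The first three positions listed, {298, 304}, {303, 309} and {304, 310}, overlap one another. This is consistent with StringPosition reporting overlapping matches by default, whereas StringCases by default resumes searching after the end of each match. A minimal sketch on a made-up string (toy is a hypothetical name) that reproduces the same relative offsets:

```mathematica
pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;
toy = "CCAAACCCCCCCC";  (* made-up: CCAAA followed by eight Cs *)

(* StringPosition includes overlapping matches unless told otherwise *)
StringPosition[toy, pattern]                     (* {{1, 7}, {6, 12}, {7, 13}} *)
StringPosition[toy, pattern, Overlaps -> False]  (* {{1, 7}} *)

(* StringCases skips overlapping matches unless told otherwise *)
StringCases[toy, pattern]                        (* {"CCAAACC"} *)
StringCases[toy, pattern, Overlaps -> True]      (* {"CCAAACC", "CCCCCCC", "CCCCCCC"} *)
```

Setting the Overlaps option to the same value in both calls makes the two result lists line up one to one.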

Color those subsequences in the range 2000 to 2500 that match the specified pattern.

ColorString[HSGenome, pattern, {2000, 2500}]

{TACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACT
TTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTA
AATTTAACTGTTAGTCCAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAA
AATTTAACACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACC
TAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACCCTATAGAAGAA
CTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAGCCTGCGTCAGATCAAAACACTGAA
CTGACAATTAACAGCCCAATATCTACAATCAACCAACAAGTCATTATTACCCTCACTGTC
AACCCAACACAGGCATGCTCATAA
GGAAAGGTTAAAAAAAGTAAAAGGAACTCGGCAAACCTTACCCCGCCTGTT}
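
ColorString does not appear to be a built-in Mathematica function and is presumably defined elsewhere in the notebook; its definition is not shown here. The following is only a rough sketch of how such highlighting could be done, under that assumption. The name colorMatches, the choice of red bold styling, and the toy input are all hypothetical.

```mathematica
(* Hypothetical stand-in for ColorString: take the characters from
   start to end, then split the window at each match and put a styled
   copy of the match back in its place *)
colorMatches[seq_String, patt_, {start_Integer, end_Integer}] :=
  Row[StringSplit[StringTake[seq, {start, end}],
    m : patt :> Style[m, Red, Bold]]]

pattern = x_ ~~ x_ ~~ y_ ~~ y_ ~~ y_ ~~ x_ ~~ x_;

(* Made-up sequence: CCAAACC should be displayed in red bold *)
colorMatches["TTCCAAACCGG", pattern, {1, 11}]
```

This sketch expects a single string, so for an imported list it would be called as colorMatches[First[HSGenome], pattern, {2000, 2500}]. It highlights only non-overlapping matches, and it misses any match that straddles the edge of the window, because the window is extracted before searching.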
