International Journal of Scientific & Engineering Research, Volume 1, Issue 2, November-2010 1

ISSN 2229-5518

Computer Aided Screening for Early Detection of Breast Cancer
using BRCA Gene as an Indicator

Abstract: A cancerous breast tumor is a mass of breast tissue that grows in an abnormal, uncontrolled
way. Early detection of breast cancer is key to survival because it widens the available treatment
options. Mammography screening and MRI are among the existing breast cancer detection methods. MRI tends
to produce a larger number of false positives. Mammography has several disadvantages: it is expensive and
painful, it gives false positives for patients with dense breast tissue, and it detects a tumor only if it
is larger than 5 mm. Hence there is a need to develop a more convenient and accurate method. In the
proposed approach, we analyze gene expression patterns in blood cells to detect breast cancer at an
early stage. BRCA is a tumor suppressor gene that all people have. BRCA DNA sequences from patients are
generated by the PCR method and used as input to a local sequence alignment program that implements the
Smith-Waterman algorithm. The program compares the patient's gene sequence with the reference
BRCA gene sequence to determine the cancer risk at a very early stage.

Keywords: Breast cancer, early detection, tumor suppressor genes, BRCA, blood sample, PCR method, DNA
sequencing, gene sequence, local sequence alignment algorithm, Smith-Waterman.

1. INTRODUCTION

Breast cancer is a malignant tumor that starts from cells of the breast. A malignant tumor is a group
of cancer cells that may grow into (invade) surrounding tissues or spread (metastasize) to distant areas
of the body. The disease occurs almost entirely in women, but men can get it, too [3]. It is the second
most common cause of cancer death among white, black, Asian/Pacific Islander, and American Indian/Alaska
Native women. According to the 2009-2010 report, an estimated 192,370 new cases of invasive breast
cancer were diagnosed among women in 2009, and approximately 40,170 women were expected to die from
breast cancer [2]. Staging is a method that has been developed to describe the extent of cancer growth.
In general, the lower the stage, the better the person's prognosis [5]. Early detection of breast cancer
can improve the chances of successful treatment and recovery. Mammography screening is the most reliable
method to detect breast cancer in asymptomatic patients. It is extremely important to catch breast cancer
at an early stage. A few of the main symptoms of breast cancer are:
Change in the size or shape of a breast
Dimpling of the breast skin
The nipple becoming inverted
Swelling or a lump in the armpit

2. PROBLEM DESCRIPTION

Although mammography is highly effective, it has significant limitations. In the absence of
microcalcification, mammography often fails to detect tumors that are less than 5 mm in size, and
mammograms of women with dense breast tissue are difficult to interpret. For example, in a study of
over 11,000 women with no clinical symptoms of breast cancer, the sensitivity of mammography was only 48%
for the subset of women with extremely dense breasts, compared with 78% sensitivity for the entire
sample of women in the study. So there is a need to develop a more accurate, convenient and objective
detection method [12]. The method identified in the proposed approach is to compare the patient's BRCA
gene with the original gene. A gene is a stretch of DNA, so DNA sequences are compared to identify cancer
risk. The sequence comparison is executed using the dynamic programming algorithm for local alignment
between two DNA sequences proposed by Smith and Waterman. The Smith-Waterman algorithm is a very
well-known and versatile algorithm [16].

3. EXPERIMENTAL STUDY OF BRCA GENES

The official names of the BRCA1 gene and the BRCA2 gene are breast cancer susceptibility gene 1 and
breast cancer susceptibility gene 2, respectively. The BRCA genes belong to a class of genes known as
tumor suppressor genes [10]. Like many other tumor suppressors, the proteins produced from the BRCA
genes help prevent cells from growing and dividing too rapidly or in an uncontrolled way. There is no
strong homology between BRCA1 and BRCA2, although both genes have a large exon 11 which seems to be
crucial for function. However, the function of the two genes seems to be similar [14, 20]. The BRCA
genes provide instructions for making proteins that are directly involved in repairing damaged DNA.
By helping repair DNA, BRCA1 plays a role in maintaining the stability of a cell's genetic
information [13].

More than 1,000 mutations in the BRCA1 gene and 800 mutations in the BRCA2 gene have been identified,
many of which are associated with an increased risk of breast cancer. Most of these mutations lead to
the production of an abnormally short version of the BRCA1 protein, or prevent any protein from being
made from one copy of the gene. Other BRCA1 mutations change single protein building blocks (amino
acids) in the protein or delete large segments of DNA from the BRCA1 gene. Many BRCA2 mutations insert
or delete a small number of DNA building blocks (nucleotides) in the gene. Researchers believe that a
defective or missing BRCA1 protein is unable to help repair damaged DNA or fix mutations that occur in
other genes. As these defects accumulate, they can allow cells to grow and divide uncontrollably and
form a tumor [8, 9].

4. SEQUENCE COMPARISON

Sequence comparison can be defined as the problem of finding which parts of the sequences are similar
and which parts are different. Generally, a measure of how similar they are is also desirable. A typical
approach to solve this problem is to find a good and plausible alignment between the two sequences.
Then, given an appropriate scoring scheme, their similarity can be computed. Generally, sequence
comparisons involve aligning sections of the two sequences in a way that exposes the similarities
between them [7]. The idea of aligning two sequences (of possibly different sizes) is to write one on
top of the other, and break them into smaller pieces by inserting spaces in one or the other so that
identical subsequences are eventually aligned in a one-to-one correspondence. Naturally, spaces are not
inserted in both sequences at the same position. The objective of sequence alignment is to match
identical subsequences as far as possible. However, if the sequences are not identical, mismatches are
likely to occur as different letters are aligned together. The insertion of spaces produces gaps in the
sequences. Gaps are important to allow a good alignment between the characters of the sequences. A gap
in the first sequence is considered an insertion of a character from the second sequence into the first
one, whereas a gap in the second sequence is considered a deletion of a character of the first sequence.

Once the alignment is produced, a score can be assigned to each pair of aligned letters, called an
aligned pair, according to a chosen scoring scheme. The similarity of two sequences can be defined as
the best score among all possible alignments between them. Sequence comparison is a well-known problem
in computer science. Computational approaches to sequence alignment generally fall into two categories:
global alignments and local alignments. Pairwise sequence alignment methods are used to find the
best-matching piecewise (local) or global alignments of two query sequences. Pairwise alignments can
only be used between two sequences at a time, but they are efficient to calculate and are often used for
methods that do not require extreme precision (such as searching a database for sequences with high
similarity to a query). The three primary methods of producing pairwise alignments are dot-matrix
methods, dynamic programming, and word methods.

Global alignment is achieved using the Needleman-Wunsch algorithm. The algorithm tries to take all of
one sequence and align it with all of a second sequence. Short and highly similar subsequences may be
missed in the alignment because they are outweighed by the rest of the sequence. Hence, one would like
to create a locally optimal alignment [18]. Local alignments are more useful for dissimilar sequences
that are suspected to contain regions of similarity or similar sequence motifs within their larger
sequence context. The Smith-Waterman algorithm is a general local alignment method also based on dynamic
programming. The dynamic programming approach to pairwise sequence alignment is guaranteed to provide
the optimal global or local pairwise alignment and score given a particular scoring scheme [1]. In the
Smith-Waterman algorithm,
1. All symbols (residues) in the two sequences have to be in the alignment, and in the same order
   they appear in the sequences
2. We can align one symbol from one sequence with one from another
3. A symbol can be aligned with a blank ('-')
4. Two blanks cannot be aligned [6, 15,
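
As a minimal sketch of the scoring idea described in this section, the following Java class scores an alignment that has already been written as two gapped rows of equal length. The class name `AlignmentScorer` and the scoring values (+2 for a match, -1 for a mismatch, -2 for a gap) are illustrative assumptions; the paper does not state the scoring scheme it used.

```java
/** Scores an already-built pairwise alignment (illustrative sketch, assumed scoring values). */
final class AlignmentScorer {
    static final char GAP = '-';
    static final int MATCH = 2;        // assumed reward for two identical bases
    static final int MISMATCH = -1;    // assumed penalty for a substitution
    static final int GAP_PENALTY = -2; // assumed penalty for a base aligned with a blank

    /** Sums the scores of all aligned pairs; both rows must have the same length. */
    static int score(String top, String bottom) {
        if (top.length() != bottom.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int total = 0;
        for (int k = 0; k < top.length(); k++) {
            char x = top.charAt(k);
            char y = bottom.charAt(k);
            if (x == GAP && y == GAP) {
                // Rule 4 above: two blanks cannot be aligned.
                throw new IllegalArgumentException("two blanks aligned at column " + k);
            }
            if (x == GAP || y == GAP) {
                total += GAP_PENALTY;   // insertion or deletion
            } else if (x == y) {
                total += MATCH;
            } else {
                total += MISMATCH;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Four matching columns and two gap columns: 4 * 2 + 2 * (-2) = 4
        System.out.println(score("ACGT-A", "AC-TTA"));
    }
}
```

Under these assumed values, the similarity of two sequences would be the largest such score over all valid alignments, which is what the dynamic programming methods above compute without enumerating every alignment.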


DNA sequencing is the first step of sequence analysis. DNA sequencing refers to methods for
determining the order of the nucleotide bases (adenine, guanine, cytosine and thymine) in a
molecule of DNA. DNA sequencing can be performed using PCR. The steps in DNA sequencing are:

1. Extract genomic DNA
2. Amplify known gene region
3. Verify successful PCR amplification
4. Clean PCR products
5. Quantify DNA concentration
6. Cycle sequence
7. Precipitate cycle-sequenced products and submit them

The obtained DNA sequence is then used as input to the local sequence alignment program,
Smith-Waterman [11]. The system flow diagram is shown in Fig 1.

Fig 1. System Flow Diagram

5.1. Smith-Waterman Algorithm

A matrix H is built as follows:

\[ H(i,0) = 0, \quad 0 \le i \le m \]
\[ H(0,j) = 0, \quad 0 \le j \le n \]
\[ H(i,j) = \max \begin{cases}
0 \\
H(i-1,j-1) + w(a_i, b_j) & \text{Match/Mismatch} \\
H(i-1,j) + w(a_i, -) & \text{Deletion} \\
H(i,j-1) + w(-, b_j) & \text{Insertion}
\end{cases}, \quad 1 \le i \le m,\ 1 \le j \le n \]

Where:
a, b = strings over the alphabet \Sigma
m = length(a)
n = length(b)
H(i, j) is the maximum similarity score between a suffix of a[1...i] and a suffix of b[1...j]
w(c, d), with c, d \in \Sigma \cup \{'-'\}, is the scoring scheme, where '-' denotes a gap

5.2. Working principle of the algorithm

The algorithm:
1. Assigns a score to each pair of bases
   a. Uses similarity scores only
   b. Uses positive scores for related residues
   c. Uses negative scores for substitutions and gaps
2. Initializes the edges of the matrix with zeros
3. As the scores are summed in the matrix, records any score below 0 as 0
4. Begins the traceback at the maximum value found anywhere in the matrix
5. Continues until the score falls to 0

The algorithm gives the position of each mismatch, and from those results the presence or absence of
cancer can be confirmed [4, 19].

6. IMPLEMENTATION AND RESULT ANALYSIS

The program is implemented in Java. The DNA sequence generated by PCR is used as input for the
proposed detection method. The implemented program reads the content of the input file and compares
the input DNA sequence with the reference gene sequence file. If there is no mismatch, the sequences
are the same and the patient is therefore considered healthy, as is evident from Fig 2.
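
The recurrence of Section 5.1 and the traceback steps of Section 5.2 can be sketched in Java as follows. This is a minimal illustration and not the authors' program: the class name `SmithWatermanSketch`, the `Result` holder and the scoring values (+2 match, -1 mismatch, -2 gap) are assumptions, since the paper does not list its scoring scheme, and reading the sequences from files is omitted.

```java
/** Minimal Smith-Waterman local alignment sketch with assumed scoring values. */
final class SmithWatermanSketch {
    static final int MATCH = 2;     // assumed w(x, x)
    static final int MISMATCH = -1; // assumed w(x, y) for x != y
    static final int GAP = -2;      // assumed w(x, '-') and w('-', y)

    /** Best local score together with the two aligned rows. */
    static final class Result {
        final int score;
        final String alignedA;
        final String alignedB;

        Result(int score, String alignedA, String alignedB) {
            this.score = score;
            this.alignedA = alignedA;
            this.alignedB = alignedB;
        }
    }

    static int w(char x, char y) {
        return x == y ? MATCH : MISMATCH;
    }

    static Result align(String a, String b) {
        int m = a.length();
        int n = b.length();
        // int arrays are zero-initialized, which gives H(i,0) = H(0,j) = 0.
        int[][] h = new int[m + 1][n + 1];
        int best = 0, bestI = 0, bestJ = 0;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int diag = h[i - 1][j - 1] + w(a.charAt(i - 1), b.charAt(j - 1));
                int del = h[i - 1][j] + GAP;
                int ins = h[i][j - 1] + GAP;
                h[i][j] = Math.max(0, Math.max(diag, Math.max(del, ins)));
                if (h[i][j] > best) {
                    best = h[i][j];
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        // Trace back from the maximum cell until the score falls to 0.
        StringBuilder rowA = new StringBuilder();
        StringBuilder rowB = new StringBuilder();
        int i = bestI, j = bestJ;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            if (h[i][j] == h[i - 1][j - 1] + w(a.charAt(i - 1), b.charAt(j - 1))) {
                rowA.append(a.charAt(i - 1));
                rowB.append(b.charAt(j - 1));
                i--;
                j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {
                rowA.append(a.charAt(i - 1)); // deletion: gap in b
                rowB.append('-');
                i--;
            } else {
                rowA.append('-');             // insertion: gap in a
                rowB.append(b.charAt(j - 1));
                j--;
            }
        }
        return new Result(best, rowA.reverse().toString(), rowB.reverse().toString());
    }

    public static void main(String[] args) {
        // Hypothetical patient and reference fragments, not real BRCA data.
        Result r = align("ACGTTCGT", "ACGTACGT");
        System.out.println(r.score);
        System.out.println(r.alignedA);
        System.out.println(r.alignedB);
    }
}
```

With these assumed scores, aligning the hypothetical patient fragment ACGTTCGT against the reference ACGTACGT gives a score of 13 and one mismatched column at the fifth base. Because a local alignment can trim unaligned ends, a check for complete identity would also need to confirm that the aligned region covers both sequences in full.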

If there is a mismatch, the patient is at cancer risk and has to be recommended for treatment, as
shown in Fig 3. The graph (Fig 4) shows the relation between early detection of cancer and survival
rate.

Fig. 3 shows that the patient sequence does not match the original sequence and hence the patient
has cancer risk

Fig 4. Survival rate of patients (5-year relative) in early detection of breast cancer (Source:
American Cancer Society)

The proposed approach for early detection of breast cancer using the local sequence alignment
technique identifies whether a patient is affected by cancer or not. Furthermore, the risk level of
cancer in affected patients is also determined. Consequently, this early detection method using DNA
sequencing is significantly more advantageous than other methods, since cancer risks can be
identified at an early stage, even before the symptoms are clearly observable. Moreover, the method
is easy to use, economical with respect to laboratory usage, and reliable, as genes are used for
detection. The proposed approach has 95% efficiency in detecting breast cancer at an early stage.
This project offers a more effective and accurate method aimed at early-stage breast cancer
detection.

The current evaluation system shows potential for assessing a patient's cancer risk. The
Smith-Waterman algorithm is effective for text string matching, but an assessment is required to
determine its benefits relative to the traditional techniques and other sequencing algorithms.
Though the Smith-Waterman algorithm is very sensitive and accurate, it has high time complexity and
needs a large amount of memory. As biological sequencing data are rapidly expanding, the memory
requirement has become a critical problem in the existing

Smith-Waterman algorithm. Future work can target the use of an upgraded Smith-Waterman
algorithm that has reduced computational complexity, (N*(M+1)/2), and lower space complexity.
Moreover, risk level of cancer can also be identified in further computational


BRCA1   Breast cancer 1, early onset
BRCA2   Breast cancer 2, early onset
PCR     Polymerase Chain Reaction
DNA     Deoxyribonucleic acid

[1]. E. C. Rouchka, "Aligning DNA sequencing using Dynamic Programming", ACM, 2006.
[2]. American Cancer Society, "Breast Cancer Facts and Figures 2009-2010", American Cancer
Society, 2009.
[3]. American Cancer Society, "What is Breast Cancer", American Cancer Society, Sep. 18, 2009.
[4]. Baylor College of Medicine HGSC, "Smith Waterman algorithm," Baylor College of Medicine
HGSC, Aug. 01, 2002.
[5]. Breast Cancer, "Stages of Breast Cancer", Breast Cancer, Jan. 21, 2010.
[6]. David W. Mount, Bioinformatics: Sequence and Genome Analysis, 2nd ed., NY: Cold Spring
Harbor Laboratory Press, 2000.
[7]. Eugene W. Myers, "An Overview of Sequence Comparison Algorithms in Molecular Biology,"
Department of Computer Science, The University of Arizona, Arizona, Tech Rep 91-29, December
[8]. Genetics Home Reference, "BRCA1", Genetics Home Reference, Aug. 2007.
[9]. Genetics Home Reference, "BRCA2", Genetics Home Reference, Aug. 2007.
[10]. National Cancer Institute, "BRCA1 and BRCA2: Cancer Risk and Genetic Testing", National
Cancer Institute, May 29, 2009.
[11]. "Overview of steps in DNA Sequencing". [Online]. Apr. 6, 2010.
[12]. P. Sharma et al., "Early detection of breast cancer based on gene-expression patterns in
peripheral blood cells," Breast Cancer Research, p. 634+, Jun 2005.
[13]. Ralph Scully, "Role of BRCA gene dysfunction in breast and ovarian cancer predisposition,"
Breast Cancer Research, July 2000.
[15]. S. A. de Carvalho Junior, "Sequence Alignment Algorithms," M.S. thesis, King's College
London, University of London, London, September
[16]. S. Das and D. Dey, "A new algorithm for local alignment in DNA sequencing," in IEEE India
National Conference, 2004, pp. 410-413.
[17]. "Smith Waterman algorithm", Oct. 4, 2007.
[18]. Wikipedia, "Smith Waterman algorithm," Wikipedia, April 2010.
[20]. Wisegeek, "What is a tumor suppressor gene", Wisegeek, 2009.

IJSER © 2010