INTERNATIONAL JOURNAL OF TECHNOLOGY AND COMPUTING (IJTC)

ISSN-2455-099X,
Volume 1, Issue 1, OCTOBER 2015.

EMAIL WITH CLASSIFICATION DETECTION POWER


Pallavi a, Manpreet Virk b

a,b Department of CSE, GGS Modern Technology, Kharar

ABSTRACT
This research classifies and filters large amounts of data. Its main purpose is to reduce the
error rate and to improve accuracy. Previous classification techniques may produce some
misclassification; in this research the problem of misclassification is reduced. The work
presented here consists of some modifications to the classification technique, which make it a
good enterprise solution for filtering. It optimizes system performance and improves on the
previous algorithm, giving better results than the previous one.

Keywords: error rate, techniques, class labels spam and non-spam.

I. INTRODUCTION

Email filtering is the processing of email to organize it according to specified criteria.
Most often this refers to the automatic processing of incoming messages, but the term also
applies to the involvement of human intelligence in addition to anti-spam techniques. Bayesian
spam filtering is a statistical method of email filtering that uses a naive Bayes classifier
to identify spam email. A Bayesian classifier works by correlating the use of tokens
(typically words, or sometimes other things) with spam and non-spam emails. Bayesian spam
filtering is a very powerful technique for dealing with spam: it can adapt itself to the email
needs of individual users, and it gives low false-positive spam detection rates that are
generally acceptable to users.
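The token-based scoring described above can be written with Bayes' rule. The following is a standard textbook form, not a formula given by the authors; the symbols S (spam), H (non-spam), and w_1, ..., w_n (the tokens of a message) are notation introduced here for illustration, and the product form assumes the tokens are conditionally independent given the class (the "naive" assumption).

```latex
P(S \mid w_1, \dots, w_n)
  = \frac{P(S) \prod_{i=1}^{n} P(w_i \mid S)}
         {P(S) \prod_{i=1}^{n} P(w_i \mid S) + P(H) \prod_{i=1}^{n} P(w_i \mid H)}
```

A message is labeled spam when this posterior exceeds a chosen threshold; raising the threshold trades missed spam for fewer false positives.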

A. Email Filtering Benefits

Managed Service: Because the service is a managed solution, there are no additional costs for
software or hardware upgrades, Internet bandwidth, or maintenance labor. Removing these
expenses saves costs and eases budgeting by eliminating capital outlay and the unexpected
expenses that are sometimes incurred through forced upgrades of hardware- or software-based
solutions. Specialized expertise is also offered on mail routing, filtering, and blacklists,
allowing the staff to concentrate on the infrastructure.

Improves Efficiency: These services restore staff productivity by eliminating typically 98%
of unwanted email. Current industry estimates indicate that as much as 70% of all email is
unwanted, wasting employees' time on manually filtering and deleting messages. Because a large
array of configuration options is offered, the filtering solution can be tailored to an
organization's needs rather than changing the way it does business.
Reduces Communications Load: By filtering off the premises, one eliminates the need for
infrastructure to handle the email messages that are filtered out, saving Internet bandwidth
and server load. This can help extend the useful life of assets by deferring capacity issues.

Mitigates Liability: Eliminating offensive content from the mail stream reduces the chances
of "hostile workplace" lawsuits from employees. Although filtering can never be 100% correct,
the positive defensive action of using a managed service demonstrates a good-faith attempt to
protect workers to the highest degree technology permits.

Increases Security: Because filtered email never enters the infrastructure, exposure to
viruses and other "malware" is reduced. Since the service acts as the mail agent, one's own
equipment no longer needs to be registered or generally visible on the Internet, reducing the
risk of hacking and other malicious actions.

Avoids Investment: Unlike purchased hardware or software solutions, a managed service requires
no investment. The only costs incurred are month-to-month fees, with no depreciation, no
capital outlay, and no maintenance contracts.

Entirely Compatible: The services are based on standard Internet protocols and will
interoperate with any new Internet mail infrastructure. This avoids the risk of compatibility
issues when replacing or upgrading other components of the infrastructure, unlike
software-based solutions that are supported only with specific mail agents or operating systems.
Improves Reliability: Incoming mail is stored during periods of disruption or local Internet
server unavailability and is redelivered when service is available again. This avoids needless
instances of mail being returned to the sender due to local problems. Relaying outgoing mail
through the service avoids problems of server blacklisting due to dynamic addresses on cable
and DSL networks and abuse by other subscribers of the Internet service provider.

Spammers simply examine the latest anti-spam filtering techniques and find ways to circumvent
them, usually by changing the message slightly. This gave anti-spam developers a new
challenge: to come up with a new anti-spam technique, one that is familiar with spammers'
tactics as they vary over time and that is able to adapt to the particular organization it
is protecting from spam. There are different email filtering methods.
1) Blacklist: Blacklists are list-based filters. This spam filtering method attempts to stop
unwanted email by blocking messages from a list of senders. The blacklist contains records of
email addresses and IP addresses. When an incoming message arrives, the spam filter checks
whether its IP or email address is on the blacklist; if so, it considers the message spam and
rejects it.
2) Whitelist: A whitelist blocks spam using a system almost exactly opposite to that of a
blacklist. An unknown sender's email address is checked against the database; if the sender
has no history of spamming, the message is delivered to the inbox and the sender is added to
the whitelist.
3) Word-based filtering: Word-based filtering is a form of content-based filtering and is the
simplest form of filtering. It is a capable technique for fighting junk email. For example,
the filter can be set to stop all messages containing the word acbd. However, spammers often
purposefully misspell keywords in order to evade word-based filtering, and this is the main
problem with this type of filtering.

4) Bayesian filters: The Bayesian filter is the most advanced content-based technique. It
employs the laws of mathematical probability to decide which messages are legitimate and
which are spam. The filter takes the words and phrases found in legitimate mail and adds them
to a list. This method requires a training period before it starts working well.
There are other filtering methods, such as challenge/response systems and collaborative filters.
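The list-based and word-based filters described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `BLACKLIST`, `WHITELIST`, `BLOCKED_WORDS`, `list_filter`, and `word_filter`, as well as the example addresses and words, are all hypothetical.

```python
import re

# Hypothetical example data; a real filter would load these from a database.
BLACKLIST = {"spammer@example.com", "203.0.113.7"}
WHITELIST = {"colleague@example.org"}
BLOCKED_WORDS = {"lottery", "prize"}


def list_filter(sender: str, sender_ip: str) -> str:
    """Return 'accept', 'reject', or 'unknown' based on the two lists."""
    if sender in WHITELIST:
        return "accept"  # known good sender
    if sender in BLACKLIST or sender_ip in BLACKLIST:
        return "reject"  # known spam source
    return "unknown"  # leave the decision to a content-based filter


def word_filter(body: str) -> bool:
    """Return True if the body contains a blocked word as an exact token."""
    tokens = set(re.findall(r"[a-z0-9]+", body.lower()))
    return not tokens.isdisjoint(BLOCKED_WORDS)
```

In this sketch, `word_filter("You won the LOTTERY!")` flags the message, while the deliberately misspelled `"You won the l0ttery!"` passes through, which is the weakness of word-based filtering noted in item 3.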
Bayesian spam filtering is the process of using a naive Bayes classifier to identify spam
email. It is based on the principle that most events are dependent and that the probability
of an event occurring in the future can be inferred from previous occurrences of that event.
The same method can be used to classify spam: if some piece of text occurs often in spam but
not in legitimate mail, then it is reasonable to predict that an email containing it is
almost certainly spam.
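As a minimal sketch of this idea, the following Python class counts how often each token occurs in spam and in legitimate ("ham") training messages and labels a new message with the more probable class. It is a generic multinomial naive Bayes with add-one (Laplace) smoothing, not the modified classifier proposed in this paper; the class name, method names, and whitespace tokenizer are assumptions made for illustration, and the filter is assumed to have seen at least one training message of each class.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lowercase whitespace-separated tokens."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Illustrative multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the tokens of one labeled training message."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def log_score(self, text, label):
        """Log of P(label) times the product of P(token | label)."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        # Class prior; requires at least one training message per class.
        score = math.log(self.message_counts[label] / total_messages)
        for token in tokenize(text):
            count = self.word_counts[label][token]
            # Add-one smoothing keeps unseen tokens from zeroing the score.
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text):
        """Return the label with the higher log score."""
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. The training step corresponds to the training period that a Bayesian filter needs before it performs well.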

II. LITERATURE REVIEW

Xiaoming Jin, Yuchang Lu et al. (2003): An index structure that enables efficient similarity
queries in high-dimensional space is crucial for many applications. This paper discusses the
indexing problem in datasets composed of partially clustered data, which exist in a number of
applications. Existing index methods are inefficient with partially clustered datasets. The
dynamic and adaptive index structure presented here, called a multi-cluster tree (MC-tree),
consists of a set of height-balanced trees for indexing. This index structure improves the
querying efficiency in three ways:
1) Most bounding regions achieve uniform distributions, which results in fewer splits and less
overlap compared with a single indexing tree.
2) The clusters in the dataset are dynamically detected when the index is updated.
3) The query process does not involve a sequential scan. The MC-tree was shown to be better
than hierarchical and cluster-based indexes for partially clustered datasets.
This paper presents an index structure for partially clustered datasets, which constitute a
large portion of the data stored in current information systems. The goal was to make the index
respond efficiently to both clustered and uniform data in one database and to perform queries
on it without losing precision and recall. This index structure improves the query efficiency
in the following ways:
1) Only the non-clustered data is indexed in the main index. This ensures that the main index
has fewer overlaps and splits compared with a single indexing tree.
2) The clusters in the dataset are dynamically detected when the index is updated, which
keeps the index adaptive and prevents its performance from degrading.
3) During the query process, each data point is retrieved from a hierarchical index, so
sequential scans are not required. Uniform data and partially clustered data were used to
evaluate the performance of the MC-tree. The results verified that the MC-tree outperformed
the common hierarchical indexes and cluster-based indexes for the partially clustered dataset.
Johan Hovold (2004): This research explores the use of the naive Bayes classifier as the basis
for personalised spam filters. According to the paper, several machine learning algorithms,
including variants of naive Bayes, had already been explored for this task; the author's
proposal uses word-position-based attribute vectors, which gave very good results when tested
on several publicly available corpora.

III. RESULTS

Fig. 1: GUI of Work

Fig. 2: Scattering of the dataset on the basis of the class labels spam and non-spam

Fig. 3: Classification plotted using Naive Bayes Kernel

Fig. 4: Plotting the best choice

IV. CONCLUSION

Email is a method of exchanging digital messages from an author to one or more recipients.
Email messages can contain text files, graphic images, and sound files, and are usually
encoded as ASCII text. Spam, or unsolicited email, has become a major problem for companies
and private users. This paper explored the various problems associated with spam and the
different methods and techniques that attempt to deal with it. From the study we identified
that many of the filtering techniques are based on text categorization methods and that no
technique can claim to provide an ideal solution with 0% false positives.
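The quantities this paper is concerned with (accuracy, error rate, and false positives) can be computed from the four confusion-matrix counts of a spam filter. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, error rate, and false positive rate.

    tp: spam correctly flagged        fp: legitimate mail flagged as spam
    tn: legitimate mail passed        fn: spam that was missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        # Share of legitimate mail wrongly flagged as spam.
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts for 200 test messages (100 spam, 100 legitimate).
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
```

A filter reaches a 0% false positive rate only when `fp` is zero, that is, when no legitimate message is ever flagged.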

REFERENCES
[1] Han, J., Kamber, M., & Pei, J. (2006). Data mining: concepts and techniques. Morgan Kaufmann.
[2] Han, J., & Kamber, M. (2001). Data mining: concepts and techniques. San Francisco, CA: Morgan
Kaufmann.
[3] Androutsopoulos, I., Koutsias, J., Chandrinos, K. V., & Spyropoulos, C. D. (2000, July). An experimental
comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in
information retrieval (pp. 160-167). ACM.
[4] Androutsopoulos, I., Paliouras, G., Karkaletsis, V., Sakkis, G., Spyropoulos, C. D., & Stamatopoulos, P.
(2000). Learning to filter spam e-mail: A comparison of a naive Bayesian and a memory-based approach. arXiv
preprint cs/0009009.
[5] Basavaraju, M., & Prabhakar, R. (2010). A novel method of spam mail detection using text based clustering
approach. International Journal of Computer Applications, 5(4).
[6] Hovold, J. (2005, July). Naive Bayes spam filtering using word-position-based attributes. In Proceedings of
the 2nd Conference on Email and Anti-Spam (CEAS 2005).
[7] Jin, X., Wang, L., Lu, Y., & Shi, C. (2003). MC-tree: Dynamic index structure for partially clustered
multidimensional database. Tsinghua Science and Technology, 8(2), 174-180.
[8] Liu, P. Y., Zhang, L. W., & Zhu, Z. F. (2009). Research on e-mail filtering based on improved Bayesian.
Journal of Computers, 4(3), 271-275.
[9] Rajput, A., & Toshniwal, D. Adaptive Spam Filtering based on Bayesian Algorithm.
[10] Rennie, J. (2000, August). ifile: An application of machine learning to e-mail filtering. In Proc. KDD 2000
Workshop on Text Mining, Boston, MA.
[11] Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998, July). A Bayesian approach to filtering junk
e-mail. In Learning for Text Categorization: Papers from the 1998 workshop (Vol. 62, pp. 98-105).
[12] Song, Y., Kocz, A., & Giles, C. L. (2009). Better Naive Bayes classification for high-precision spam
detection. Software: Practice and Experience, 39(11), 1003-1024.
[13] Tian, J., Zhang, S., Zhu, L., & Liu, L. (2005). Improvement and Parallelism of k-Means
Clustering Algorithm. Department of Computer Science and Technology, Tsinghua University, Beijing 100084,
China, 10(3), 277-281.
