
A Comparative Study of Use of Shannon, Rényi and Tsallis Entropy for Attribute Selecting in Network Intrusion Detection

Christiane F.L. Lima 1, Francisco M. de Assis 2, and Cleonilson Protásio de Souza 3


1 Federal Institute of Maranhão, Department of Education, São Luís, MA, Brazil
2 Federal University of Campina Grande, Department of Electrical Engineering, Campina Grande, PB, Brazil
3 Federal University of Paraíba, Department of Electrical Engineering, João Pessoa, PB, Brazil

Abstract. Intrusion Detection Systems of computer networks carry out their detection capabilities by observing a set of attributes coming from the network traffic. Such a set may be very large. However, some attributes are irrelevant, redundant or even noisy, so their usage may decrease the intrusion detection efficiency. Therefore, the primary problem in identifying an optimal attribute subset is the choice of the criterion to evaluate a given attribute subset. In this work, we present an evaluation of Rényi and Tsallis entropy compared with Shannon entropy in order to obtain an optimal attribute subset that increases the capability to classify the traffic as normal or as suspicious. Additionally, we studied an ensemble approach that combines the attributes selected by the Rényi, Tsallis and Shannon information measures. The empirical results demonstrate that by applying an attribute selection approach based on Rényi or Tsallis entropies, not only are the number of attributes and the processing time reduced, but the clustering models can also be built with performance better than (or at least equal to) that of models built with the complete set of attributes.

Keywords: Attribute selection, network intrusion detection, Shannon, Rényi and Tsallis entropy.

1 Introduction
According to [1], a network intrusion is defined as a set of actions that can compromise
the integrity, confidentiality or availability of resources in a network context. Complete
or partial intrusions take place as a result of successful attacks which exploit system
vulnerabilities. Since it is impossible to achieve an invulnerable network system, it is more appropriate to assume that intrusions can happen. Hence, the central challenge in computer security is determining the difference between normal and potentially harmful activity.
Alongside other techniques for preventing intrusions, such as encryption and firewalls, intrusion detection systems (IDSs) are software systems designed with the purpose of identifying and preventing unauthorised use, misuse and abuse of computer networks and systems.


In intrusion detection, enormous amounts of data are collected from the network, generating large log files and raw network traffic, which make human inspection impossible. This poses a great challenge. Thus, these activities need to be summarized into higher-level events, described by some attributes (features). Therefore, selecting relevant attributes is a crucial activity and requires extensive domain knowledge.
The selection of an optimal reduced subset of attributes is essential for: (i) removing irrelevant and redundant data; (ii) reducing the use of resources; (iii) increasing detection precision; and (iv) achieving a rapid and effective response against attacks. Although such problems have been tackled by researchers for many years, there has recently been renewed interest in feature extraction. Accordingly, as pointed out in [2], the identification of a representative set of attributes is a main problem in IDS design, in order to both optimize the effectiveness of intrusion detection and decrease the complexity of the IDS.
In this work, we investigated four different approaches to select optimal attributes. Firstly, we considered a modified gain ratio that incorporates the Rényi [3] and Tsallis [4] entropies, comparing them with the Shannon [5] information measure, as criteria for constructing C4.5 decision trees [6]. Additionally, we studied an ensemble approach that combines the attributes selected by the Rényi [3], Tsallis [4] and Shannon [5] information measures. These schemes were applied to a data set for network intrusion detection based on the KDD Cup 1999 data [7].
In order to evaluate the clustering performance on the smaller subsets of attributes selected by the various approaches, we considered different models using two clustering algorithms: SimpleKMeans [8] and FarthestFirst [9]. The experimental results demonstrate that the clustering performance of the models built with smaller subsets of attributes is comparable to, and sometimes better than, that associated with the complete set of attributes for the DoS and Probing attack categories.
The remainder of the paper is organized as follows. Section 2 describes the scheme to select attributes based on the C4.5 decision tree algorithm. The experimental environment is explained in Section 3. Results are reported in Section 4 and conclusions are drawn in Section 5.

2 Selection of Attributes Based on Decision Tree Algorithm

Attribute selection is a strategy for the data reduction process, since irrelevant and redundant attributes often degrade the performance of algorithms devoted to data characterization, rule extraction and construction of predictive models, both in speed and in prediction accuracy. The goal of the attribute selection process is, given a dataset that describes a target concept using N attributes, to find the minimum number M of relevant attributes which describe the concept as well as the original set of attributes does, in such a way that the characteristic space is reduced according to some criterion [10].
Attribute selection algorithms fall into two broad categories: the filter model and the wrapper model. The filter model tries to choose an attribute subset independently of the learning algorithm to be used, by examining the intrinsic characteristics of the data and by estimating the quality of each attribute considering just the available data. In contrast, the wrapper model evaluates the goodness of a subset of attributes by applying a predetermined learning algorithm on the selected subset of attributes.

So, for each new subset of attributes, the wrapper model needs to learn a classifier and uses its performance to evaluate and determine which subset of attributes is selected. This approach tends to find the attributes best suited to the predetermined classification algorithm, resulting in superior learning performance, but it also tends to be more computationally expensive than the filter model [11].
For N attributes, there are 2^N possible subsets. An exhaustive search for an optimal subset of attributes can be impracticable, especially when N and the number of data classes increase. Therefore, heuristic methods that explore a reduced search space are
commonly used for attribute subset selection. These methods are typically greedy in the
sense that, while searching through attribute space, they always make what seems to be
the best choice at the time. Their strategy is to make a locally optimal choice in the hope
that this will lead to a globally optimal solution. Such greedy methods are effective in
practice and may come close to estimating an optimal solution [10].
Decision trees (DTs) are known to be effective classifiers in a variety of domains. Most decision tree algorithms use a standard top-down greedy approach to building trees. Decision tree induction is the learning of decision tree classifiers from training data described in terms of the attributes. It constructs a directed graph where each internal (non-leaf) node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node corresponds to a class label (see Figure 1).
In our attribute selection approach, decision tree induction is used for selecting relevant attributes. All attributes that do not appear in the tree are assumed to be irrelevant, so the set of attributes appearing in the tree represents the subset of selected attributes, as illustrated in Figure 1 and in the sketch after it.

[Figure: a decision tree whose internal nodes test the attributes A3, A6, A2 and A10 and whose leaves carry the class labels c1-c4. From the initial attribute set {A1, A2, A3, A4, A5, A6, A7, A8, A9, A10}, the selected attribute subset is {A2, A3, A6, A10}.]

Fig. 1. Decision tree induction for attribute selection
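To make this extraction step concrete, the sketch below (our own illustrative code with a hypothetical Node type, not the authors' implementation) walks a built tree and collects the indices of the attributes tested at its internal nodes:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical minimal tree node: attributeIndex >= 0 marks an internal
// test node; -1 marks a leaf carrying a class label.
class Node {
    int attributeIndex = -1;
    Node[] children = new Node[0];
}

final class TreeAttributeCollector {
    // Returns the indices of all attributes tested somewhere in the tree;
    // every attribute absent from this set is treated as irrelevant.
    static Set<Integer> selectedAttributes(Node root) {
        Set<Integer> selected = new TreeSet<>();
        collect(root, selected);
        return selected;
    }

    private static void collect(Node node, Set<Integer> selected) {
        if (node == null || node.attributeIndex < 0) return; // leaf reached
        selected.add(node.attributeIndex);
        for (Node child : node.children) collect(child, selected);
    }
}
```

Applied to the tree of Figure 1, this traversal would return {2, 3, 6, 10}.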

The most popular DTAs are ID3 (Induction of Decision Tree) [12] and its successor, the C4.5 algorithm [6]. Using a top-down process, both algorithms build decision trees by selecting an appropriate attribute for each decision node based on Shannon's entropy measure [5]. At each iteration, the best attribute is the one with the highest mutual information [5] among all candidates. This basic criterion is used in the ID3 algorithm to select attributes while the tree is being designed. However, although it presents good results, it has a strong bias in favor of attributes with many values. To solve this problem, Quinlan [6] proposed in the C4.5 algorithm a kind of normalization, called gain ratio, in which the apparent gain assigned to attributes with many values is adjusted. For more details on the general algorithm for building C4.5 decision trees based on the Shannon, Rényi and Tsallis entropies, see [13].
In this context, there are other entropy measures, such as the Rényi and Tsallis entropies, that could be applied to select the attribute subset. Thus, we propose to investigate whether these measures can be adequately used for this problem. In the following, each entropy formulation is duly described.
The motivation to apply these entropies arises from the observation that, for α > 1, more frequent events are emphasized [14], and from the limited size of the data set from which the most adequate tree attributes have to be captured.

2.1 Shannon Entropy


The concept of entropy is related to the amount of information in a message, as a statistical measure. Based on the work of Shannon [5], given a class random variable C with a discrete probability distribution \{p_i = \Pr[C = c_i]\}_{i=1}^{k}, \sum_{i=1}^{k} p_i = 1, where c_i is the i-th class, the entropy H(C) is the expected amount of information needed for class prediction, defined as

H(C) = -\sum_{i=1}^{k} p_i \log p_i .    (1)

There are N attributes, each denoted A_i, i = 1, 2, ..., N. In turn, each attribute A_i has a finite number v_i of values which it can assume. Shannon defined another basic concept of information theory with respect to the idea of dependence between two random variables C and A_i, which is called mutual information I(C; A_i), and can be expressed in terms of Shannon entropies as follows:

I(C; A_i) = H(C) - H(C|A_i),    (2)

where H(C|A_i) stands for the conditional entropy of C given A_i. The mutual information is interpreted as the amount of uncertainty in C which is removed by knowing A_i.
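For concreteness, the following sketch (illustrative code of ours, not taken from the paper's implementation) estimates H(C), H(C|A_i) and I(C; A_i) from a table of joint frequency counts:

```java
// Illustrative sketch: Shannon entropy and mutual information estimated
// from a joint count table n[a][c], where n[a][c] is the number of
// training instances with attribute value a and class c.
public final class ShannonInfo {

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // H(C) computed from class counts, Eq. (1)
    static double entropy(double[] counts) {
        double total = 0, h = 0;
        for (double n : counts) total += n;
        for (double n : counts)
            if (n > 0) { double p = n / total; h -= p * log2(p); }
        return h;
    }

    // I(C;A) = H(C) - H(C|A), Eq. (2), with H(C|A) = sum_a Pr[A=a] H(C|A=a)
    static double mutualInformation(double[][] n) {
        int numClasses = n[0].length;
        double total = 0;
        double[] classCounts = new double[numClasses];
        for (double[] row : n)
            for (int c = 0; c < numClasses; c++) {
                classCounts[c] += row[c];
                total += row[c];
            }
        double condH = 0;
        for (double[] row : n) {
            double rowTotal = 0;
            for (double v : row) rowTotal += v;
            if (rowTotal > 0) condH += (rowTotal / total) * entropy(row);
        }
        return entropy(classCounts) - condH;
    }
}
```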
Other entropy measures have been proposed, for instance, Rényi entropy [3] and Tsallis entropy [4]. The Rényi and Tsallis entropies contain an additional parameter α which can be used to make them more or less sensitive to the shape of probability distributions.

2.2 Rényi Entropy

Rényi's entropy constitutes a measure of information of order α, having Shannon's entropy as a limit case, and is defined by the following expression:

R_\alpha(C) = \frac{1}{1-\alpha} \log \sum_{i=1}^{k} p_i^\alpha ,    \alpha \geq 0, \alpha \neq 1,    (3)

where \sum_{i=1}^{k} p_i = 1 and \lim_{\alpha \to 1} R_\alpha(C) = H(C).
Using the Rényi entropy of order \alpha \in (0, 1), the mutual information can be generalized as follows:

I_\alpha(C; A_i) = R_\alpha(C) - R_\alpha(C|A_i).    (4)
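A count-based implementation of Eq. (3) might look as follows (again an illustrative sketch; the class and method names are ours):

```java
// Illustrative sketch: Renyi entropy of order alpha (alpha >= 0, alpha != 1)
// from class counts, Eq. (3); it converges to Shannon entropy as alpha -> 1.
final class RenyiInfo {
    static double renyiEntropy(double[] counts, double alpha) {
        double total = 0, sum = 0;
        for (double n : counts) total += n;
        for (double n : counts)
            if (n > 0) sum += Math.pow(n / total, alpha);
        // (1 / (1 - alpha)) * log2( sum_i p_i^alpha )
        return Math.log(sum) / Math.log(2) / (1 - alpha);
    }
}
```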

2.3 Tsallis Entropy

Another generalized entropy, defined by Constantino Tsallis [4], is given by:

S_\alpha(C) = \frac{1}{\alpha - 1} \left( 1 - \sum_{i=1}^{k} p_i^\alpha \right),    (5)

where \alpha \geq 0 and \lim_{\alpha \to 1} S_\alpha(C) = H(C).
For \alpha > 1, the Tsallis mutual information is defined as [15]:

I_\alpha(C; A_i) = S_\alpha(C) - S_\alpha(C|A_i).    (6)
Using the Shannon entropy, events with high or low probability do not receive different weights in the entropy computation. However, using the Tsallis entropy with α > 1, high-probability events contribute more than low-probability ones to the entropy value; hence, the higher the value of α, the higher the contribution of high-probability events to the final result. Similarly, as the coefficient increases (α → ∞), the Rényi entropy is increasingly determined by the events with higher probabilities, while lower values of the coefficient (α → 0) weigh the events more equally, regardless of their probabilities.
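The small demo below (an illustrative sketch with made-up probabilities) computes the Tsallis entropy of Eq. (5) for a skewed distribution at several values of α, making the weighting effect just described visible:

```java
// Illustrative sketch: Tsallis entropy, Eq. (5), and the effect of alpha on
// a skewed distribution. For alpha > 1 the high-probability event dominates
// the sum; for alpha -> 0 all events are weighed more equally.
public final class TsallisDemo {

    static double tsallisEntropy(double[] p, double alpha) {
        double sum = 0;
        for (double pi : p) if (pi > 0) sum += Math.pow(pi, alpha);
        return (1 - sum) / (alpha - 1); // S_alpha = (1 - sum p_i^alpha)/(alpha - 1)
    }

    public static void main(String[] args) {
        double[] skewed = {0.9, 0.05, 0.05}; // one dominant class
        for (double a : new double[]{0.2, 0.5, 1.2, 2.0})
            System.out.printf("alpha=%.1f  S_alpha=%.4f%n",
                              a, tsallisEntropy(skewed, a));
    }
}
```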

2.4 Proposed Attribute Selection Schemes

Based on a given training set, to build a decision tree using the C4.5 algorithm, first the mutual information I(C; A_i) and the gain ratio I(C; A_i)/H(A_i) are calculated for all attributes. Then, the attribute that yields the highest decrease of uncertainty about the prediction of classes at that node is selected. The selection of attributes is repeated recursively until the decision tree is completely designed.
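A minimal sketch of this per-node selection step, reusing the ShannonInfo helpers shown earlier (illustrative only; the real C4.5 additionally handles continuous attributes, missing values and the "average gain" test, all omitted here):

```java
// Illustrative sketch: choose the attribute with the highest gain ratio
// I(C;A_i)/H(A_i) at a node, from per-attribute joint counts.
final class GainRatioSelector {
    // jointCounts[i][v][c]: number of instances at this node where
    // attribute i takes its v-th value and the class is c.
    static int bestAttribute(double[][][] jointCounts) {
        int best = -1;
        double bestRatio = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < jointCounts.length; i++) {
            double[] valueCounts = new double[jointCounts[i].length];
            for (int v = 0; v < jointCounts[i].length; v++)
                for (double c : jointCounts[i][v]) valueCounts[v] += c;
            double hA = ShannonInfo.entropy(valueCounts);  // H(A_i)
            if (hA == 0) continue;                         // attribute constant here
            double ratio = ShannonInfo.mutualInformation(jointCounts[i]) / hA;
            if (ratio > bestRatio) { bestRatio = ratio; best = i; }
        }
        return best; // -1 if no attribute splits the data
    }
}
```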
In this work, four different approaches, all fitting the filter model, are used to select a subset of key attributes to identify four attack categories. To do so, the gain ratio based on the Rényi [3] and Tsallis [4] information measures was applied individually and compared with the Shannon [5] information measure criterion for constructing C4.5 decision trees [6], together with an ensemble approach, in order to choose an optimal attribute subset that can be used to find more efficient alternatives to increase the capability of the IDS. The attribute selection schemes are shown in Figure 2.

3 Simulation Environment

For the simulation of the attribute selection schemes and to build all clustering models, the WEKA (Waikato Environment for Knowledge Analysis) toolkit [16] has been used. In the WEKA toolkit, the source code of the class J48, which generates the standard C4.5 decision tree, was modified by the authors, replacing the Shannon entropy by the Rényi entropy or the Tsallis entropy, depending on the α value [13]. Java programming was used for the implementation.

[Figure: connection records (41 attributes) are fed to Shannon, Rényi and Tsallis entropy measures, each yielding a subset of attributes; an ensemble then combines the three subsets into a single attribute subset.]

Fig. 2. Attribute selection schemes
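Conceptually, the modification amounts to making the entropy used by the split criterion a parameterized function. The helper below is a conceptual sketch of such a function only, not the authors' actual J48 patch; the real change lives in WEKA's weka.classifiers.trees.j48 split-model classes, whose internals differ from this simplified form:

```java
// Conceptual sketch: a parameterized entropy that can be swapped into a
// C4.5-style split criterion in place of Shannon entropy.
final class GeneralizedEntropy {
    enum Kind { SHANNON, RENYI, TSALLIS }

    // p: probability distribution over classes; alpha is ignored for SHANNON
    static double entropy(double[] p, Kind kind, double alpha) {
        double h = 0, sum = 0;
        switch (kind) {
            case SHANNON:
                for (double pi : p)
                    if (pi > 0) h -= pi * Math.log(pi) / Math.log(2);
                return h;
            case RENYI:   // Eq. (3)
                for (double pi : p) if (pi > 0) sum += Math.pow(pi, alpha);
                return Math.log(sum) / Math.log(2) / (1 - alpha);
            case TSALLIS: // Eq. (5)
                for (double pi : p) if (pi > 0) sum += Math.pow(pi, alpha);
                return (1 - sum) / (alpha - 1);
            default:
                throw new IllegalArgumentException("unknown kind");
        }
    }
}
```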
In evaluations of IDS proposals, the intrusion datasets available from the Knowledge Discovery and Data Mining Competition (KDD Cup 99) [7] are usually used as a benchmark for both training and testing. This dataset is still used by researchers because it allows different intrusion detection techniques to be compared on a common dataset. Each network connection (or instance) in the KDD Cup 99 data set contains 41 attributes [7]. Also, each instance is labeled either as normal or as a specific attack type. These attacks are of 22 different types falling into four main categories:

- DoS: denial-of-service attacks.
- Probing: when an attacker scans a network to obtain information or seeks vulnerabilities.
- Remote to Local (R2L): when a remote-machine user tries to get access to a local server.
- User to Root (U2R): when an authorized user tries to get access as superuser (root).

Usually, a subset of network traffic needs to be collected in advance for designing intrusion detection systems. However, it is difficult to collect all attack information, because in the real world intruders constantly develop new attack code to exploit security vulnerabilities of organizations. The collected data always encloses uncertainty when only limited information about intrusive activities is available. Accordingly, in order to simulate the uncertainty existing in the KDD 99 data set and to decrease the computational cost without compromising the research results, a subset of each individual category of attack was randomly selected from the intrusion datasets used. As shown in Table 1, each category contains instances corresponding to certain attacks or to normal behavior.

4 Experimental Results and Analysis

Considering the best results obtained in previous experiments performed by the authors [13] on building decision trees based on the Shannon [5], Rényi [3] and Tsallis [4] entropies, in terms of classification accuracy and tree size, we chose the best decision trees for analysis. For example, for the DoS category, we selected the decision tree designed with the Rényi entropy with α = 0.5 and the best decision tree built using the Tsallis entropy with α = 1.2.

Table 1. Attacks per Category

DoS              Probing          R2L                 U2R
back (1026)      ipsweep (586)    ftp-write (8)       buffer-overflow (21)
land (11)        nmap (151)       guess-passwd (53)   loadmodule (10)
neptune (10401)  portsweep (155)  imap (11)           perl (3)
pod (69)         satan (16)       multihop (11)       rootkit (7)
smurf (7669)     spy (4)          phf (5)             normal (1676)
teardrop (15)    normal (1704)    warezclient (60)
normal (2573)                     warezmaster (20)
                                  normal (1934)

Table 2. Selected attributes by the Shannon, Rényi and Tsallis information measures and the ensemble approach

Attacks  Measure            Selected Attributes
DoS      Shannon            2, 5, 7, 8, 23, 34, 36, 39
         Rényi              2, 5, 7, 8, 23, 32, 35, 36, 39
         Tsallis            2, 5, 7, 8, 23, 26, 34, 39
         Ensemble approach  2, 5, 7, 8, 23, 26, 32, 34, 35, 36, 39
Probing  Shannon            1, 2, 4, 5, 6, 23, 30, 33, 37, 38, 40
         Rényi              1, 2, 5, 6, 25, 30, 32, 33, 37, 38, 40
         Tsallis            1, 2, 4, 6, 23, 30, 31, 33, 37, 38, 40
         Ensemble approach  1, 2, 4, 5, 6, 23, 25, 30, 31, 32, 33, 37, 38, 40
R2L      Shannon            1, 3, 5, 6, 9, 10, 11, 17, 19, 22, 32, 33, 35
         Rényi              2, 5, 6, 10, 11, 12, 19, 33, 35, 37, 38, 39
         Tsallis            1, 3, 5, 6, 10, 11, 17, 19, 22, 37, 38
         Ensemble approach  1, 2, 3, 5, 6, 9, 10, 11, 12, 17, 19, 22, 32, 33, 35, 37, 38, 39
U2R      Shannon            13, 16, 17, 18, 32, 33
         Rényi              13, 18, 32, 33, 36
         Tsallis            13, 16, 18, 32, 33
         Ensemble approach  13, 16, 17, 18, 32, 33, 36

After selecting the decision trees individually, a subset of attributes was selected for each dataset according to the individual category of attacks. Moreover, a new attribute subset was selected based on the ensemble approach, using the subsets of attributes extracted by the information measures of Shannon [5], Rényi [3] and Tsallis [4].
Since different types of attack have their own patterns, different categories of attacks may have different optimal subsets of attributes. Thus, four experiments were conducted to evaluate which attribute subset is most suitable for detecting each individual category of attacks according to the respective entropy used. The experimental results are shown in Table 2.
In the experiments, the subsets of selected attributes are then fed into the SimpleKMeans [8] and FarthestFirst [9] algorithms for data clustering, using the WEKA toolkit. Cluster analysis is under strong development; contributing areas of research include statistics, machine learning, data mining, pattern recognition, and image processing [17].
Table 3. Experimental Results

Attacks  Method  41 attributes   Shannon         Rényi           Tsallis         Ensemble approach
                 DAR(%)  AUC     DAR(%)  AUC     DAR(%)  AUC     DAR(%)  AUC     DAR(%)  AUC
DoS      FF      90.28  0.6497   91.16  0.6412   88.02  0.6972   91.05  0.6223   91.12  0.8074
         SKM     66.61  0.8578   68.59  0.9515   68.6   0.9511   68.59  0.9524   67.95  0.7854
Probing  FF      52.18  0.4648   54.2   0.504    57.45  0.5478   48.74  0.6313   47.25  0.4578
         SKM     61.28  0.7275   57.99  0.6981   52.68  0.6562   70.07  0.7653   68.77  0.7644
R2L      FF      98.53  0.9754   95.34  0.9675   96.43  0.9719   96.67  0.8983   96.77  0.966
         SKM     48.86  0.7355   36.16  0.667    47.29  0.7309   42.96  0.6988   36.16  0.667
U2R      FF      98.89  0.8769   95.57  0.9815   81.36  0.8724   94.41  0.9755   82.18  0.8305
         SKM     51.49  0.7575   43.91  0.7187   48.46  0.7434   43.91  0.7187   48.46  0.7434

FF = FarthestFirst, SKM = SimpleKMeans

WEKA's SimpleKMeans algorithm, its implementation of the k-means algorithm [8], uses the Euclidean distance measure to compute distances between instances and clusters. WEKA's FarthestFirst algorithm provides the farthest-first traversal algorithm of Hochbaum and Shmoys [9], which works as a fast, simple, approximate clusterer modeled after simple k-means. FarthestFirst is a variant of k-means that places each cluster center in turn at the point farthest from the existing cluster centers, where this point must lie within the data area.
For the training and testing of the SimpleKMeans and FarthestFirst algorithms, each algorithm is first applied to all 41 attributes and the clustering results are calculated. After that, training and testing are done with the reduced attribute subsets and the clustering results are calculated again. To evaluate the effectiveness of the selected attributes, the detection results using the selected attributes were compared with the results using all 41 attributes on the same test data, as sketched below.
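The following sketch shows how such a reduced dataset can be produced with the standard WEKA API. The file name is hypothetical, and we assume the class attribute is the last column of the ARFF file; the index list is the Shannon subset for DoS from Table 2:

```java
// Illustrative sketch: keep only the selected attributes plus the class
// attribute, using WEKA's Remove filter (indices are 1-based in WEKA).
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ReduceAttributes {
    public static void main(String[] args) throws Exception {
        // Hypothetical file name; 41 features followed by the class label
        Instances data = new DataSource("kddcup_dos.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        Remove keep = new Remove();
        keep.setAttributeIndices("2,5,7,8,23,34,36,39,last"); // Shannon/DoS subset
        keep.setInvertSelection(true);                        // keep listed, drop rest
        keep.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, keep);

        System.out.println("Attributes after selection: " + reduced.numAttributes());
    }
}
```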
In the classes-to-clusters evaluation mode, used in this work, WEKA evaluates the clusterings in two steps: first, it ignores the class attribute and generates the clustering; then, during the test phase, it assigns classes to the clusters based on the majority value of the class attribute within each cluster. It then computes the classification error based on this assignment and also shows the corresponding confusion matrix. The evaluation criteria were the detection accuracy rate (DAR) and the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) [18]. The results are shown in Table 3.
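Programmatically, this classes-to-clusters procedure can be reproduced roughly as in the sketch below (standard WEKA API; the file name and the choice of one cluster per class are our assumptions, not taken from the paper):

```java
// Illustrative sketch of WEKA's classes-to-clusters evaluation: the class
// attribute is removed before clustering; evaluating on data with the class
// attribute set then maps each cluster to its majority class.
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class ClassesToClusters {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("kddcup_dos.arff").getDataSet(); // hypothetical
        data.setClassIndex(data.numAttributes() - 1);

        // Build the clusterer on a copy with the class attribute removed
        Remove dropClass = new Remove();
        dropClass.setAttributeIndices(String.valueOf(data.classIndex() + 1)); // 1-based
        dropClass.setInputFormat(data);
        Instances unlabeled = Filter.useFilter(data, dropClass);

        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(data.classAttribute().numValues()); // one cluster per class
        km.buildClusterer(unlabeled);

        // Class attribute on the test data drives the classes-to-clusters mapping
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(km);
        eval.evaluateClusterer(data);
        System.out.println(eval.clusterResultsToString());
    }
}
```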
In the experimental analysis of the attribute selection scheme performance, results are considered significantly different if the difference is statistically significant at the 1% level. Furthermore, performance varies depending on both the clustering algorithm and the performance metric used to evaluate the models.
Compared with using all 41 attributes (see Table 3), it can be said that, among the four attribute selection techniques, the subset of key attributes selected by the Tsallis entropy performs better than the other three techniques in terms of DAR and AUC when models are built using the DoS and Probing data sets and the SimpleKMeans algorithm. Selecting attributes based on the ensemble approach achieves better results in terms of DAR and AUC when models are built using the DoS data set and the FarthestFirst algorithm. There were no significant differences in terms of DAR and AUC among the Shannon, Rényi and Tsallis attribute selection techniques when models are built using the DoS data set and the SimpleKMeans algorithm.

Based on Table 3, the preliminary empirical results point out that when one attribute selection scheme performs best in terms of one performance metric, this may not hold when another performance metric is used to evaluate the models. For example, Rényi performed best on the DAR metric while Tsallis performed best in terms of the AUC metric when models are built using the Probing data set and the FarthestFirst algorithm. Another result obtained in this case is that Rényi performed better than the complete data set in terms of AUC.
From Table 3, the detection results indicate that the clustering performance, in terms of both DAR and AUC, of the SimpleKMeans and FarthestFirst algorithms on the complete data set (with 41 attributes) significantly outperforms that on the attribute subsets selected by any attribute selection scheme for the R2L and U2R attack categories. This is consistent with Sabhnani and Serpen [19], who investigated the deficiencies of the KDD 99 intrusion detection datasets and concluded that it is not possible to achieve a high detection rate on the R2L and U2R attack categories, which involve content.
In particular, the Shannon and Rényi entropies for attribute selection do not bring any improvement in performance (DAR and AUC) for the SimpleKMeans algorithm on the Probing dataset (see Table 3).
Excluding the complete attribute set, we can summarize the following facts:
- Selecting attributes based on the Rényi entropy performs better than the other three techniques in terms of DAR and AUC when models are built using the R2L data set and both the SimpleKMeans and FarthestFirst algorithms.
- The Rényi entropy technique and the ensemble approach perform better than the Shannon and Tsallis entropy techniques in terms of DAR and AUC for the SimpleKMeans algorithm on the U2R dataset.
- The Shannon entropy technique is better than the other three techniques in terms of DAR and AUC for the FarthestFirst algorithm on the U2R dataset.
Another result, compared with the Shannon entropy, is that using the Tsallis entropy an attribute set of the same size or smaller was obtained for all attack categories, and using the Rényi entropy an attribute set of the same size or smaller was obtained for the Probing, R2L and U2R attack categories.

5 Conclusion
In this paper, we presented an evaluation of the Rényi and Tsallis entropies compared with the Shannon entropy, and their application to intrusion detection systems. Additionally, we studied an ensemble approach that combines the attributes selected by the Rényi, Tsallis and Shannon information measures. The experimental results show that, in general, selecting attributes based on the Rényi and Tsallis entropies can achieve better results compared with the Shannon entropy and the ensemble approach, considering each individual data set and the clustering algorithm employed, since each works better than the others in different cases. Moreover, the attribute selection approach based on the Rényi or Tsallis entropies reduces the number of attributes and the processing time. For future research, more detailed attributes from real network traffic will be used, which are supposedly able to better characterize packet contents as well as header data.

Acknowledgment. The authors would like to thank the Brazilian Coordination for Improvement of Higher Education Personnel (CAPES), the National Council for Scientific and Technological Development (CNPq) and the State Research Supporting Foundation of Maranhão (FAPEMA).

References
1. Crosbie, M., Spafford, E.: Defending a computer system using autonomous agents. Department of Computer Sciences, Purdue University, CSD-TR-95-022; Coast TR 95-02 (1995)
2. Estévez, P.A., Tesmer, M., Perez, C.A., Zurada, J.M.: Normalized mutual information feature selection. IEEE Trans. on Neural Networks 20(2), 189-201 (2009)
3. Rényi, A.: On measures of entropy and information. In: Proc. of the 4th Berkeley Symposium on Math. Statistics and Prob., pp. 547-561. Univ. of California Press, Berkeley (1960)
4. Tsallis, C.: Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics 52(1-2), 479-487 (1988)
5. Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27, 623-656 (1948)
6. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Diego (1993)
7. KDD Cup 99 intrusion detection data set (retrieved March 01, 2010), http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
8. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297. University of California Press (1967)
9. Hochbaum, D.S., Shmoys, D.B.: A best possible heuristic for the k-center problem. Mathematics of Operations Research 10(2), 180-184 (1985)
10. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers Inc., San Francisco (2006)
11. Liu, H., Yu, L.: Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. on Knowledge and Data Engineering 17, 491-502 (2005)
12. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1), 81-106 (1986)
13. Lima, C.F.L., de Assis, F.M., de Souza, C.P.: Decision tree based on Shannon, Rényi and Tsallis entropies for intrusion tolerant systems. In: Fifth International Conference on Internet Monitoring and Protection, pp. 117-122 (May 2010)
14. Tsallis, C.: Nonextensive statistics: theoretical, experimental and computational evidences and connections. Brazilian Journal of Physics 29, 1-35 (1999)
15. Furuichi, S.: Information theoretical properties of Tsallis entropies. Journal of Mathematical Physics 47(2) (2006), http://link.aip.org/link/?JMP/47/023302/1
16. Witten, I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd edn. Morgan Kaufmann Publishers, California (2005)
17. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Interscience, New York (1990)
18. Fawcett, T.: An introduction to ROC analysis. Pattern Recognition Letters 27(8), 861-874 (2006), http://dx.doi.org/10.1016/j.patrec.2005.10.010
19. Sabhnani, M., Serpen, G.: Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set. Intell. Data Anal. 8, 403-415 (2004)
