select balance from Account where Account_Number='9002';
update Account set balance=balance-900 where Account_Number='9001';
update Account set balance=balance+900 where Account_Number='9002';
commit;   -- if all SQL queries succeed
rollback; -- if any SQL query failed or raised an error

The query corresponding to this transaction is:

where O represents the operation, i.e. a read or write operation, O ∈ {R, W}. The read sequence represents that the transaction may need to read all data items x1, x2, …, xn-1 before it performs operation O (O ∈ {R, W}) on data item xn.

For example, consider the following update statement in a transaction.

Update Table1 set x = a + b + c where d = 90;

In this statement, before x is updated, the values of a, b, c and d must be read, and then the new value of x is calculated. So <R(a), R(b), R(c), R(d), W(x)> ∈ RS(x).

Definition 4 (Write Sequence) A write sequence is defined as

items x1, x2, …, xk must be updated by the same transaction.

For example, consider the following update statements in one transaction.

Update Table1 set x = a + b + c where a=50;
Update Table1 set y = x + u where x=60;
Update Table1 set z = x + w + v where w=80;

From the above example, it can be noted that <W(x), W(y), W(z)> is one write sequence of data item x, that is, <W(x), W(y), W(z)> ∈ WS(x).

For all sequential patterns <R(x1), R(x2), …, R(xn-1), O(xn)> in the read sequence set, generate read rules of the form {R(x1), R(x2), …, R(xn-1)} ⇒ O(xn). If the confidence of a rule is larger than the minimum confidence (Ψconf), the rule is added to the answer set of read rules, which implies that x1, x2, …, xn-1 must be read before xn.

For example, the read rule corresponding to the read sequence <R(a), R(b), R(c), R(d), W(x)> is:

{R(a), R(b), R(c), R(d)} ⇒ W(x)

Definition 6 (Write Rules (WR)) Write rules are the association rules generated from write sequences whose confidence is greater than the user-defined threshold (Ψconf). A write rule is represented as

For example, the write rule corresponding to the write sequence <W(x), W(y), W(z)> is W(x) ⇒ {W(y), W(z)}.

Definition 7 (Critical Data Elements (CDE)) CDEs are semantically defined data elements crucial to the functioning of the system. They are the data attributes of prime significance, having a direct correlation with the integrity of the system. In a vertically hierarchical organisation, these are the attributes accessed only by top-level management, and access by lower levels of the hierarchy is strictly protected.

The query patterns as perceived by our model QPAFCS are explored using DAEs, which represent the first level of access to the CDEs. A user's behaviour is represented by a set of first-order statements (derived from queries), called an attribute hierarchy, encoded in first-order logic; this hierarchy defines abstraction, decomposition and functional relationships between types of access arrangements. The unit transactions accessing CDEs are decomposed into an attribute hierarchy comprising DAEs, which in turn represents the user's most sensitive retrieval pattern.

Example:
R(b) → R(a)
R(b), R(c) → R(a)
If a is a CDE, then the set {b, c} represents DAEs.
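The read- and write-rule definitions above can be illustrated with a short Python sketch. It assumes the usual sequential-rule confidence, conf(X ⇒ Y) = sup(X followed by Y) / sup(X), where sup counts the transaction sequences that contain a pattern in order; the text does not spell out its confidence formula, so that definition, the (operation, item) tuple encoding, the toy data and the names `read_rules` and `write_rules` are all hypothetical. The parameter `min_conf` plays the role of Ψconf.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # hypothetical encoding: ("R", "a") stands for R(a)
Seq = List[Op]


def contains(seq: Seq, pattern: Seq) -> bool:
    """True if `pattern` occurs in `seq` in the same order (gaps allowed)."""
    it = iter(seq)
    return all(any(op == s for s in it) for op in pattern)


def support(pattern: Seq, db: List[Seq]) -> int:
    """Number of transaction sequences in `db` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in db)


def confidence(antecedent: Seq, pattern: Seq, db: List[Seq]) -> float:
    """Assumed definition: sup(pattern) / sup(antecedent)."""
    base = support(antecedent, db)
    return support(pattern, db) / base if base else 0.0


def read_rules(patterns: List[Seq], db: List[Seq], min_conf: float):
    """Rules {R(x1), ..., R(xn-1)} => O(xn) whose confidence exceeds min_conf."""
    rules = []
    for pat in patterns:
        antecedent, consequent = pat[:-1], pat[-1]
        # Every antecedent operation of a read rule must be a read.
        if antecedent and all(op == "R" for op, _ in antecedent):
            conf = confidence(antecedent, pat, db)
            if conf > min_conf:
                rules.append((tuple(antecedent), consequent, conf))
    return rules


def write_rules(patterns: List[Seq], db: List[Seq], min_conf: float):
    """Rules W(x1) => {W(x2), ..., W(xk)} whose confidence exceeds min_conf."""
    rules = []
    for pat in patterns:
        if len(pat) > 1 and all(op == "W" for op, _ in pat):
            conf = confidence(pat[:1], pat, db)
            if conf > min_conf:
                rules.append((pat[0], tuple(pat[1:]), conf))
    return rules


# Hypothetical transaction sequences built from the examples in the text.
DB = [
    [("R", "a"), ("R", "b"), ("R", "c"), ("R", "d"), ("W", "x")],
    [("R", "a"), ("R", "b"), ("R", "c"), ("R", "d"), ("W", "x")],
    [("R", "a"), ("R", "b"), ("R", "c"), ("R", "d")],
    [("W", "x"), ("W", "y"), ("W", "z")],
]
READ_SEQ = DB[0]
WRITE_SEQ = DB[3]
```

With these toy sequences, the read rule {R(a), R(b), R(c), R(d)} ⇒ W(x) has confidence 2/3, so `read_rules([READ_SEQ], DB, 0.6)` keeps it while a threshold of 0.7 drops it; the write rule W(x) ⇒ {W(y), W(z)} has confidence 1/3.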
Type of Attribute                 Sensitivity Level
Critical Data Elements            Highest
Directly Associated Elements      Medium
Normal Attributes                 Low

Table 3.1 Types of attributes and their sensitivity levels

CDEs are tokens of behaviour that our model uses to recognise malicious activity by users of the system.

Definition 8 (Critical Rules (CR)) The set of rules that contain a Critical Data Element in their antecedent or consequent:

CR = {ζ | (ζ ∈ RR ∨ ζ ∈ WR) ∧ (x ∈ CDE ∧ (ζ = {R(x1), R(x2), …} ⇒ O(x) ∨ ζ = O(x) ⇒ {W(x1), W(x2), …}))}

We propose a method of user access pattern recognition using the Critical Rules. Using CRs, the actions and goals of users are recognised from a series of observations of the users' actions and the environmental conditions, i.e. the user query pattern associated with the critical data elements.

Definition 10 (Dubiety Score (φ)) A measure of the anomaly exhibited by a user in the past, based on his historic transactional data. This score summarizes the user's historic malicious access attempts. The Dubiety Score attempts to quantify the personnel vulnerability that the organisation faces because of a particular user.

The Dubiety Score indicates the amount of deviation between the user's access pattern and his designated role. The Dubiety Score, combined with the deviation of the user's present query from his normal behaviour pattern, yields the output of the proposed IDS.

For our paper: 0 ≤ φ ≤ 1.

The higher the Dubiety Score, the more evidence there is that the user is not following the assigned role, i.e. the stronger the malicious intent (rogue behaviour).

Definition 11 (Dubiety Table) A table maintaining the record of the dubiety score of each user. It contains two attributes: UserID and Dubiety Score.
The initial Dubiety Scores are set to 1.

Uid     φ
1001    1
1002    1
1003    1
1004    1
1005    1

aims at generating user profiles from the transaction logs and quantifies deviation from normal behaviour, i.e. this phase aims to recognise and characterise the user activity pattern on the basis of the arrangement of their queries. The following are the components of the architecture of the proposed model:
For example, the parser generates a unique Transaction ID, say T1234, and then parses the transaction. The parser finally yields:

< T1234, U1001, <R(Account_number), R(balance)> >

Frequent sequences generator: After the SQL query parser generates the sequences, the generated sequences are pre-processed. Weights are then assigned to data items; for instance, CDEs are given greater weight than DAEs and other normal attributes. Finally, these pre-processed sequences are given as input to the frequent sequences generator, which uses the PrefixSpan algorithm to generate frequent sequences from the input sequences corresponding to each UID.

Rule generator: The frequent sequences are given as input to the rule generator module, which uses association rule mining to generate read rules and write rules from the frequent sequences.

As an example, if the input frequent sequences are:

1. <R(m), R(n), R(o), W(a)>
2. <R(m), R(n), W(o), W(a)>
3. <R(m), W(n), W(o), W(a)>
4. <W(a), R(b), W(o)>
5. <R(a), R(b), R(m), W(a)>
6. <R(a), R(b), W(m), W(b)>

Table 3.4 Rule Generator for given Example

DAE generator: In our approach, we semantically define a class of data items known as Critical Data Elements, or CDEs. These CDEs and the mined rules are given as input to our DAE (Directly Associated Element) generator, which marks as DAEs all other elements present in either the antecedent or the consequent of rules that involve at least one CDE.

Algorithm 1: DAE Generator
Data: CDE, Set DAE = {}, RR = Set of Read Rules, WR = Set of Write Rules
Result: The set of Directly Associated Elements DAE
Function: DAE Generator (CDE, RR, WR)
for Ω ∈ RR ∪ WR do
    for α ∈ Ω do
        if α ∈ CDE then
            for β ∈ Ω do
                if β ∉ CDE then
                    DAE ← DAE ∪ {β}
                end
            end
        end
    end
end
return DAE

User vector generator: Using the frequent sequences for the given audit period, this module generates the user vectors. A user vector is of the form
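BID = < UID, w1, w2, w3, …, wn >

where wi = |O(ai)|. Each wi represents how frequently the user performs the operation on the particular data item ai. The vector can also be used in a normalized form, as in our proposed model QPAFCS:

UVID = < UID, < p(a1), p(a2), p(a3), …, p(an) > >

where p(ai) is the normalized value of wi.

The centre of a cluster (α) is the mean of all points, weighted by their membership coefficients. Mathematically,

$$\alpha_j = \frac{\sum_{i=1}^{n} w_{ij}^{m}\, u_i}{\sum_{i=1}^{n} w_{ij}^{m}}$$

The objective function that is minimized to create clusters is defined as:

$$\arg\min \sum_{i=1}^{n} \sum_{j=1}^{C} w_{ij}^{m} \, \lVert u_i - \alpha_j \rVert^{2}$$

where $u_i$ is the i-th user vector, $\alpha_j$ is the centre of the j-th cluster, $w_{ij}$ is the degree of membership of $u_i$ in cluster j, n is the number of user vectors, C is the number of clusters, and m > 1 is the fuzzifier.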
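The user-vector and fuzzy c-means steps described above can be sketched in pure Python. The normalization p(ai) = wi / Σ wj, the (operation, attribute) feature layout, the random initialization, the toy logs and the names `user_vector`, `fuzzy_c_means` and `objective` are assumptions for illustration, not the QPAFCS implementation; the centre and membership updates are the standard fuzzy c-means ones that minimize the objective above.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Op = Tuple[str, str]  # hypothetical (operation, attribute) feature, e.g. ("R", "a")


def user_vector(log: List[Op], features: List[Op]) -> List[float]:
    """Assumed normalization: p(ai) = wi / sum(w), where wi = |O(ai)| in the audit log."""
    counts = Counter(log)
    w = [counts[f] for f in features]
    total = sum(w)
    return [wi / total if total else 0.0 for wi in w]


def fuzzy_c_means(points: List[List[float]], c: int, m: float = 2.0,
                  iters: int = 100, seed: int = 0):
    """Standard fuzzy c-means: alternate centre and membership updates."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    W = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        W.append([v / s for v in row])
    centres: List[List[float]] = []
    for _ in range(iters):
        # Centre update: alpha_j = sum_i w_ij^m u_i / sum_i w_ij^m
        centres = []
        for j in range(c):
            wm = [W[i][j] ** m for i in range(n)]
            den = sum(wm)
            centres.append([sum(wm[i] * points[i][k] for i in range(n)) / den
                            for k in range(d)])
        # Membership update: w_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        for i in range(n):
            dist = [math.dist(points[i], centres[j]) for j in range(c)]
            zeros = [j for j in range(c) if dist[j] == 0.0]
            if zeros:  # the point coincides with one or more centres
                W[i] = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                W[i] = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                  for k in range(c))
                        for j in range(c)]
    return centres, W


def objective(points, centres, W, m: float = 2.0) -> float:
    """The quantity FCM minimizes: sum_i sum_j w_ij^m * ||u_i - alpha_j||^2."""
    return sum(W[i][j] ** m * math.dist(points[i], centres[j]) ** 2
               for i in range(len(points)) for j in range(len(centres)))


# Hypothetical audit logs over four (operation, attribute) features.
FEATURES = [("R", "a"), ("W", "a"), ("R", "b"), ("W", "b")]
LOGS = {
    "U1001": [("R", "a")] * 8 + [("R", "b")] * 2,
    "U1002": [("R", "a")] * 9 + [("R", "b")],
    "U1003": [("W", "b")] * 8 + [("R", "b")] * 2,
    "U1004": [("W", "b")] * 9 + [("R", "b")],
}
```

For these hypothetical users, U1001 and U1002 mostly read a while U1003 and U1004 mostly write b, so with c = 2 each user's largest membership places the two groups in different clusters. The fuzzifier m = 2 is a common default, not a value taken from the paper.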
Table 3.5 User profile for the given Example

Consider a system with 4 fuzzy clusters and 4 attributes; the given table illustrates the profile of user U1001.

3.3 Testing Phase

Section 3.2 describes the learning phase, in which the system is trained using non-malicious (benign) transactions. The trained model can now be used to detect malicious transactions. In this phase, a test query is obtained as input and compared with the model's perception of the user's access pattern, and the model evaluates whether the test transaction is malicious. It is first checked whether the user is trying to access a CDE. If so, the transaction is allowed only if the given user has accessed that CDE before:

a. If a read operation has been performed on any CDE, i.e. r(CDE) is present in the rule, and UV[i][r(CDE)] = 0 and UV[i][w(CDE)] = 0 for the given user, then the transaction is termed malicious.
b. If a write operation has been performed on any CDE, i.e. w(CDE) is encountered, and UV[i][w(CDE)] = 0 for the given user, then the transaction is termed malicious.

Next, it is checked whether any DAE is being accessed. A user can perform a write operation on a DAE if and only if the same user has previously written it; otherwise the transaction is termed malicious. Next, we check whether the transaction abides by the rules that are generally followed by similar users.
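The CDE checks (a) and (b) and the DAE write check can be sketched as a single predicate. Here `profile` is a hypothetical stand-in for the user's row UV[i], keyed by (operation, attribute) pairs, with a missing key counting as 0; the comparison against rules followed by similar users is not specified in enough detail to sketch, so it is left out.

```python
from typing import Dict, Iterable, Set, Tuple

Op = Tuple[str, str]  # hypothetical encoding: ("W", "a") stands for w(a)


def is_malicious(ops: Iterable[Op], profile: Dict[Op, float],
                 cde: Set[str], dae: Set[str]) -> bool:
    """Apply the CDE checks (a), (b) and the DAE write check to one test transaction."""

    def seen(op: str, item: str) -> bool:
        # UV[i][op(item)] > 0 means the user performed this operation before.
        return profile.get((op, item), 0) > 0

    for op, item in ops:
        if item in cde:
            # (a) reading a CDE the user has never read or written before
            if op == "R" and not seen("R", item) and not seen("W", item):
                return True
            # (b) writing a CDE the user has never written before
            if op == "W" and not seen("W", item):
                return True
        elif item in dae and op == "W" and not seen("W", item):
            # A DAE may be written only if the same user has written it before.
            return True
    return False
```

For example, a user whose profile shows reads of a (a CDE) but no writes may read a again, yet an attempt to write a, or to write a DAE the user has never written, is flagged.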
DAE Detector: This module addresses the issue of inference attacks on CDEs. As discussed earlier, certain data elements can be used to access the CDEs indirectly, i.e. through first-order inference. This module uses the rules mined in the learning phase to determine which elements can be used to directly infer the CDEs; these elements are the DAEs.
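The DAE generator (Algorithm 1) that supplies this module, together with the critical-rule filter of Definition 8, can be sketched in a few lines of Python. Rules are encoded as hypothetical (antecedent, consequent) tuples of (operation, item) pairs, and the function names are illustrative; following the example in which {b, c} are the DAEs of the CDE a, the sketch excludes the CDEs themselves from the DAE set.

```python
from typing import List, Set, Tuple

Op = Tuple[str, str]                           # ("R", "b") stands for R(b)
Rule = Tuple[Tuple[Op, ...], Tuple[Op, ...]]   # (antecedent, consequent)


def items_of(rule: Rule) -> Set[str]:
    """Data items appearing anywhere in the rule."""
    antecedent, consequent = rule
    return {item for _, item in antecedent + consequent}


def critical_rules(rules: List[Rule], cde: Set[str]) -> List[Rule]:
    """Definition 8: rules with a CDE in the antecedent or consequent."""
    return [r for r in rules if items_of(r) & cde]


def dae_generator(cde: Set[str], rr: List[Rule], wr: List[Rule]) -> Set[str]:
    """Algorithm 1: every non-CDE item that shares a rule with some CDE."""
    dae: Set[str] = set()
    for rule in critical_rules(rr + wr, cde):
        dae |= items_of(rule) - cde
    return dae
```

Given the read rules R(b) → R(a) and R(b), R(c) → R(a) with a as the only CDE, the sketch returns {b, c}, matching the example in the text.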
1005 0.7

Uid     φ
1001    0.42
1002    0.49
1003    0.2
1004    0.6
1005    0.46

Table 3.8 Calculated dubiety scores

Setting our hyperparameter ΦUT to 0.65, we observe that φf > ΦUT. Hence the test transaction is malicious, and an alarm is raised.

4. Discussion

With regard to a typical credit card company dataset, some examples of critical data elements (CDEs) are:

1. CVV (denoted by a)
Card verification value (CVV) is a combination of features used in credit, debit and automated teller machine (ATM) cards to establish the owner's identity and minimize the risk of fraud. The CVV is also known as the card verification code (CVC) or card security code (CSC).

These are the attributes that have been collected for fraud detection; they are not directly used to access the CDE but are crucial for the process.
R(b) → R(a)
R(b), R(c) → R(a)
1. JC Distance