
1.2. One Phase Algorithms


One-phase algorithms calculate the utility of each pattern considered in the search space immediately, and
thus do not need to generate and store candidate itemsets in memory. In addition, one-phase algorithms
make use of novel upper-bounds on the utility of itemsets. These upper-bounds are based on the exact
utility of each itemset and are therefore tighter than the TWU measure, so they prune more of the search
space. They include the remaining utility and newer measures such as local utility and sub-tree utility.

The FHM Algorithm

This algorithm uses the utility-list structure. The utility-list of an itemset stores the list of
transactions in which the itemset appears (like a tid-list structure), together with the utility of the
itemset (iutil) and the utility of the remaining items (rutil) in each of these transactions.

In FHM, a depth-first search is performed to explore the search space of itemsets, and a utility-list is
created for each visited itemset.

Figure 1: Utility List example

Assuming a total order is defined on the set of items (e.g. alphabetical order in this case), the
utility-lists of k-itemsets (k > 1) can be created quickly by joining the utility-lists of shorter patterns.

The FHM algorithm scans the database once to create the utility-lists of 1-itemsets. The utility-lists
of larger itemsets are then constructed by joining the utility-lists of smaller itemsets.
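As a minimal sketch of this scan, the Python below builds the utility-lists of all 1-itemsets. The toy database `DB`, the function name `build_utility_lists` and the tuple layout `(tid, iutil, rutil)` are assumptions made for illustration; items are ordered alphabetically, as assumed above.

```python
# Toy database (assumed for illustration): tid -> {item: utility of the item
# in that transaction, i.e. quantity times unit profit already multiplied out}.
DB = {
    1: {"a": 5, "b": 2, "c": 1},
    2: {"a": 10, "c": 6},
    3: {"b": 4, "c": 3},
}


def build_utility_lists(db):
    """Return {item: [(tid, iutil, rutil), ...]} for every single item.

    iutil is the utility of the item in the transaction; rutil is the summed
    utility of the items that follow it in the total order (alphabetical here).
    """
    utility_lists = {}
    for tid in sorted(db):
        remaining = sum(db[tid].values())
        for item in sorted(db[tid]):            # total order on items
            iutil = db[tid][item]
            remaining -= iutil                  # utility of the items after `item`
            utility_lists.setdefault(item, []).append((tid, iutil, remaining))
    return utility_lists


utility_lists = build_utility_lists(DB)
print(utility_lists["a"])  # [(1, 5, 3), (2, 10, 6)]
print(utility_lists["b"])  # [(1, 2, 1), (3, 4, 3)]
```

Each entry reads as "in transaction tid, the itemset yields iutil, and the items that can still be appended to it yield rutil".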

The utility-list of an itemset X can be used to calculate:

• Utility of X: the sum of the iutil values in the list

• Remaining utility upper-bound of X: the sum of the iutil and rutil values in the list

Thus, we get the following rule for pruning the search space:

If the remaining utility upper-bound of an itemset X is less than minutil, then X and all its extensions
are low-utility itemsets.
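The two sums and the pruning rule can be sketched as follows. The function names and the sample lists `UL_A` and `UL_B` are hypothetical; the lists use the `(tid, iutil, rutil)` layout, which is an assumption of this sketch.

```python
def utility(utility_list):
    """Exact utility of the itemset: sum of its iutil values."""
    return sum(iutil for _, iutil, _ in utility_list)


def remaining_utility_bound(utility_list):
    """Upper bound on the utility of the itemset and of all its extensions."""
    return sum(iutil + rutil for _, iutil, rutil in utility_list)


def classify(utility_list, minutil):
    """Return (is_high_utility, extensions_worth_exploring)."""
    return (utility(utility_list) >= minutil,
            remaining_utility_bound(utility_list) >= minutil)


# Hypothetical utility-lists of two single items.
UL_A = [(1, 5, 3), (2, 10, 6)]
UL_B = [(1, 2, 1), (3, 4, 3)]

print(utility(UL_A), remaining_utility_bound(UL_A))  # 15 24
print(classify(UL_A, minutil=20))  # (False, True): not high-utility, but explore
print(classify(UL_B, minutil=20))  # (False, False): prune the item and its extensions
```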

The FHM Algorithm

The main procedure of FHM first scans the database to calculate the TWU of each item and to identify
the set I* of all items having a TWU no less than minutil. A total order on these items, in ascending
order of TWU values, is established. During a second database scan, the items in each transaction are
reordered accordingly, the utility-list of each item i in I* is built, and a structure called the EUCS
(Estimated Utility Co-Occurrence Structure) is built. The EUCS is defined as a set of triples of the form
(a, b, c) ∈ I* × I* × R, such that TWU({a, b}) = c.
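A minimal sketch of these steps, assuming a toy database and storing the EUCS as a dictionary keyed by the ordered pair (a, b) rather than as a set of triples. All names here (`DB`, `MINUTIL`, `transaction_weighted_utilities`, `build_eucs`) are illustrative assumptions.

```python
from itertools import combinations

# Toy database (assumed): tid -> {item: utility of the item in that transaction}.
DB = {
    1: {"a": 5, "b": 2, "c": 1},
    2: {"a": 10, "c": 6},
    3: {"b": 4, "c": 3},
}
MINUTIL = 15


def transaction_weighted_utilities(db):
    """TWU of an item: sum of the utilities of the transactions containing it."""
    twu = {}
    for items in db.values():
        tu = sum(items.values())                # transaction utility
        for item in items:
            twu[item] = twu.get(item, 0) + tu
    return twu


def build_eucs(db, rank):
    """Map each co-occurring pair (a, b), with a before b in the order, to TWU({a, b})."""
    eucs = {}
    for items in db.values():
        tu = sum(items.values())
        kept = sorted((i for i in items if i in rank), key=rank.get)
        for a, b in combinations(kept, 2):
            eucs[(a, b)] = eucs.get((a, b), 0) + tu
    return eucs


twu = transaction_weighted_utilities(DB)
# I*: items with TWU >= minutil, in ascending TWU order (ties broken by name).
order = sorted((i for i in twu if twu[i] >= MINUTIL), key=lambda i: (twu[i], i))
rank = {item: pos for pos, item in enumerate(order)}
eucs = build_eucs(DB, rank)

print(twu)    # {'a': 24, 'b': 15, 'c': 31}
print(order)  # ['b', 'a', 'c']
print(eucs)   # {('b', 'a'): 8, ('b', 'c'): 15, ('a', 'c'): 24}
```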

A depth-first search exploration of itemsets then begins by calling the recursive procedure
FHMSearch with the empty itemset ∅, the set of single items I*, minutil and the EUCS.

The FHMSearch and Construct Procedures

The FHMSearch procedure takes as input (1) an itemset P, (2) a set of extensions of P having the form
Pz, each obtained by appending an item z to P, (3) minutil and (4) the EUCS.

In this procedure, for each extension Px of P:

• If the utility of Px is no less than minutil, then Px is a high-utility itemset.

• If the remaining utility upper-bound of Px is no less than minutil, then the extensions of Px
should be explored.

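This exploration is performed by merging Px with all extensions Py of P such that y ≻ x (in the total
order), to form extensions of the form Pxy containing |Px| + 1 items.

The utility-list of Pxy is then constructed by calling the Construct procedure to join the utility-lists of
P, Px and Py. In FHM, this construction is skipped when the EUCS contains no triple (x, y, c) with c
no less than minutil, since Pxy and all its supersets are then low-utility itemsets.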
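The join can be sketched as below, assuming utility-lists are Python lists of `(tid, iutil, rutil)` tuples. The function name `construct`, the use of `None` for the empty prefix and the dictionary lookups are choices of this sketch, not details taken from the original algorithm listing.

```python
def construct(ul_p, ul_px, ul_py):
    """Join the utility-lists of P, Px and Py into the utility-list of Pxy.

    Pass ul_p=None when P is the empty itemset.
    """
    py_by_tid = {tid: (iutil, rutil) for tid, iutil, rutil in ul_py}
    p_by_tid = {tid: iutil for tid, iutil, _ in (ul_p or [])}
    ul_pxy = []
    for tid, iutil_px, _ in ul_px:
        if tid not in py_by_tid:
            continue                            # Pxy does not occur in this transaction
        iutil_py, rutil_py = py_by_tid[tid]
        # The utility of P is counted in both Px and Py, so subtract it once.
        iutil = iutil_px + iutil_py - p_by_tid.get(tid, 0)
        # y comes after x in the order, so the items remaining after Pxy
        # are exactly the items remaining after Py.
        ul_pxy.append((tid, iutil, rutil_py))
    return ul_pxy


# Hypothetical utility-lists of single items, with the order a, b, c.
UL_A = [(1, 5, 3), (2, 10, 6)]
UL_B = [(1, 2, 1), (3, 4, 3)]
UL_C = [(1, 1, 0), (2, 6, 0), (3, 3, 0)]

UL_AB = construct(None, UL_A, UL_B)
UL_AC = construct(None, UL_A, UL_C)
UL_ABC = construct(UL_A, UL_AB, UL_AC)
print(UL_AB)   # [(1, 7, 1)]
print(UL_AC)   # [(1, 6, 0), (2, 16, 0)]
print(UL_ABC)  # [(1, 8, 0)]
```

No database scan is needed for the join: the utility-list of {a, b, c} is derived purely from the lists of {a}, {a, b} and {a, c}.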

A recursive call to FHMSearch is then made with Pxy to calculate its utility and explore its
extensions. In this way, FHMSearch recursively explores the search space of itemsets by appending
single items, and prunes it using the remaining utility upper-bound rule.
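Putting the pieces together, the following is a compact, unoptimized sketch of the whole search under stated assumptions: the database maps each tid to a dictionary of item utilities, utility-lists are lists of `(tid, iutil, rutil)` tuples, and the EUCS is a dictionary keyed by ordered item pairs. The names `fhm`, `fhm_search` and `construct` are illustrative, and the EUCS check shown is the co-occurrence pruning used by FHM.

```python
from itertools import combinations


def construct(ul_p, ul_px, ul_py):
    """Join the utility-lists of P, Px and Py (ul_p is None for the empty P)."""
    py = {tid: (iu, ru) for tid, iu, ru in ul_py}
    p = {tid: iu for tid, iu, _ in (ul_p or [])}
    return [(tid, iu + py[tid][0] - p.get(tid, 0), py[tid][1])
            for tid, iu, _ in ul_px if tid in py]


def fhm_search(prefix, ul_prefix, extensions, minutil, eucs, found):
    """extensions: list of (x, utility-list of Px), sorted by the total order."""
    for pos, (x, ul_px) in enumerate(extensions):
        util = sum(iu for _, iu, _ in ul_px)
        if util >= minutil:
            found[frozenset(prefix + [x])] = util       # Px is high-utility
        if util + sum(ru for _, _, ru in ul_px) >= minutil:
            deeper = [(y, construct(ul_prefix, ul_px, ul_py))
                      for y, ul_py in extensions[pos + 1:]
                      if eucs.get((x, y), 0) >= minutil]  # EUCS pruning
            fhm_search(prefix + [x], ul_px, deeper, minutil, eucs, found)


def fhm(db, minutil):
    tu = {tid: sum(t.values()) for tid, t in db.items()}
    twu = {}
    for tid, t in db.items():                   # first scan: TWU of each item
        for item in t:
            twu[item] = twu.get(item, 0) + tu[tid]
    order = sorted((i for i in twu if twu[i] >= minutil),
                   key=lambda i: (twu[i], i))
    rank = {item: pos for pos, item in enumerate(order)}
    lists = {item: [] for item in order}
    eucs = {}
    for tid in sorted(db):                      # second scan: utility-lists and EUCS
        kept = sorted((i for i in db[tid] if i in rank), key=rank.get)
        remaining = sum(db[tid][i] for i in kept)
        for item in kept:
            remaining -= db[tid][item]
            lists[item].append((tid, db[tid][item], remaining))
        for a, b in combinations(kept, 2):
            eucs[(a, b)] = eucs.get((a, b), 0) + tu[tid]
    found = {}
    fhm_search([], None, [(i, lists[i]) for i in order], minutil, eucs, found)
    return found


# Toy database (assumed for illustration).
DB = {
    1: {"a": 5, "b": 2, "c": 1},
    2: {"a": 10, "c": 6},
    3: {"b": 4, "c": 3},
}
result = fhm(DB, minutil=15)
print({"".join(sorted(k)): v for k, v in result.items()})  # {'a': 15, 'ac': 22}
```

With minutil = 15, the pair {a, b} is never constructed because its TWU in the EUCS is 8, and the branch under {b, c} stops because its remaining utility upper-bound is 10.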

Drawbacks: Although this algorithm is easy to implement and faster than two-phase algorithms, it has
the following drawbacks:

• The algorithm explores itemsets formed by combining smaller itemsets, some of which
may not appear in any transaction.
• It takes a lot of time and space to build a utility-list for each visited itemset.

Pattern-Growth One-Phase Algorithms

These algorithms address the drawbacks of utility-list based algorithms. Here, only itemsets that
appear in at least one transaction of the database are considered.

• The d2HUP algorithm performs a depth-first search, and represents the database and projected
databases using a hyper-structure.
• The EFIM algorithm performs a depth-first search using a horizontal database representation. It
introduces two novel upper-bounds called local utility and sub-tree utility, a novel utility counting
technique named Fast Utility Counting, and efficient database projection and transaction
merging techniques named High-utility Database Projection (HDP) and High-utility
Transaction Merging (HTM). Together, these make EFIM much faster while often having lower
memory consumption.
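To illustrate the idea behind transaction merging, the sketch below replaces transactions that contain exactly the same items by a single transaction whose item utilities are summed. The function name and the dictionary-based grouping are assumptions of this sketch and a simpler stand-in; EFIM itself detects identical transactions by sorting the database so that they become adjacent.

```python
def merge_identical_transactions(transactions):
    """Merge transactions with the same set of items, summing item utilities.

    transactions: list of {item: utility} dictionaries.
    """
    merged = {}
    for transaction in transactions:
        key = tuple(sorted(transaction))        # the set of items, as a hashable key
        if key not in merged:
            merged[key] = dict(transaction)     # copy, so the input is not modified
        else:
            for item, util in transaction.items():
                merged[key][item] += util
    return list(merged.values())


# Hypothetical projected database in which two transactions hold the same items.
projected = [{"a": 5, "c": 1}, {"b": 4}, {"a": 10, "c": 6}]
print(merge_identical_transactions(projected))
# [{'a': 15, 'c': 7}, {'b': 4}]
```

Merging preserves the utility of every itemset, because each itemset occurs in a merged transaction exactly when it occurs in all of the transactions that were merged.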

Drawbacks: For complex applications, extensions of the problem of high-utility itemset mining are
required to address limitations such as the large number of patterns generated depending on minutil,
negative utility values in real-life applications, the inability to find correlations between the items
of an itemset or to discover recurring transactions for a user, and the failure to take the dynamic
nature of databases into account.
