
2IMI35 ASSIGNMENT 2 25/10/2016

Student ID: S167017 Name: Garcia Torres, D.M.


Introduction

My main goals for this assignment are to gain new insights into what can be done with Data Mining,
Machine Learning and Process Mining tools, to learn how I can use them in my own projects, and to apply the concepts
learned in this Process Mining course.

Part 1. Discovery of Process Models

We are interested in discovering how instances of this process are actually executed. However, the process clearly
shows much variability, and the activities that are actually executed depend on the final outcome. Thus, it is not possible to
discover a single process model. Therefore, I discover several process models: the Applications Lifecycle,
the Offers Lifecycle and the Combined Lifecycle.

For each of them, I start by applying the Filter Log on Simple Heuristics plugin in order to remove the incomplete events,
that is, the events with a status different from COMPLETE.
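
As a rough illustration of what this first filter does, the sketch below keeps only the events whose lifecycle transition is COMPLETE. It is not the ProM implementation: the dictionary representation of an event, the key names (borrowed from the XES attribute names `concept:name` and `lifecycle:transition`) and the sample trace, including the made-up activity X_EXAMPLE, are assumptions for illustration only.

```python
from typing import Dict, List

Event = Dict[str, str]   # assumed representation: one dict per event
Trace = List[Event]      # a trace is an ordered list of events


def keep_complete_events(trace: Trace) -> Trace:
    """Return only the events whose lifecycle transition is COMPLETE."""
    return [
        event
        for event in trace
        if event.get("lifecycle:transition", "").upper() == "COMPLETE"
    ]


# Hypothetical trace; the events are not taken from the assignment's log.
sample_trace: Trace = [
    {"concept:name": "A_SUBMITTED", "lifecycle:transition": "COMPLETE"},
    {"concept:name": "X_EXAMPLE", "lifecycle:transition": "START"},
    {"concept:name": "X_EXAMPLE", "lifecycle:transition": "COMPLETE"},
]

filtered = keep_complete_events(sample_trace)
print([event["concept:name"] for event in filtered])
```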

I. The Model of the Applications Lifecycle

To begin with the analysis, I apply once again the Filter Log on Simple Heuristics plugin to select the events with name
starting with A. Then, I apply the Mine with Inductive Visual Miner plugin. The resulting process model with 80% of the
possible paths and the corresponding deviations is shown below:

II. The Model of the Offers Lifecycle

For this model, I apply once again the Filter Log on Simple Heuristics plugin to select the events with name starting with O.
Then, I apply the Mine with Inductive Visual Miner plugin. The resulting process model with 80% of the possible paths and
the corresponding deviations is shown below:

III. Combined Models

Due to the high variability, I discover a separate model for each possible outcome, namely whether the application is
finally Declined, Cancelled or Approved. In these cases, the processing steps to generate the models start with executing
the Filter Log on Simple Heuristics plugin to select just the events with names starting with A or with O.
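
The idea of keeping only A and O events and then building one log per outcome can be sketched as follows. This is a simplified stand-in for the plugin, not its actual behaviour: traces are assumed to be plain lists of activity names, the prefixes are assumed to be written "A_" and "O_", and the sample log (including the activity W_EXAMPLE) is invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Trace = List[str]  # assumed representation: a trace as a list of activity names


def keep_prefixes(trace: Trace, prefixes: Tuple[str, ...] = ("A_", "O_")) -> Trace:
    """Keep only the activities whose name starts with one of the prefixes."""
    return [activity for activity in trace if activity.startswith(prefixes)]


def split_by_end_event(log: List[Trace]) -> Dict[str, List[Trace]]:
    """Group the filtered traces by their last remaining activity."""
    groups: Dict[str, List[Trace]] = defaultdict(list)
    for trace in log:
        filtered = keep_prefixes(trace)
        if filtered:  # skip traces that have no A or O events at all
            groups[filtered[-1]].append(filtered)
    return dict(groups)


# Hypothetical log; the traces are invented for illustration.
sample_log: List[Trace] = [
    ["A_SUBMITTED", "W_EXAMPLE", "A_DECLINED"],
    ["A_SUBMITTED", "O_CREATED", "A_CANCELED", "W_EXAMPLE"],
    ["A_SUBMITTED", "A_DECLINED"],
]

by_outcome = split_by_end_event(sample_log)
for end_event, traces in by_outcome.items():
    print(end_event, len(traces))
```

Each group then plays the role of one outcome-specific log that is handed to the miner.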

1. Model with Declined Outcome

To get the combined model with declined application outcome, in the End events screen of the Filter Log on Simple
Heuristics plugin, I select just the activity A_DECLINED. Finally I apply the Mine with Inductive Visual Miner plugin.
The resulting process model with all possible activities and 85% of the possible paths with the deviations is:
2. Model with Cancelled Outcome

For the combined model with cancelled application outcome, in the End events screen of the Filter Log
on Simple Heuristics plugin, I select just the activity A_CANCELED. Finally, I apply the Mine with Inductive Visual
Miner plugin. The resulting process model, with all possible activities and 68% of the possible paths with the
corresponding deviations, is shown below:

3. Model with Approved Outcome

For this case, in the End events screen of the Filter Log on Simple Heuristics plugin, I select the activities
A_REGISTERED, A_APPROVED and A_ACTIVATED. Finally, I apply the Mine with Inductive Visual Miner plugin.
The resulting process model, with all possible activities and 85% of the possible paths with the corresponding
deviations, is shown below:

Part 2. Conformance Checking

Providing process models is not enough: the process owner is wary of any result that he/she is simply
handed and wants evidence of the quality of these models. I therefore perform alignment-based
conformance checking. To this end, I execute the plug-in Replay a log on Petri Net for conformance analysis with
each of the filtered logs and models generated in Part 1. The moves of all alignments are projected onto the Petri net
models below:

I. The Model of the Applications Lifecycle

One can observe that this event log contains almost no deviations with respect to the model discovered in Part 1.I.
A_CANCELED and A_FINALIZED are the transitions where the percentage of moves in the model compared with
synchronous moves is larger than zero. According to the information inside the transition, A_CANCELED occurred 2807 times as
a synchronous move (when it was supposed to) and 399 times as a move in the model (it did not occur). In the same way,
A_FINALIZED has 5015 synchronous moves against 98 moves in the model. From the general statistics of the alignments shown
in the table below, one can verify an almost perfect Move-Log fitness, Move-Model fitness and Trace fitness. In addition, even
without performing a Structural Analysis, one can argue that this model's simplicity, along with its fitness, makes it a very
good model of the process.
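
To put the counts quoted above into proportion, the following sketch computes, per transition, the share of moves in the model among all moves recorded for that transition. The function name and the choice of this ratio are my own illustration and not a statistic reported by ProM; only the four counts come from the text.

```python
def model_move_share(sync_moves: int, model_moves: int) -> float:
    """Fraction of a transition's moves that are moves in the model only."""
    total = sync_moves + model_moves
    if total == 0:
        raise ValueError("the transition has no recorded moves")
    return model_moves / total


# (synchronous moves, moves in the model) as quoted in the text.
counts = {
    "A_CANCELED": (2807, 399),
    "A_FINALIZED": (5015, 98),
}

shares = {
    name: model_move_share(sync, model) for name, (sync, model) in counts.items()
}

for name, share in shares.items():
    print(f"{name}: {share:.2%} of its moves are moves in the model")
```

Under this reading, A_CANCELED deviates in roughly one out of eight of its moves, while A_FINALIZED deviates in about one out of fifty.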
II. The Model of the Offers Lifecycle

This event log contains no deviations with respect to the model discovered in Part 1.II. Nevertheless, the projection shows
(with a yellow-filled place) a move in the log when transition O_SELECTED is enabled. Even so, in the project alignments on the log
perspective, where the general statistics are shown, one can verify that its fitness is almost equal to 1, close to the perfect score.

III. Combined model with Declined Outcome

For this event log, no deviations are shown with respect to the model discovered in Part 1.III-1, although the projection shows moves
in the log (with a yellow-filled place) when the final transition, O_DECLINED, is enabled. The project alignments on
the log perspective shows the general statistics, where its fitness is almost equal to 1, close to the perfect score.

IV. Combined model with Cancelled Outcome

For this log, the conformance checker shows just one deviation with respect to the model transitions, at
the transition A_PREACCEPTED. Apart from that, several moves in the log are shown when transitions
O_CREATED, O_CANCELED and A_CANCELED are enabled. The general statistics show values of 98% in all three
fitness indicators.

V. Combined model with Approved Outcome

The model above is the most complex among the analyzed models (note that the trace length is 14.94). This was expected,
since we are analyzing the approved outcome. Nevertheless, the conformance checker shows no deviations
among transitions except for the O_ACCEPTED transition. This transition occurred 661 times as a synchronous move (when it
was supposed to) and 2 times as a move in the model (it did not occur). Apart from that, the projection shows some moves
in the log when transitions O_SELECTED and O_CANCELED are enabled. The fitness measures are nearly 1, showing
that the model is able to reproduce almost all execution sequences at the case level.

Part 3. Answering the Customer's Questions

The process owner is interested in obtaining answers to the following questions:

1. Are some decisions in any of the models driven by the application's amount?

To answer this question, I use the Guard Discovery plug-in Discovery of the Process Data-flow
(Decision-Tree Miner) to look, for each of the models generated in Part 1, for correlations between decision
points and the variable AMOUNT_REQ.

I. Combined model with Approved Outcome

In the case of applications with approved outcomes, even after lowering the Minimal fitness to consider a trace
parameter, the guard discovery algorithm does not find any correlation between any decision point and the
Requested Amount variable. For the other two combined models, the Guard Discovery plug-in generates similar
results.

II. Combined model with Cancelled Outcome

III. Combined model with Declined Outcome

IV. The Model of the Applications Lifecycle

For the Applications Lifecycle complete model, the Guard Discovery plug-in outputs some correlations.

The first table shows the results of the analysis taking into consideration all of the variables. We dismiss these
results after verifying that these guards always involve the resource 112 (which, in the next answer, we conclude is
an automatic resource); hence the conjunction of these variables is not interesting.

The second table shows the results for the analysis considering only the Amount Requested variable. These results,
with a relatively high F-score indicator, show the following correlations:
Applications with Amount Requested lower than 5350 are more likely not to be pre-accepted.
Applications with Amount Requested between 3750 and 50000 are more likely to be accepted.

However, these results are not very surprising considering that one of the process rules is
that applicants have to ask for at least 5.000 and at most 50.000. Therefore, we cannot
conclude that there is a correlation between the value of the Requested Amount of an application and any of the
transition points (process decisions) in the application lifecycle.
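
The reasoning above can be made concrete by comparing the discovered guard intervals with the range of amounts the process rule allows. The sketch below is only an illustration: it treats every interval as closed, and it assumes 0 as the lower bound of the "lower than 5350" guard, since the text gives no lower bound.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # assumed closed interval [low, high]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if every value of `inner` also lies in `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two intervals, or None if they do not intersect."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


allowed_amounts: Interval = (5000.0, 50000.0)        # process rule from the text
accepted_guard: Interval = (3750.0, 50000.0)         # guard found for acceptance
not_preaccepted_guard: Interval = (0.0, 5350.0)      # lower bound 0 is an assumption

# The acceptance guard covers every allowed amount, so it cannot tell
# valid applications apart.
print(contains(accepted_guard, allowed_amounts))

# The other guard only touches a narrow band of the allowed range.
print(overlap(not_preaccepted_guard, allowed_amounts))
```

Because the acceptance guard spans the whole permitted range, it is satisfied by every valid application and carries no information about the decision.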

2. Are there clear indications that the same employees are always involved in cases that are declined,
approved or cancelled?

As a preliminary step for this analysis, it is important to exclude those resources that are not
employees. For example, the graphic below corresponds to the use of resources over time for one of the filtered
logs; it shows that resource 112 has a continuous line of work, so we can assume it is not an employee.

Then, to answer this question, one can analyze the End Event section of Resource Classifier in the Summary of the
filtered logs used to generate each of the combined models in Part 1.III, with declined, approved and cancelled
outcomes. The sections below show the top 10 most used resources for the end events of each of the studied cases:

I. Combined model with Cancelled Outcome

From the table above, one cannot conclude that there are employee resources influencing cancelled outcomes.
The distributions among resources are fairly similar, except for resource 112, which we assume is a machine.

II. Combined model with Declined Outcome

The table above shows a similar result for the case of declined outcomes. Except for the automatic resource (112),
the occurrences of declined applications are fairly evenly distributed among 56 resources.

III. Combined model with Approved Outcome

In this case, the resource 10138 stands out in comparison with the rest of the resources, accounting for 31,8% of
the total occurrences. In the second image, one can verify that the same resource is also involved in a low
percentage (0,15%) of application cancellations, from which we assume that it is not a resource dedicated only to
approval tasks. This can be an indicator that this employee is more involved than others in approved cases.
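
The per-resource percentages used in this answer are simply shares of the end events of a filtered log. A minimal sketch of that computation is given below; the function name is hypothetical, and the resource identifiers and counts are invented and do not reproduce the tables of the assignment.

```python
from collections import Counter
from typing import Dict, Iterable


def resource_shares(end_event_resources: Iterable[str], top_n: int = 10) -> Dict[str, float]:
    """Share of end events handled by each of the top_n most frequent resources."""
    counts = Counter(end_event_resources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {resource: n / total for resource, n in counts.most_common(top_n)}


# Hypothetical input: the resource that executed the end event of each trace.
sample_resources = ["R1", "R2", "R1", "R3", "R1", "R2", "R1", "R4"]

shares = resource_shares(sample_resources)
for resource, share in shares.items():
    print(f"{resource}: {share:.1%}")
```

A resource whose share is far above the others, as reported for the approved outcome, shows up as a clear outlier in such a listing.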

3. Comparison of the throughput times between the non-approved applications and the approved
applications.

To perform this analysis, I used the Time between transition analysis visualization option on the result of the Replay a log
on Petri Net for conformance analysis plug-in, for each of the three combined models:

I. Combined model with Declined Outcome

The graph above shows that the time between the moment in which an application is submitted and when it is
declined is on average 23.622,49 minutes. That corresponds to the cell A_SUBMITTED-A_DECLINED in the
matrix.

II. Combined model with Cancelled Outcome


The graph above shows that the time between the moment in which an application is submitted and when it is
cancelled is on average 24.383,12 minutes. That corresponds to the cell A_SUBMITTED-A_CANCELLED in the
matrix.

III. Combined model with Approved Outcome

The graph above shows that the time between the moment in which an application is submitted and when it is
approved is on average 24.450,03 minutes. That corresponds to the cell A_SUBMITTED-A_APPROVED in the
matrix.

In summary, the throughput times of non-approved and approved applications are fairly similar.
However, Approved applications last, on average, 66,91 minutes longer than Cancelled applications and 827,54
minutes longer than Declined applications. More details about minimum and maximum times are shown in the table
below.
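
The two differences quoted in the summary follow directly from the three reported averages. The short check below recomputes them as "approved minus other outcome", so a positive value means that approved cases take longer on average; the conversion to days is added only for readability and is not part of the original analysis.

```python
MINUTES_PER_DAY = 24 * 60

# Average throughput times in minutes, as reported in the text
# (written there with a decimal comma, e.g. 23.622,49).
avg_minutes = {
    "declined": 23622.49,
    "cancelled": 24383.12,
    "approved": 24450.03,
}


def difference_to_approved(outcome: str) -> float:
    """Approved average minus the given outcome's average, in minutes."""
    return round(avg_minutes["approved"] - avg_minutes[outcome], 2)


for outcome in ("cancelled", "declined"):
    diff = difference_to_approved(outcome)
    print(f"approved - {outcome}: {diff:.2f} min ({diff / MINUTES_PER_DAY:.2f} days)")

for outcome, minutes in avg_minutes.items():
    print(f"{outcome}: {minutes / MINUTES_PER_DAY:.2f} days on average")
```

Seen in days, all three outcomes take between roughly 16 and 17 days, which puts the differences of about one hour and about fourteen hours into perspective.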

Conclusion

To conclude this assignment I can say I am convinced that tools like ProM in conjunction with Data Mining knowledge and
management skills can provide a great amount of value to improve business processes in organizations.