Process Mining assignment #2

Introduction

The main goals I want to achieve with this assignment are to gain new insight into what can be done with Data Mining, Machine Learning and Process Mining tools, to learn how I can use them in my own projects, and to apply the concepts learned in this Process Mining course.

We are interested in discovering how instances of this process are actually executed. However, the process clearly shows much variability, and the activities that are actually executed depend on the final outcome. Thus, it is not possible to discover a single process model. Therefore, I am going to discover different process models: the Applications Lifecycle, the Offers Lifecycle and the Combined Lifecycle.

For each of them, I start by applying the Filter Log on Simple Heuristics plugin in order to remove the incomplete events, that is, the events with a status other than COMPLETE.

To begin the analysis, I apply the Filter Log on Simple Heuristics plugin once again to select the events whose names start with A. Then, I apply the Mine with Inductive Visual Miner plugin. The resulting process model, with 80% of the possible paths and the corresponding deviations, is shown below:

For this model, I apply the Filter Log on Simple Heuristics plugin once again to select the events whose names start with O. Then, I apply the Mine with Inductive Visual Miner plugin. The resulting process model, with 80% of the possible paths and the corresponding deviations, is shown below:

Due to the high variability, I will discover a different model for each possible outcome, namely whether the application is finally Declined, Cancelled or Approved. In these cases, the processing steps to generate the models start with executing the Filter Log on Simple Heuristics plugin to select just the events with names starting with A or with O.
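
The filtering steps described above can be sketched outside ProM in a few lines of Python. This is only an illustration under stated assumptions, not the plugin's implementation: the event keys "name" and "lifecycle", the activity W_EXAMPLE, the toy log, and the choice to drop traces that end up empty are all hypothetical.

```python
from typing import Dict, List, Tuple

Event = Dict[str, str]   # assumed keys: "name" and "lifecycle"
Trace = List[Event]


def filter_events(log: List[Trace], prefixes: Tuple[str, ...]) -> List[Trace]:
    """Keep only COMPLETE events whose name starts with one of the prefixes."""
    filtered = []
    for trace in log:
        kept = [
            event for event in trace
            if event["lifecycle"].upper() == "COMPLETE"
            and event["name"].startswith(prefixes)
        ]
        if kept:  # assumption: traces left without events are dropped
            filtered.append(kept)
    return filtered


# Toy log with a single trace; W_EXAMPLE is a made-up activity name.
toy_log = [
    [
        {"name": "A_SUBMITTED", "lifecycle": "COMPLETE"},
        {"name": "W_EXAMPLE", "lifecycle": "START"},
        {"name": "O_SELECTED", "lifecycle": "COMPLETE"},
        {"name": "A_DECLINED", "lifecycle": "COMPLETE"},
    ],
]

applications_only = filter_events(toy_log, ("A",))   # Applications Lifecycle
combined = filter_events(toy_log, ("A", "O"))        # Combined Lifecycle
print([event["name"] for event in applications_only[0]])
print([event["name"] for event in combined[0]])
```

Passing ("O",) instead would give the log used for the Offers Lifecycle.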

To get the combined model with declined application outcome, in the End events screen of the Filter Log on Simple Heuristics plugin, I select just the activity A_DECLINED. Finally, I apply the Mine with Inductive Visual Miner plugin.
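
The effect of the End events selection can be illustrated with a small Python sketch. It assumes that each trace is reduced to its list of activity names; the function name and the toy traces are hypothetical, and the code does not reproduce the plugin itself.

```python
from typing import List, Set


def filter_by_end_activity(traces: List[List[str]],
                           end_activities: Set[str]) -> List[List[str]]:
    """Keep only the traces whose last event is one of the given activities."""
    return [trace for trace in traces if trace and trace[-1] in end_activities]


# Made-up traces, written as sequences of activity names.
toy_traces = [
    ["A_SUBMITTED", "A_DECLINED"],
    ["A_SUBMITTED", "A_PREACCEPTED", "O_SELECTED", "A_CANCELED"],
    ["A_SUBMITTED", "A_PREACCEPTED", "O_ACCEPTED", "A_APPROVED"],
]

declined = filter_by_end_activity(toy_traces, {"A_DECLINED"})
print(declined)
```

The same function covers the other outcomes by passing a different set of end activities.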

The resulting process model with all possible activities and 85% of the possible paths with the deviations is:

2IMI35 ASSIGNMENT 2 25/10/2016

Student ID: S167017 Name: Garcia Torres, D.M.

2. Model with Cancelled Outcome

For the case of the combined model with cancelled application outcome, in the End events screen of the Filter Log on Simple Heuristics plugin, I select just the activity A_CANCELED. Finally, I apply the Mine with Inductive Visual Miner plugin. The resulting process model, with all possible activities, 68% of the possible paths and the corresponding deviations, is shown below:

For this case, in the End events screen of the Filter Log on Simple Heuristics plugin, I select the activities A_REGISTERED, A_APPROVED and A_ACTIVATED. Finally, I apply the Mine with Inductive Visual Miner plugin. The resulting process model, with all possible activities, 85% of the possible paths and the corresponding deviations, is shown below:

Providing process models is not enough: the process owner is distrustful of any result that he/she is simply handed and wants evidence of the quality of these models. Therefore, I am going to perform alignment-based conformance checking. To this end, I execute the plug-in Replay a Log on Petri Net for Conformance Analysis with each one of the filtered logs and models generated in Part 1. The moves of all alignments are projected onto the Petri net models below:

One can observe that this event log contains almost no deviations with respect to the model discovered in Part 1.I. A_CANCELED and A_FINALIZED are the transitions for which the percentage of moves in the model, compared with synchronous moves, is larger than zero. According to the information inside the transitions, A_CANCELED occurred 2807 times as a synchronous move (it occurred when it was supposed to) and 399 times as a move in the model (it was expected but did not occur). Likewise, A_FINALIZED occurred 5015 times as a synchronous move and 98 times as a move in the model. The general statistics of the alignments, shown in the table below, confirm an almost perfect Move-Log fitness, Move-Model fitness and Trace fitness. In addition, even without performing a Structural Analysis, one can argue that this model's simplicity, along with its fitness, makes it a very good model of the process.
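
To put the two pairs of counts reported above into perspective, the share of moves in the model per transition can be computed as in the sketch below. The function name is hypothetical, and this simple ratio is not the cost-based fitness value reported by the plug-in; it only restates the counts as a proportion.

```python
def model_move_share(sync_moves: int, model_moves: int) -> float:
    """Fraction of a transition's moves that were moves in the model."""
    total = sync_moves + model_moves
    if total == 0:
        raise ValueError("the transition has no moves")
    return model_moves / total


# (synchronous moves, moves in the model), as reported in the text.
move_counts = {
    "A_CANCELED": (2807, 399),
    "A_FINALIZED": (5015, 98),
}

for transition, (sync, model) in move_counts.items():
    print(f"{transition}: {model_move_share(sync, model):.1%} moves in the model")
```

This gives roughly 12% for A_CANCELED and 2% for A_FINALIZED.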

II. The Model of the Offers Lifecycle

This event log contains no deviations with respect to the model discovered in Part 1.II. Nevertheless, it shows (with a yellow-filled place) a move in the log when transition O_SELECTED is enabled. However, in the project alignments on the log perspective, where the general statistics are shown, one can verify that its fitness is almost equal to 1, the perfect score.

For this event log, no deviations are shown with respect to the model discovered in Part 1.III-1, although it shows moves in the log (with a yellow-filled place) when the final transition, O_DECLINED, is enabled. The project alignments on the log perspective shows the general statistics, where its fitness is almost equal to 1, the perfect score.

For this log, the conformance checker shows just one deviation with respect to the model transitions, at the transition A_PREACCEPTED. Apart from that, several moves in the log are shown when transitions O_CREATED, O_CANCELED and A_CANCELED are enabled. The general statistics show values of 98% for all three fitness indicators.

The model above is the most complex among the analyzed models (note that the trace length is 14.94); this was expected, since we are analyzing the approved outcome. Nevertheless, the conformance checker shows no deviations among transitions except for the O_ACCEPTED transition. This transition occurred 661 times as a synchronous move (it occurred when it was supposed to) and 2 times as a move in the model (it was expected but did not occur). Apart from that, the projection shows some moves in the log when transitions O_SELECTED and O_CANCELED are enabled. The fitness measures are nearly 1, showing that the model is able to reproduce almost all execution sequences at the case level.

1. Are some decisions in any of the models driven by the application's amount?

To answer this question, I use the Guard Discovery plug-in Discovery of the Process Data-flow (Decision-Tree Miner) to find, for each one of the models generated in Part 1, correlations between decision points and the variable AMOUNT_REQ.

In the case of applications with approved outcomes, even after lowering the Minimal fitness to consider a trace parameter, the guard discovery algorithm does not find any correlation between any decision point and the Requested Amount variable. For the other two combined models, the Guard Discovery plug-in generates similar results.

For the complete Applications Lifecycle model, the Guard Discovery plug-in outputs some correlations.

The first table shows the results of the analysis taking all of the variables into consideration. We dismiss these results after verifying that these guards always involve resource 112 (which in the next answer we conclude is an automatic resource); hence the conjunction of these variables is not interesting.

The second table shows the results for the analysis considering only the Amount Requested variable. These results,

with a relatively high F-score indicator, show the following correlations:

The applications with Amount Requested lower than 5350, are more likely to not be pre-accepted.

The applications with Amount Requested between 3750 and 50000 are more likely to be accepted.

However, these results are not very surprising, considering that one of the process rules is that applicants have to ask for at least 5.000 and at most 50.000. Therefore, it is not possible for us to conclude that there is a correlation between the value of the Requested Amount for an application and any of the transition points (process decisions) in the application lifecycle.
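
As an illustration of how a guard such as "Amount Requested lower than 5350" can be scored, the sketch below computes an F-score for a rule on labelled cases. It assumes that the F-score reported by the plug-in is the usual harmonic mean of precision and recall (F1); the function name and the six sample cases are made up and do not come from the event log.

```python
from typing import Callable, List, Tuple


def f_score(guard: Callable[[float], bool],
            cases: List[Tuple[float, bool]]) -> float:
    """F1 of a guard, given (amount, outcome_observed) pairs."""
    tp = sum(1 for amount, observed in cases if guard(amount) and observed)
    fp = sum(1 for amount, observed in cases if guard(amount) and not observed)
    fn = sum(1 for amount, observed in cases if not guard(amount) and observed)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up cases: (requested amount, application was NOT pre-accepted).
sample_cases = [
    (1000, True), (3000, True), (5000, False),
    (6000, True), (8000, False), (20000, False),
]


def amount_below_5350(amount: float) -> bool:
    """Guard taken from the text: Amount Requested lower than 5350."""
    return amount < 5350


print(round(f_score(amount_below_5350, sample_cases), 3))
```

A high score for such a guard only says that the rule separates the cases well; it does not by itself rule out that the threshold simply mirrors the allowed range of amounts.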

2. Are there clear indications that the same employees are always involved in cases that are declined,

approved or cancelled?

As a preliminary step for this analysis, it is important to leave out those resources that are not employees. For example, the graphic below corresponds to the use of resources over time for one of the filtered logs; it shows that resource 112 works continuously, therefore we can assume it is not an employee.

Then, to answer this question, one can analyze the End Event section of Resource Classifier in the Summary of the filtered logs used to generate each one of the combined models in Part 1.III, with declined, approved and cancelled outcomes. The sections below show the 10 most used resources for the end events of each of the studied cases:

From the table above, one cannot conclude that there are employee resources influencing cancelled outcomes. The distributions among resources are fairly similar, except for resource 112, which we assume is a machine.

The table above shows a similar result for the case of declined outcomes. Except for the automatic resource (112), the declining of applications is fairly evenly distributed among 56 resources.

In this case, resource 10138 stands out from the rest of the resources, accounting for 31,8% of the total occurrences. In the second image one can verify that the same resource is also involved in a low percentage (0,15%) of application cancellations, from which we assume that it is not a resource dedicated only to approval tasks. This can be an indicator that this employee is more involved than others in approved cases.
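
The resource shares discussed in this answer can be reproduced from a list of end-event resources with a short Python sketch. The function name, the resources R1 to R3 and the sample counts are hypothetical; only the IDs 112 and 10138 appear in the analysis above.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def end_event_shares(resources: Iterable[str],
                     excluded: Set[str]) -> Dict[str, float]:
    """Share of end events per resource, ignoring the excluded resources."""
    counts = Counter(r for r in resources if r not in excluded)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {resource: n / total for resource, n in counts.most_common()}


# Made-up end-event resources for one filtered log.
sample_resources = ["112"] * 4 + ["10138"] * 3 + ["R1", "R2", "R3"]

# Resource 112 is left out because it is assumed to be automatic.
shares = end_event_shares(sample_resources, excluded={"112"})
for resource, share in shares.items():
    print(f"{resource}: {share:.1%}")
```

Running the same computation on each of the three filtered logs makes it easier to compare whether one employee dominates a given outcome.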

3. Comparison of the throughput times between the non-approved applications and the approved

applications.

To perform this analysis, I used the Time between transition analysis visualization option on the outcome of the Replay a Log on Petri Net for Conformance Analysis plug-in, for each one of the three combined models:

The graph above shows that the time between the moment in which an application is submitted and when it is declined is on average 23.622,49 minutes. That corresponds to the cell A_SUBMITTED-A_DECLINED in the matrix.

The graph above shows that the time between the moment in which an application is submitted and when it is cancelled is on average 24.383,12 minutes. That corresponds to the cell A_SUBMITTED-A_CANCELLED in the matrix.

The graph above shows that the time between the moment in which an application is submitted and when it is approved is on average 24.450,03 minutes. That corresponds to the cell A_SUBMITTED-A_APPROVED in the matrix.

In summary, the throughput times of non-approved and approved applications are fairly similar. However, given the averages reported above, Approved applications take, on average, 66,91 minutes longer than Cancelled applications and 827,54 minutes longer than Declined applications. More details about minimum and maximum times are shown in the table below.
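
The comparison can be checked with a few lines of Python. The sketch assumes that the average of 23.622,49 minutes belongs to the declined outcome, as the difference of 827,54 minutes suggests; the variable and function names are hypothetical, and the values are written with a decimal point as Python requires.

```python
MINUTES_PER_DAY = 24 * 60

# Average throughput times in minutes, as reported in this section.
mean_minutes = {
    "declined": 23622.49,   # assumption: first graph refers to declined cases
    "cancelled": 24383.12,
    "approved": 24450.03,
}


def to_days(minutes: float) -> float:
    """Convert a duration in minutes to days."""
    return minutes / MINUTES_PER_DAY


def approved_minus(outcome: str) -> float:
    """Approved average minus the average of the given outcome, in minutes."""
    return round(mean_minutes["approved"] - mean_minutes[outcome], 2)


for outcome, minutes in mean_minutes.items():
    print(f"{outcome}: {minutes:.2f} min = {to_days(minutes):.2f} days")

print("approved - cancelled:", approved_minus("cancelled"), "min")
print("approved - declined:", approved_minus("declined"), "min")
```

Expressed in days, all three averages lie between roughly 16 and 17 days, which puts the differences of a few hundred minutes into perspective.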

Conclusion

To conclude this assignment, I can say I am convinced that tools like ProM, in conjunction with Data Mining knowledge and management skills, can provide a great amount of value for improving business processes in organizations.

