Jeevan Mtech Thesis Presentation PDF

VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages
Jeevan Joishi [jeevan1336@iiitd.ac.in]
MTech Research Associate, Software Analytics Research Lab (SARL)
www.software-analytics.in
MTech Thesis Evaluation Committee Members

Thesis Adviser
Prof. Ashish Sureka
Adjunct Faculty at IIIT-Delhi and currently Visiting Researcher at Siemens
Corporate Research and Technology
Faculty In-charge, Software Analytics Research Lab (SARL)
External Examiner
Dr. Radha Krishna Pisipati
Principal Research Scientist at Infosys Technologies Limited.
Internal Examiner
Prof. Sandip Aine
Faculty Member at IIIT-Delhi
Vishleshan
Outline
1. Research Motivation and Aim
2. Related Work and Novel Research Contributions
3. Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS
4. Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented
5. Experimental Dataset
6. Performance Comparison
7. Conclusion
8. Limitations
9. References
Introduction to NoSQL
Why NoSQL?
The global population accessing the internet has increased tremendously.
Most applications are hosted on the cloud and need to
support users 24 hours a day, 365 days a year.
Fig 1: Scale of internet usage.
Figure taken from [17]
Why NoSQL?
Data is captured in huge volumes and consists of both
structured and unstructured data.
The amount of data is growing rapidly, and the variety of data is growing as well.
Fig 2: Growth of data.
Why NoSQL?
What is wrong with relational databases?
Nothing!
Relational databases employ a one-size-fits-all philosophy for storage.
Relational databases are used when strong consistency is a must.
Relational databases can create problems when it is time to scale.
Why NoSQL?
The explosion of social media sites such as Facebook and Twitter brought very large data needs.
They had to capture and process very large volumes of data, which was difficult with a traditional RDBMS.
Traditional databases are designed to scale up, whereas these workloads require a database that can scale out.
When relational applications become successful, usage goes up. Joins are inherent in an RDBMS and become very slow at that scale.
Application developers find it difficult to get the dynamic scalability they need while maintaining the performance users demand.
Why NoSQL?
We require a technology that scales out rather than scaling up!
Scale up: add more processors and memory to a single server.
Scale out: add more servers.
Fig 3: Scale-up vs. Scale-out.
Introduction To NoSQL
NoSQL Database.
Hence, NoSQL databases were introduced:
Not Only SQL

Non-relational data stores.
Do not require a fixed table schema.
Do not strictly follow the ACID properties of databases; instead, they work within the trade-offs of the CAP theorem (Consistency, Availability, Partition Tolerance).
Column stores, Graph databases, Document stores.
RDBMS vs. NoSQL

Scale up vs. Scale out
Normalization vs. De-normalization
ACID vs. CAP
Schema vs. Schema-less

Structured Data vs. Unstructured Data.
Row Oriented vs. Graph Oriented Database
Row Oriented vs. Graph Oriented
Record No  Name            Address             City       State
01         Jeevan Joishi   Uniworld Apartment  Bangalore  Karnataka
02         Kunal Gupta     15th Cross Road     Kanpur     Uttar Pradesh
03         Priyanka Verma  Sector-7            Jind       Haryana
04         Nidhi Agarwal   JJ colony           Bhiwani    Haryana
Table 1: An RDBMS table.
Fig 4: A Graph model.

In a row-oriented database, the whole record needs to be read in order to read specific attributes.
Joins in relational databases are compute-intensive tasks.
Graph databases, however, can read individual values based on nodes, relationships or properties.
Graph databases avoid joins by traversing relationship(s) using index-free adjacency.
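
To make the contrast concrete, here is a minimal Python sketch. It is not how MySQL or Neo4j store data internally; it only illustrates the idea. The relational side resolves a relationship by scanning a separate join table, while the graph side follows references stored directly on each node (index-free adjacency). The `knows` pairs and the names `Node`, `friends_via_join` and `friends_via_adjacency` are hypothetical; only the first names are borrowed from Table 1.

```python
from dataclasses import dataclass, field

# Relational style: rows plus a separate join table of (person_id, friend_id).
# The relationships below are invented purely for illustration.
people = {1: "Jeevan", 2: "Kunal", 3: "Priyanka", 4: "Nidhi"}
knows = [(1, 2), (1, 3), (2, 4)]


def friends_via_join(person_id):
    """Scan the whole join table, then look each match up by key."""
    return [people[dst] for src, dst in knows if src == person_id]


# Graph style: every node holds direct references to its neighbours.
@dataclass
class Node:
    name: str
    neighbours: list = field(default_factory=list)


nodes = {pid: Node(name) for pid, name in people.items()}
for src, dst in knows:
    nodes[src].neighbours.append(nodes[dst])


def friends_via_adjacency(node):
    """Follow the stored references; no table scan or key lookup is needed."""
    return [n.name for n in node.neighbours]


join_result = friends_via_join(1)
graph_result = friends_via_adjacency(nodes[1])
```

Both calls return the same neighbours. The difference is that the join version touches every row of `knows`, whereas the adjacency version touches only the starting node's own references.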
Fig. 5: Relationships in Relational databases.
Fig. 6: Relationships in Graph databases.

Non-native vs. Native Graph Processing
Fig 7: Non-Native Graph Processing using Global lookup index
Fig 8: Native Graph Processing using index-free adjacency
Process Mining
Process mining is the analysis of a process using event log data.
One of its key aspects is studying the social structure of the organization using event logs.
Fig 9: Types of Process Mining Techniques
Process mining focuses on the analysis of a process using the data present in event logs.
Each event in an event log records the details of an activity.
Each event is associated with a case identifier (CaseID).
Each event has a timestamp.
Each event has an activity that is being performed.
Each event has an actor that handles the event.
Additionally, each event may include a unique identifier.
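
The event attributes listed above can be sketched as a small Python record type. This is only an illustration: the class name `Event`, its field names and the sample rows are assumptions, not the schema used in the thesis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    case_id: str                    # case identifier (CaseID)
    activity: str                   # activity being performed
    actor: str                      # actor handling the event
    timestamp: datetime             # when the event happened
    event_id: Optional[str] = None  # optional unique identifier


# Hypothetical sample rows, invented for illustration.
log = [
    Event("C1", "Register", "Nidhi", datetime(2014, 1, 1, 9, 0), "E1"),
    Event("C1", "Review", "Kunal", datetime(2014, 1, 1, 9, 30)),
    Event("C2", "Register", "Nidhi", datetime(2014, 1, 2, 10, 0)),
]

# Group events by case, ordered by timestamp, to obtain one trace per case.
traces = {}
for ev in sorted(log, key=lambda e: e.timestamp):
    traces.setdefault(ev.case_id, []).append(ev.activity)
```

Grouping by `case_id` and ordering by `timestamp` yields the per-case activity sequences that process mining algorithms typically start from.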
Fig. 10: An example Event Log.
3 types of process mining techniques:
1. Process Discovery
2. Process Conformance
3. Process Enhancement
3 types of process mining perspectives:

1. Control Flow Perspective
2. Organizational Perspective
3. Case Perspective.
Process Mining
Similar Task Algorithm

Similar Task algorithm focuses on identifying actors
performing similar activities in the organizational
perspective.
It focuses on activities the actors perform irrespective of
cases.
It is based on the notion that people doing similar things
have a stronger relation than people doing different things.
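
The idea can be sketched in a few lines of Python: build an actor-activity count matrix that ignores cases, then compare actors pairwise with cosine similarity. The event log below is hypothetical (it is not the sample log from the slides), and the function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical event log rows: (case_id, activity, actor).
event_log = [
    ("C1", "A", "Nidhi"), ("C1", "B", "Nidhi"), ("C1", "A", "Kunal"),
    ("C2", "C", "Priyanka"), ("C2", "C", "Pooja"), ("C2", "B", "Pooja"),
]


def actor_activity_matrix(log):
    """Count how often each actor performed each activity, ignoring cases."""
    matrix = defaultdict(Counter)
    for _case, activity, actor in log:
        matrix[actor][activity] += 1
    return matrix


def cosine_similarity(u, v):
    """Cosine similarity of two sparse count vectors (Counter objects)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


matrix = actor_activity_matrix(event_log)
# One entry per unordered pair of actors, rounded to two decimals.
similarity = {
    (a, b): round(cosine_similarity(matrix[a], matrix[b]), 2)
    for a in matrix for b in matrix if a < b
}
```

Actors who share no activity get a similarity of 0.0, and the case identifier never enters the calculation, which matches the description above.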
Table 2: Sample Event Log (columns: Case Identifier, Activity Identifier, Actor; actors: Nidhi, Kunal, Priyanka, Pooja, Astha)
Table 3: Actor-Activity Matrix
Process Mining
Similar - Task Algorithm

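Given two vectors of attributes, A and B, the cosine similarity is given by

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$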
          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     ---    0.32   0.00      0.63   0.00
Kunal     0.32   ---    0.00      0.00   0.70
Priyanka  0.00   0.00   ---       0.70   0.00
Pooja     0.63   0.00   0.70      ---    0.00
Astha     0.00   0.70   0.00      0.00   ---
Table 4: Cosine Similarity Values
Figure taken from [21].
Similar - Task Algorithm at a glance!
Similar Task Algorithm at a glance!
Sub Contract Algorithm

The Sub-Contract algorithm focuses on how work moves among performers.
The main idea is to count the number of times individual j performs an activity in between two activities performed by individual i.
The relation between individuals is case dependent.
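
The counting idea can be sketched in Python as follows. The event log is hypothetical, and two details are assumptions made for this sketch rather than statements about the thesis implementation: only direct succession (i, then j, then i again) is counted, and the normalizing constant `normal` is taken to be the number of consecutive triples in cases with at least three events.

```python
from collections import Counter, defaultdict

# Hypothetical event log, already ordered within each case: (case_id, actor).
event_log = [
    ("C1", "Nidhi"), ("C1", "Priyanka"), ("C1", "Nidhi"), ("C1", "Kunal"),
    ("C2", "Pooja"), ("C2", "Astha"), ("C2", "Kunal"), ("C2", "Astha"),
    ("C3", "Nidhi"), ("C3", "Kunal"),
]


def subcontract_counts(log):
    """Return (counts, normal) for direct sub-contracting i -> j -> i."""
    traces = defaultdict(list)
    for case_id, actor in log:
        traces[case_id].append(actor)

    counts, normal = Counter(), 0
    for actors in traces.values():
        if len(actors) < 3:        # a triple needs at least three events
            continue
        normal += len(actors) - 2  # consecutive triples in this case (assumed normalizer)
        for i, j, k in zip(actors, actors[1:], actors[2:]):
            if i == k and i != j:  # work went from i to j and back to i
                counts[(i, j)] += 1
    return counts, normal


counts, normal = subcontract_counts(event_log)
normalized = {pair: n / normal for pair, n in counts.items()}
```

Because triples are formed per case, the same two actors only contribute when they alternate inside one case, which is what makes the relation case dependent.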
Table 5: Sample Event Log (columns: Case Identifier, Activity Identifier, Actor)
Table 6: Organized Event Log (columns: Case Identifier, Activity Identifier, Actor)
Sub - Contract Algorithm

normal = 4.0
          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     0.00   0.00   1.00      0.00   0.00
Kunal     0.00   0.00   0.00      0.00   0.00
Priyanka  0.00   0.00   0.00      0.00   0.00
Pooja     0.00   0.00   0.00      0.00   0.00
Astha     0.00   0.00   0.00      0.00   0.00
Table 7: Sub Contraction Values before Normalization

          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     0.00   0.00   0.25      0.00   0.00
Kunal     0.00   0.00   0.00      0.00   0.00
Priyanka  0.00   0.00   0.00      0.00   0.00
Pooja     0.00   0.00   0.00      0.00   0.00
Astha     0.00   0.00   0.00      0.00   0.00
Sub Contraction Values after Normalization
Sub-Contract Algorithm at a glance I
Sub-Contract Algorithm at a glance II

Query languages provide the most standard way to interact with a database.
We try to implement process mining algorithms in database query languages to the extent possible, so that our application is tightly coupled to the database.
Our work lies at the intersection of Process Mining and NoSQL databases.
Research Aim
To investigate the intersection of Process Mining and Graph Database(s) for detecting social, hierarchical structures.
To understand application needs that can be modelled into this new domain.
To implement the Similar-Task and Sub-Contract algorithms in a row-oriented database, MySQL.
To implement the Similar-Task and Sub-Contract algorithms in a graph-oriented database, Neo4j.
To compare the performance of the Similar-Task and Sub-Contract algorithms in MySQL and Neo4j.
Implementation of Mining Algorithms in Relational Databases

Ordonez et al. [5]
Implement the k-means clustering algorithm in SQL.
Cluster large datasets in an RDBMS.
Define suitable tables, index them and write suitable queries for clustering purposes.
Ordonez et al. [6]
Extend their own work in [5].
Efficient implementation of the EM algorithm to perform clustering in very large datasets.
Berzal et al. [7]
Implemented tree-based association rule mining to discover interesting patterns in relational databases.
Sattler et al. [8]
Proposed SQL database primitives for decision tree classifiers.
Tight coupling of data mining and database systems.
Implementation of Mining Algorithms in Graph Databases

Wang et al. [9]
Studied structural pattern mining for large disk-based graph databases.
Presented a novel ADI index structure and efficient algorithms for mining frequent patterns.
Wang et al. [10]
Presented techniques to obtain scalable mining in graph databases.
Huan et al. [11]
Presented a novel technique to mine maximal frequent subgraphs in graph databases.
Ozaki et al. [12]
Proposed the hyper-clique pattern in graph databases.
Used the hyper-clique pattern to detect highly correlated subgraphs.
Performance Comparison of Mining Algorithms in Relational and Graph Databases

Vicknair et al. [13]
Compared the performance of relational and graph databases for data provenance systems.
McColl et al. [14]
Evaluated the performance of a series of open-source graph databases.
Used various graph algorithms on a graph setup consisting of 256 million nodes.
Ciglan et al. [15]
Benchmarked graph databases over graph traversal algorithms.
Macko et al. [16]
Presented PIG, a performance introspection framework for graph databases.
PIG provides tools and mechanisms to understand the performance of graph databases.
Novel Research Contributions

While there has been work on implementing data mining algorithms in relational and graph databases, we are:
First to implement organizational mining algorithms (Similar-Task and Sub-Contract) in the row-oriented database MySQL using SQL.
First to implement organizational mining algorithms (Similar-Task and Sub-Contract) in the graph-oriented database Neo4j using CYPHER.
We also benchmark the performance of organizational mining algorithms (Similar-Task and Sub-Contract) on MySQL and Neo4j.
Implementation of Similar-Task and Sub-Contract Algorithms in SQL, RDBMS
Steps
Implementation of Similar-Task algorithm in SQL can be divided into four (4) broad
tasks
Declare and iterate cursor to select distinct tasks.

Create a table to store result.
Fetch actors vector and calculate Cosine Similarity.
Write results to the result table.
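
As a rough, runnable stand-in for these steps, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. This swaps in SQLite and set-based joins for the cursor-driven MySQL procedure outlined on the slides, so it illustrates the calculation rather than reproducing the thesis code. The table names `dataset`, `ot_matrix` and `final_sim`, their columns, and the sample rows are all assumptions made for illustration.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (case_id TEXT, activity TEXT, actor TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [("C1", "A", "Nidhi"), ("C1", "B", "Nidhi"), ("C1", "A", "Kunal"),
     ("C2", "B", "Pooja"), ("C2", "C", "Pooja"), ("C2", "C", "Priyanka")],
)

# Actor-activity counts: one row per (actor, activity) pair.
conn.execute("""
    CREATE TABLE ot_matrix AS
    SELECT actor, activity, COUNT(*) AS n
    FROM dataset GROUP BY actor, activity
""")

# Vector length of each actor's activity profile (square root taken in Python).
norms = {
    actor: math.sqrt(sq)
    for actor, sq in conn.execute(
        "SELECT actor, SUM(n * n) FROM ot_matrix GROUP BY actor")
}

# Dot products for every pair of actors that share at least one activity.
pairs = conn.execute("""
    SELECT a.actor, b.actor, SUM(a.n * b.n)
    FROM ot_matrix a JOIN ot_matrix b
      ON a.activity = b.activity AND a.actor < b.actor
    GROUP BY a.actor, b.actor
""").fetchall()

# Store the results in a result table, then read them back.
conn.execute("CREATE TABLE final_sim (actor_a TEXT, actor_b TEXT, similarity REAL)")
conn.executemany(
    "INSERT INTO final_sim VALUES (?, ?, ?)",
    [(a, b, dot / (norms[a] * norms[b])) for a, b, dot in pairs],
)
result = {
    (a, b): round(s, 2)
    for a, b, s in conn.execute("SELECT * FROM final_sim")
}
```

Pairs with no shared activity produce no row in the join, so they are implicitly zero. A wide result table with one column per actor, as the slides describe, would need dynamically generated SQL.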
Define and iterate cursor

Declare a cursor to select distinct tasks from the table.
Open the cursor and loop through the results it returns.
Declare table to store results

Dynamically create a table with the specified table name.
Prepare an SQL statement from the query and execute it.
Fetch actors vector and calculate Cosine-Similarity I.

Prepare the query to insert into the table.
Define variables to store values for the cosine-similarity calculation.
Fetch actors vector and calculate Cosine-Similarity II.

Inside the cursor loop, collect distinct tasks from the tables for the required calculation.
Fetch actors vector and calculate Cosine-Similarity III.

Append the parts of the cosine-similarity calculation to the SQL query.
Update Final Results I.

Declare a cursor to get all distinct teams.
Iterate through the cursor to get the distinct teams.
Update Final Results II.

Form a query for creating a table that takes the distinct teams as columns.
Inside the cursor loop, append the distinct teams as columns of the table.
Update Final Results III.

Form a query for inserting values into the resultant table.
Inside the cursor loop, assign similarity values to the respective columns (matching teams).
Sub - Contract Algorithm.
Steps
Sub-Contract Algorithm implementation can be studied under four (4) broad
categories:
Create table to store results.
Find distinct case identifiers.
Update normal and find sub-contraction within each case.
Normalize the result.
70/109
Vishleshan
Create table to store results I

Declare cursor to select distinct actors.
Iterate through the cursor to collect the distinct actors.
71/109
Vishleshan
Create table to store results II

Form a query to create a table.
Inside the cursor, append each distinct actor as part of the query.
72/109
Vishleshan
Find distinct case identifiers

Declare cursor to select distinct case identifiers with count >= 3
Iterate through the cursor. For each distinct case identifier, call procedure ExecuteCase.
73/109
Vishleshan
Update normal and find sub-contraction I.

Update normal.
74/109
Vishleshan
Update normal and find sub-contraction II.

Declare a cursor to find sub-contracting actors.
75/109
Vishleshan
Update normal and find sub-contraction III.

Iterate through the cursor to find IDs of actor
76/109
Vishleshan
Update normal and find sub-contraction IV.

Declare cursor to find sub-contracting actors.
Iterate through the cursor to find IDs of sub-contracting actors.
77/109
Vishleshan
Update normal and find sub-contraction V.

For any pair of sub-contracting actor, insert or update sub-contract value between them.
78/109
Vishleshan
Normalize the result.

Declare cursor to select distinct actors that formed columns of the result table
For each column, form an update query and normalize it by normal
79/109
Vishleshan
Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented
Similar Task Algorithm.
Steps
Implementation of Similar Task algorithm in CYPHER consists mainly of two (2)
broad functions.
Load data with Actor and activity nodes being unique.
Calculate Cosine-Similarity between actors.
Load actor and activity nodes uniquely.

Load data directly from the data file. Create unique nodes for each actor and activity.
Calculate Cosine-Similarity.

Match common activities between actors and calculate similarity.
Sub Contract Algorithm.
Steps
Implementation of Sub Contract algorithm in CYPHER consists mainly of four (4)
broad functions.
Identify sub contracting actors within each case.
Collect unique names and make new nodes for each of them.
Set sub contraction strength between unique actor nodes.
Calculate normal and normalize the sub contraction value.
Identify sub-contracting actors.

Identify sub-contracting actors and connect them via a [:RELATED_TO] relationship.
Collect unique names and create unique actor nodes.

Collect unique actor names.
Create a new UNIQUEACTOR node for each distinct actor name found.
Set sub-contraction strength between unique actors.

For all sub-contracting actors, determine the strength of sub-contraction between them.
Calculate normal and normalize the result.

Calculate normal.
Normalize the sub-contraction strength between actors.
Experimental Dataset.
We use the Business Process Intelligence 2014 (BPI 2014) dataset to conduct our experiments.
The log contains events from an incident and problem management system of Rabobank Group ICT.
It contains data about managing requests from Rabobank Group ICT.
It contains a total of 466737 records.
Dataset Details
Fig. 11: Sample Event Log from MySQL.

Load Time
Dataset size   Load Time (msec)
               MySQL   Neo4j
65,000         2467    3413
1,01,000       2875    3362
2,19,500       5966    4354
3,00,000       5850    5877
4,66,737       7819    6875
Table 9: Data Load Time
Fig 12: Load Time
Execution Time
Dataset Size   Execution Time (msec)
               Step-8           Step-9
               MySQL   Neo4j    MySQL   Neo4j
65,000         225     9616     2467    2403
1,01,000       372     11700    2875    2925
2,19,500       713     14655    5966    3664
3,00,000       903     29520    5850    7380
4,66,737       1403    48891    7819    12223
Table 10: Execution Time of Step-8 & Step-9
Fig. 13: Execution Time of Step-8 & Step-9
Disk Usage in MySQL
Tables     Dataset Size
           65000     101000    219500     300000     466737
Dataset    3686400   5783552   11026432   15220736   21544960
OTMatrix   65536     65536     65536      81920      81920
InitSim    1589248   1589248   1589248    3686400    3686400
FinalSim   229376    262144    278528     491520     1589248
Table 11: Disk Space Usage in MySQL.
Fig 14: Disk Space Usage in MySQL.
Disk Usage in Neo4j
Graph Elements   Dataset Size
                 65000     101000   219500   300000    466737
Nodes            2820      2910     3075     3990      4215
Relationships    770040    414315   479663   8568809   983227
Properties       1033856   563873   651203   1155011   1323439
Table 12: Disk Space Usage in Neo4j.
Fig. 14: Disk Space Usage in Neo4j.
Load Time
Dataset size   Load Time (msec)
               MySQL   Neo4j
65,000         6575    9567
1,01,000       8390    10476
2,19,500       14279   14873
3,00,000       26437   25435
4,66,737       43712   38234
Table 13: Load Time
Fig 15: Load Time
Execution Time in MySQL
Dataset Size   Update Normal   Sub-Contract Detection   Update Result   Normalize result
65,000         32              11712                    8296            16
1,01,000       32              11782                    8138            16
2,19,500       35              11713                    7940            17
3,00,000       70              11736                    8094            17
4,66,737       73              11747                    7754            20
Table 14: Execution Time for 4 main steps in MySQL.
Fig 16: Execution Time for 4 main steps in MySQL.
Execution Time in Neo4j
Dataset Size   Execution time
65,000         118   1542   2077
1,01,000       140   1707   2773
2,19,500       202   2534   2369
3,00,000       336   3442   5261
4,66,737       560   4149   5334
Column headers on the slide: Update Normal, Sub-Contract Detection, Update Result, Normalize result. Only three values per row survive, so their assignment to the four columns is not recoverable.
Table 15: Execution Time for 4 main steps in Neo4j
Fig. 17: Execution Time for 4 main steps in Neo4j.
Disk Space Usage in MySQL
Tables          Dataset Size
                65000     101000    219500     300000     466737
Dataset         4734976   6832128   13123584   18366464   27836416
OrganisedData   4734976   6832128   13123584   18366464   27836416
ResultMatrix    1589248   1589248   1589248    1589248    1589248
Table 15: Disk Space Usage in MySQL
Fig 17: Disk Space Usage in MySQL
Disk Space Usage in Neo4j
Graph Elements   Dataset Size
                 65000       101000      219500      300000      466737
Nodes            982212      1523732     3360798     4598454     7190330
Relationships    153477921   183955761   285778449   375437997   490033038
Properties       384189475   461537287   719874720   942665404   1238579332
Table 16: Disk Space Usage for graph elements in Neo4j.
Fig. 18: Disk Space Usage for graph elements in Neo4j.
Conclusion
Neo4j performs better when it comes to loading data.
Read operations in MySQL are comparatively faster for a single-node setup.
Neo4j gives much improved performance whenever relationships are of prime importance.
Write performance varied greatly in both cases. For smaller datasets, MySQL performs better, whereas for larger datasets, Neo4j performs better.
Limitations and Future Work
Limitations
Only different-sized subsets of a single dataset were used.
Only single-node setups of the databases were used.
Only two organizational mining metrics were used.
Future Work
To apply the algorithm over larger data sets.
Create a multi-node Neo4j setup and implement the
algorithms on it.
Implement and study impact of process enhancement and
recommendation systems.
Experiment with more relational and graph oriented
databases.
References
[1] Wil van der Aalst. Process Mining: Overview and Opportunities. ACM, 2012.
[2] P Neubauer. Graph databases, NOSQL and Neo4j. www.infoq.com.
[3] I Robinson, J Webber, E Eifrem. Graph Databases. www.books.google.com.
[4] Minseok Song, Wil M. P. van der Aalst. Towards comprehensive support for organizational mining. Elsevier, 2008.
[5] Carlos Ordonez. Programming the K-means clustering algorithm in SQL.
[6] C. Ordonez and P. Cereghini. SQLEM: fast clustering in SQL using the EM algorithm. International Conference on Management of Data.
[7] Fernando Berzal, Juan Carlos Cubero, Nicolas Marin, Jose Maria Serrano. TBAR: An efficient method for association rule mining in relational databases. Elsevier, 2001.
[8] K-U. Sattler and O. Dunemann. SQL Database Primitives for Decision Tree Classifiers. Conference on Information and Knowledge Management, 2001.
[9] W Wang, C Wang, Y Zhu, B Shi, J Pei, X Yan. Graphminer: a structural pattern mining system for large disk based graph databases and its applications. ACM, 2005.
[10] C Wang, W Wang, Y Zhu, B Shi, J Pei. Scalable Mining of large disk based graph databases. ACM, 2004.
[11] J Huan, W Wang, J Prins. SPIN: mining maximal frequent subgraphs from graph databases. ACM, 2004.
[12] T Ozaki, T Ohkawa. Mining correlated subgraphs in graph databases. Advances in Knowledge Discovery and Data Mining, 2008.
[13] C Vicknair, M Macias, Z Zhao, X Nan, Y Chen. A comparison of graph databases and a relational database: a data provenance perspective. ACM, 2010.
[14] RC McColl, D Ediger, J Poovey, D Campbell. A performance evaluation of open-source graph databases. ACM, 2014.
[15] M Ciglan, A Averbuch, L Hluchy. Benchmarking graph traversal operations over graph databases. IEEE, 2012.
[16] P Macko, D Margo, M Seltzer. Performance introspection of graph databases. ACM, 2013.
[17] Why NOSQL? Couchbase.
[18] Scale-out vs. Scale-up. www.natishalom.typepad.com.
[19] Introduction to Graph Databases and Neo4j. www.neo4j.com
[20] From Relational to Neo4j. www.neo4j.com
[21] Cosine Similarity. www.Wikipedia.com
