VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages
Jeevan Joishi [jeevan1336@iiitd.ac.in]
MTech Research Associate, Software Analytics Research Lab (SARL)

www.software-analytics.in

MTech Thesis Evaluation Committee Members


Thesis Adviser
Prof. Ashish Sureka
Adjunct Faculty at IIIT-Delhi and currently Visiting Researcher at Siemens
Corporate Research and Technology
Faculty In-charge, Software Analytics Research Lab (SARL)
External Examiner
Dr. Radha Krishna Pisipati
Principal Research Scientist at Infosys Technologies Limited.
Internal Examiner
Prof. Sandip Aine
Faculty Member at IIIT-Delhi

Vishleshan

Outline
1. Research Motivation and Aim
2. Related Work and Novel Research Contributions
3. Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS
4. Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented
5. Experimental Dataset
6. Performance Comparison
7. Conclusion
8. Limitations
9. References


Research Motivation and Aim
Introduction to NoSQL

Why NoSQL?
The number of people accessing the internet worldwide has increased tremendously.
Most applications are hosted in the cloud and must support users 24 hours a day, 365 days a year.

Fig 1: Scale of internet usage. Figure taken from [17].

Data is captured in huge volumes and includes both structured and unstructured data.
The amount of data is growing rapidly, and so is its variety.

Fig 2: Growth of data. Figure taken from [17].

What is wrong with relational databases?
Nothing!
Relational databases employ a one-size-fits-all philosophy for storage.
Relational databases are used when strong consistency is a must.
Relational databases can create problems when it is time to scale.

The explosion of social media sites such as Facebook and Twitter brought very large data needs.
These sites had to capture and process volumes of data that were difficult to handle with a traditional RDBMS.
Traditional databases are designed to scale up; what was required was a database that can scale out.
When relational applications become successful, usage goes up. Joins are inherent in an RDBMS and become very slow at that scale.
Application developers find it difficult to get the dynamic scalability they need while maintaining the performance users demand.

We require a technology that scales out rather than up.
Scale up: add more processors and memory to a single server.
Scale out: add more servers.

Fig 3: Scale-up vs. Scale-out. Figure taken from [18].

NoSQL Database
Hence, NoSQL databases were introduced:
NoSQL stands for "Not Only SQL".
Non-relational data stores.
Do not require a fixed table schema.
Do not strictly follow the ACID properties of databases; instead they focus on CAP (Consistency, Availability, Partition Tolerance).
Examples: column stores, graph databases, document stores.

RDBMS vs. NoSQL
Scale up vs. scale out
Normalization vs. de-normalization
ACID vs. CAP
Schema vs. schema-less
Structured data vs. unstructured data

Row Oriented vs. Graph Oriented Database

Record No  Name            Address             City       State
01         Jeevan Joishi   Uniworld Apartment  Bangalore  Karnataka
02         Kunal Gupta     15th Cross Road     Kanpur     Uttar Pradesh
03         Priyanka Verma  Sector-7            Jind       Haryana
04         Nidhi Agarwal   JJ colony           Bhiwani    Haryana

Table 1: A RDBMS table.

Fig 4: A Graph model. Figure taken from [19].

In a row-oriented database, the whole record must be read to access specific attributes.
Joins in relational databases are compute-intensive.
Graph databases, however, can read individual values through nodes, relationships, or properties.
Graph databases avoid joins by traversing relationships using index-free adjacency.
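
The contrast between a join and index-free adjacency can be sketched in a few lines of Python. This is only an illustration of the idea, not how MySQL or Neo4j store data internally; the `people` and `friend_rows` structures, the `Node` class, and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field

# Relational style: relationships live in a separate table and are
# resolved by matching keys (a join). All names here are illustrative.
people = {1: "Jeevan", 2: "Kunal", 3: "Priyanka"}
friend_rows = [(1, 2), (1, 3), (2, 3)]  # (person_id, friend_id)


def friends_via_join(person_id):
    """Scan the relationship rows and look each match up by key."""
    return [people[f] for p, f in friend_rows if p == person_id]


# Graph style: each node holds direct references to its neighbours,
# so a traversal follows those references without a global lookup.
@dataclass
class Node:
    name: str
    neighbours: list = field(default_factory=list)


jeevan, kunal, priyanka = Node("Jeevan"), Node("Kunal"), Node("Priyanka")
jeevan.neighbours += [kunal, priyanka]
kunal.neighbours.append(priyanka)


def friends_via_adjacency(node):
    """Follow the node's own neighbour references."""
    return [n.name for n in node.neighbours]


print(friends_via_join(1))
print(friends_via_adjacency(jeevan))
```

Both calls return the same friends; the difference is that the first must search the relationship rows, whereas the second only touches the nodes reachable from the starting node.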

Fig. 5: Relationships in Relational databases. Figure taken from [20].

Fig. 6: Relationships in Graph databases. Figure taken from [20].

Non-native vs. Native Graph Processing

Fig 7: Non-Native Graph Processing using Global lookup index

Fig 8: Native Graph Processing using index-free adjacency

Process Mining
Process mining analyses a process using event log data.
One of its key aspects is studying the social structure of the organization using event logs.

Fig 9: Types of Process Mining Techniques

Process mining focuses on analysing processes using the data present in event logs.
Each event in an event log records the details of an activity.
Each event is associated with a case identifier (CaseID).
Each event has a timestamp.
Each event has an activity that is being performed.
Each event has an actor that handles the event.
Additionally, each event may include a unique identifier.
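
The event attributes listed above map naturally onto a small record type. The following Python sketch is one possible representation; the field names, the `group_by_case` helper, and the sample events are assumptions made for illustration and are not taken from the dataset used later.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    case_id: str                     # case identifier (CaseID)
    activity: str                    # activity being performed
    actor: str                       # actor handling the event
    timestamp: datetime              # when the event occurred
    event_id: Optional[str] = None   # optional unique identifier


# Hypothetical sample log.
log = [
    Event("C1", "Register", "Nidhi", datetime(2014, 1, 1, 9, 0)),
    Event("C1", "Review", "Kunal", datetime(2014, 1, 1, 10, 30)),
    Event("C2", "Register", "Nidhi", datetime(2014, 1, 2, 9, 15)),
]


def group_by_case(events):
    """Return {case_id: [events ordered by timestamp]}."""
    cases = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        cases.setdefault(event.case_id, []).append(event)
    return cases


cases = group_by_case(log)
print({cid: [e.activity for e in evs] for cid, evs in cases.items()})
```

Grouping by case and ordering by timestamp gives the per-case traces that case-dependent algorithms operate on.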

Fig. 10: An example Event Log.

Three types of process mining techniques:
1. Process Discovery
2. Process Conformance
3. Process Enhancement

Three process mining perspectives:
1. Control Flow Perspective
2. Organizational Perspective
3. Case Perspective


Similar Task Algorithm
The Similar Task algorithm identifies actors who perform similar activities; it belongs to the organizational perspective.
It focuses on the activities the actors perform, irrespective of cases.
It is based on the notion that people doing similar things have a stronger relation than people doing different things.


Table 2: Sample Event Log (columns: Case Identifier, Activity Identifier, Actor)
Actor column, in row order: Nidhi, Nidhi, Nidhi, Kunal, Kunal, Priyanka, Priyanka, Pooja, Pooja, Nidhi, Astha, Kunal, Priyanka, Pooja, Astha, Astha

Table 3: Actor-Activity Matrix

Given two attribute vectors A and B, the cosine similarity is given by

\[ \cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \]

          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     ---    0.32   0.00      0.63   0.00
Kunal     0.32   ---    0.00      0.00   0.70
Priyanka  0.00   0.00   ---       0.70   0.00
Pooja     0.63   0.00   0.70      ---    0.00
Astha     0.00   0.70   0.00      0.00   ---

Table 4: Cosine Similarity Values

Figure taken from [21].
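
As a rough illustration of the computation behind such a similarity matrix, the Python sketch below builds one activity-count vector per actor and applies the standard cosine similarity. The sample events and the function names are hypothetical; they are not the rows of Table 2, so the numbers differ from Table 4.

```python
from collections import Counter
from math import sqrt


def actor_activity_vectors(events):
    """events: iterable of (actor, activity). Returns {actor: Counter}."""
    vectors = {}
    for actor, activity in events:
        vectors.setdefault(actor, Counter())[activity] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical (actor, activity) pairs; cases are ignored on purpose,
# because the algorithm looks at activities irrespective of cases.
events = [
    ("Nidhi", "A"), ("Nidhi", "B"),
    ("Kunal", "A"), ("Kunal", "C"),
    ("Pooja", "D"),
]
vectors = actor_activity_vectors(events)
print(cosine_similarity(vectors["Nidhi"], vectors["Kunal"]))
print(cosine_similarity(vectors["Nidhi"], vectors["Pooja"]))
```

Actors who share no activity get a similarity of zero, which is why a matrix such as Table 4 is sparse.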

Similar Task Algorithm at a glance!


Sub Contract Algorithm
The Sub Contract algorithm focuses on how work moves among performers.
The main idea is to count the number of times individual j performs an activity in between two activities performed by individual i.
The relations between individuals are case dependent.
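
A minimal Python sketch of this counting idea follows. It makes two assumptions that the slides do not spell out: only direct sub-contracting is counted (j acts immediately between two events of i), and `normal` is taken to be the number of three-event windows examined across all cases. The sample cases are hypothetical.

```python
from collections import defaultdict


def subcontract(cases):
    """cases: {case_id: [actor, ...]} with actors in event order.

    Returns ({(i, j): normalized strength}, normal).
    """
    counts = defaultdict(int)
    normal = 0  # assumption: number of 3-event windows inspected
    for actors in cases.values():
        # Slide a window of three consecutive events over the case.
        for first, middle, last in zip(actors, actors[1:], actors[2:]):
            normal += 1
            if first == last and first != middle:
                counts[(first, middle)] += 1
    if normal == 0:
        return {}, 0
    return {pair: n / normal for pair, n in counts.items()}, normal


# Hypothetical event order per case.
cases = {
    "c1": ["Nidhi", "Priyanka", "Nidhi", "Kunal"],
    "c2": ["Pooja", "Astha", "Kunal", "Priyanka"],
}
strengths, normal = subcontract(cases)
print(normal, strengths)
```

Because the windows never cross case boundaries, the resulting relation is case dependent, as stated above.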

Table 5: Sample Event Log (columns: Case Identifier, Activity Identifier, Actor)

Table 6: Organized Event Log (columns: Case Identifier, Activity Identifier, Actor)

normal = 4.0

          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     0.00   0.00   1.00      0.00   0.00
Kunal     0.00   0.00   0.00      0.00   0.00
Priyanka  0.00   0.00   0.00      0.00   0.00
Pooja     0.00   0.00   0.00      0.00   0.00
Astha     0.00   0.00   0.00      0.00   0.00

Table 7: Sub Contraction Values before Normalization

          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     0.00   0.00   0.25      0.00   0.00
Kunal     0.00   0.00   0.00      0.00   0.00
Priyanka  0.00   0.00   0.00      0.00   0.00
Pooja     0.00   0.00   0.00      0.00   0.00
Astha     0.00   0.00   0.00      0.00   0.00

Table 8: Sub Contraction Values after Normalization

Sub Contract Algorithm at a glance I

Sub Contract Algorithm at a glance II


Research Motivation and Aim
Query languages provide the most standard way to interact with a database.
We try to implement process mining algorithms in database query languages to the extent possible, so that our application is tightly coupled to the database.
Our work lies at the intersection of process mining and NoSQL databases.


Research Aim
To investigate the intersection of process mining and graph databases for detecting social and hierarchical structures.
To understand application needs that can be modelled in this new domain.
To implement the Similar-Task and Sub-Contract algorithms in a row-oriented database, MySQL.
To implement the Similar-Task and Sub-Contract algorithms in a graph-oriented database, Neo4j.
To compare the performance of the Similar-Task and Sub-Contract algorithms in MySQL and Neo4j.


Related Work and Novel Research Contributions
Implementation of Mining Algorithms in Relational Databases

Ordonez et al. [5]
Implement the k-means clustering algorithm in SQL.
Cluster large datasets in an RDBMS.
Define suitable tables, index them, and write suitable queries for clustering.

Ordonez et al. [6]
Extend their own work in [5].
Efficient implementation of the EM algorithm to cluster very large datasets.


Berzal et al. [7]
Implemented tree-based association rule mining to discover interesting patterns in relational databases.

Sattler et al. [8]
Applied data mining techniques on a decision tree and classifier.
Tight coupling of data mining and database systems.


Implementation of Mining Algorithms in Graph Databases

Wang et al. [9]
Studied structural pattern mining for large disk-based graph databases.
Presented a novel ADI index structure and efficient algorithms for mining frequent patterns.

Wang et al. [10]
Presented techniques to obtain scalable mining in graph databases.


Huan et al. [11]
Presented a novel technique to mine maximal frequent sub-graphs in graph databases.

Ozaki et al. [12]
Introduced the hyper-clique pattern in graph databases.
Used hyper-clique patterns to detect highly correlated sub-graphs.


Performance Comparison of Mining Algorithms in Relational and Graph Databases

Vicknair et al. [13]
Compared the performance of relational and graph databases for data provenance systems.

McColl et al. [14]
Evaluated the performance of a series of open-source graph databases.
Used various graph algorithms on a graph setup consisting of 256 million nodes.


Ciglan et al. [15]
Benchmarked graph databases on graph traversal algorithms.

Macko et al. [16]
Presented PIG, a performance introspection framework for graph databases.
PIG provides tools and mechanisms to understand the performance of a graph database.


Novel Research Contributions
While there has been work on implementing data mining algorithms in relational and graph databases, we are:
The first to implement organizational mining algorithms (Similar-Task and Sub-Contract) in the row-oriented database MySQL using SQL.
The first to implement organizational mining algorithms (Similar-Task and Sub-Contract) in the graph-oriented database Neo4j using CYPHER.
We also benchmark the performance of the organizational mining algorithms (Similar-Task and Sub-Contract) on MySQL and Neo4j.


Vishleshan
Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS


Similar Task Algorithm

Steps
The implementation of the Similar-Task algorithm in SQL can be divided into four broad tasks:
1. Declare and iterate a cursor to select distinct tasks.
2. Create a table to store the result.
3. Fetch the actors' vectors and calculate the cosine similarity.
4. Write the results to the result table.
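
For comparison with the cursor-based steps above, the following sketch computes the same kind of actor-to-actor cosine similarity with a single set-based SQL query. It is not the stored procedure described in these slides: it runs on SQLite through Python's `sqlite3` module rather than MySQL, and the `event_log` table, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3
from math import sqrt

conn = sqlite3.connect(":memory:")
# Register SQRT explicitly because not every SQLite build ships it.
conn.create_function("SQRT", 1, sqrt)
conn.execute("CREATE TABLE event_log (case_id TEXT, activity TEXT, actor TEXT)")
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?)",
    [
        ("c1", "A", "Nidhi"), ("c1", "B", "Nidhi"),
        ("c2", "A", "Kunal"), ("c2", "C", "Kunal"),
        ("c3", "D", "Pooja"),
    ],
)

query = """
WITH counts AS (              -- actor-activity matrix in long form
    SELECT actor, activity, COUNT(*) AS n
    FROM event_log
    GROUP BY actor, activity
),
norms AS (                    -- Euclidean norm of each actor's vector
    SELECT actor, SQRT(SUM(n * n)) AS norm
    FROM counts
    GROUP BY actor
)
SELECT a.actor, b.actor,
       SUM(a.n * b.n) / (na.norm * nb.norm) AS similarity
FROM counts a
JOIN counts b ON a.activity = b.activity AND a.actor < b.actor
JOIN norms na ON na.actor = a.actor
JOIN norms nb ON nb.actor = b.actor
GROUP BY a.actor, b.actor
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only pairs that share at least one activity appear in the result; every other pair has similarity zero. The self-join on `activity` is exactly the kind of join whose cost the performance comparison measures.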


Define and iterate cursor
Declare a cursor to select distinct tasks from the table.
Open the cursor and loop through the results it returns.

Declare table to store results
Dynamically create a table with the specified table name.
Prepare an SQL statement from the query and execute it.

Fetch actors vector and calculate Cosine-Similarity I
Prepare the query to insert into the table.
Define variables to store values for the cosine-similarity calculation.

Fetch actors vector and calculate Cosine-Similarity II
Inside the cursor, collect distinct tasks from the tables for the required calculation.

Fetch actors vector and calculate Cosine-Similarity III
Append the parts of the cosine-similarity calculation to the SQL query.

Update Final Results I
Declare a cursor to get all distinct teams.
Iterate through the cursor to get the distinct teams.

Update Final Results II
Form a query that creates a table with the distinct teams as columns.
Inside the cursor loop, append the distinct teams as columns of the table.

Update Final Results III
Form a query for inserting values into the resultant table.
Inside the cursor loop, assign similarity values to the respective columns (matching teams).


Sub-Contract Algorithm

Steps
The Sub-Contract algorithm implementation can be divided into four broad tasks:
1. Create a table to store results.
2. Find distinct case identifiers.
3. Update normal and find sub-contraction within each case.
4. Normalize the result.

Create table to store results I
Declare a cursor to select distinct actors.
Iterate through the cursor to collect the distinct actors.

Create table to store results II
Form a query to create a table.
Inside the cursor, append each distinct actor as part of the query.

Find distinct case identifiers
Declare a cursor to select distinct case identifiers with count >= 3.
Iterate through the cursor. For each distinct case identifier, call the procedure ExecuteCase.
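
The count >= 3 filter follows from the definition of sub-contracting: a pattern of the form i, j, i needs at least three events in a case. The sketch below shows such a filter using SQLite via Python's `sqlite3` module rather than a MySQL cursor. The `event_log` table and the `execute_case` function are hypothetical stand-ins; `execute_case` only mimics the role of the ExecuteCase procedure, whose body is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (case_id TEXT, activity TEXT, actor TEXT)")
conn.executemany(
    "INSERT INTO event_log VALUES (?, ?, ?)",
    [
        ("c1", "A", "Nidhi"), ("c1", "B", "Priyanka"), ("c1", "C", "Nidhi"),
        ("c2", "A", "Kunal"), ("c2", "B", "Pooja"),
    ],
)

# Keep only cases long enough to contain an i, j, i pattern.
eligible = [
    row[0]
    for row in conn.execute(
        "SELECT case_id FROM event_log "
        "GROUP BY case_id HAVING COUNT(*) >= 3 ORDER BY case_id"
    )
]


def execute_case(case_id):
    """Hypothetical stand-in for ExecuteCase: return the case's actors.

    Assumption: insertion order (rowid) reflects event order; a real
    log would be ordered by its timestamp column instead.
    """
    cursor = conn.execute(
        "SELECT actor FROM event_log WHERE case_id = ? ORDER BY rowid",
        (case_id,),
    )
    return [row[0] for row in cursor]


for case_id in eligible:
    print(case_id, execute_case(case_id))
```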


Update normal and find sub-contraction I
Update normal.

Update normal and find sub-contraction II
Declare a cursor to find sub-contracting actors.

Update normal and find sub-contraction III
Iterate through the cursor to find the IDs of the actors.

Update normal and find sub-contraction IV
Declare a cursor to find sub-contracting actors.
Iterate through the cursor to find the IDs of the sub-contracting actors.

Update normal and find sub-contraction V
For each pair of sub-contracting actors, insert or update the sub-contract value between them.

Normalize the result
Declare a cursor to select the distinct actors that form the columns of the result table.
For each column, form an update query that divides its values by normal.


Vishleshan
Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented

Similar Task Algorithm.

Steps
The implementation of the Similar Task algorithm in CYPHER consists mainly of two broad functions:
1. Load the data so that actor and activity nodes are unique.
2. Calculate the cosine similarity between actors.

Load actor and activity nodes uniquely
Load data directly from the data file. Create unique nodes for each actor and activity.
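
The effect of loading with unique nodes can be imitated in plain Python with a get-or-create helper, which is similar in spirit to what Cypher's MERGE clause does. Whether the original queries used MERGE is not shown in the slides, and the CSV layout, the `get_or_create` helper, and the `performs` edge map below are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical data file with one event per row.
data_file = io.StringIO(
    "case_id,activity,actor\n"
    "c1,A,Nidhi\n"
    "c1,B,Kunal\n"
    "c2,A,Nidhi\n"
)

actor_nodes = {}      # actor name -> node
activity_nodes = {}   # activity name -> node
performs = {}         # (actor, activity) -> number of events


def get_or_create(index, key):
    """Return the existing node for key, or create it exactly once."""
    return index.setdefault(key, {"name": key})


for row in csv.DictReader(data_file):
    actor = get_or_create(actor_nodes, row["actor"])
    activity = get_or_create(activity_nodes, row["activity"])
    edge = (actor["name"], activity["name"])
    performs[edge] = performs.get(edge, 0) + 1

print(sorted(actor_nodes), sorted(activity_nodes), performs)
```

Three events produce only two actor nodes and two activity nodes; repeated events increase the weight of an existing edge instead of adding duplicates.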


Calculate Cosine-Similarity
Match common activities between actors and calculate the similarity.


Sub Contract Algorithm

Steps
The implementation of the Sub Contract algorithm in CYPHER consists mainly of four broad functions:
1. Identify sub-contracting actors within each case.
2. Collect unique names and make new nodes for each of them.
3. Set the sub-contraction strength between unique actor nodes.
4. Calculate normal and normalize the sub-contraction value.


Identify sub-contracting actors
Identify sub-contracting actors and connect them via a [:RELATED_TO] relationship.

Collect unique names and create unique actor nodes
Collect unique actor names.
Create a new UNIQUEACTOR node for each distinct actor name found.

Set sub-contraction strength between unique actors
For all sub-contracting actors, determine the strength of sub-contraction between them.

Calculate normal and normalize the result
Calculate normal.
Normalize the sub-contraction strength between actors.


Vishleshan
Experimental Dataset

We use the Business Process Intelligence 2014 (BPI 2014) dataset to conduct our experiments.
The log contains events from the incident and problem management system of Rabobank Group ICT.
It contains data about managing requests from Rabobank Group ICT.
It contains a total of 466,737 records.


Dataset Details

Fig. 11: Sample Event Log from MySQL.

Performance Comparison

Similar Task Algorithm

Load Time

Dataset size  MySQL (msec)  Neo4j (msec)
65,000        2467          3413
1,01,000      2875          3362
2,19,500      5966          4354
3,00,000      5850          5877
4,66,737      7819          6875

Table 9: Data Load Time
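
To make the comparison in Table 9 easier to read, the short Python sketch below reports which system loaded each dataset faster and the Neo4j-to-MySQL time ratio. The numbers are copied from Table 9; the variable and function names are illustrative.

```python
# Load times in milliseconds from Table 9: size -> (MySQL, Neo4j).
load_times_ms = {
    65_000: (2467, 3413),
    101_000: (2875, 3362),
    219_500: (5966, 4354),
    300_000: (5850, 5877),
    466_737: (7819, 6875),
}


def faster_system(mysql_ms, neo4j_ms):
    """Name the system with the smaller load time."""
    return "MySQL" if mysql_ms < neo4j_ms else "Neo4j"


summary = {
    size: (faster_system(mysql, neo4j), round(neo4j / mysql, 2))
    for size, (mysql, neo4j) in load_times_ms.items()
}

for size, (winner, ratio) in summary.items():
    print(f"{size:>7}: faster={winner}, Neo4j/MySQL={ratio}")
```

A ratio below 1 means Neo4j loaded that dataset faster; in Table 9 this happens for the 2,19,500 and 4,66,737 record datasets.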


Fig 12: Load Time

Execution Time I

              Step-8 (msec)    Step-9 (msec)
Dataset Size  MySQL   Neo4j    MySQL   Neo4j
65,000        225     9616     2467    2403
1,01,000      372     11700    2875    2925
2,19,500      713     14655    5966    3664
3,00,000      903     29520    5850    7380
4,66,737      1403    48891    7819    12223

Table 10: Execution Time of Step-8 & Step-9


Execution Time II

Fig. 13: Execution Time of Step-8 & Step-9

Disk Usage in MySQL I

           Dataset Size
Tables     65000     101000    219500     300000     466737
Dataset    3686400   5783552   11026432   15220736   21544960
OTMatrix   65536     65536     65536      81920      81920
InitSim    1589248   1589248   1589248    3686400    3686400
FinalSim   229376    262144    278528     491520     1589248

Table 11: Disk Space Usage in MySQL.

Disk Usage in MySQL II

Fig 14: Disk Space Usage in MySQL.


Disk Usage in Neo4j I

                Dataset Size
Graph Elements  65000     101000   219500   300000    466737
Nodes           2820      2910     3075     3990      4215
Relationships   770040    414315   479663   8568809   983227
Properties      1033856   563873   651203   1155011   1323439

Table 12: Disk Space Usage in Neo4j.

Disk Usage in Neo4j II

Fig. 14: Disk Space Usage in Neo4j.


Sub Contract Algorithm

Load Time

Dataset size  MySQL (msec)  Neo4j (msec)
65,000        6575          9567
1,01,000      8390          10476
2,19,500      14279         14873
3,00,000      26437         25435
4,66,737      43712         38234

Table 13: Load Time

Fig 15: Load Time

Execution Time in MySQL I

              Execution Time (msec)
Dataset Size  Update Normal  Sub-Contract Detection  Update Result  Normalize result
65,000        32             11712                   8296           16
1,01,000      32             11782                   8138           16
2,19,500      35             11713                   7940           17
3,00,000      70             11736                   8094           17
4,66,737      73             11747                   7754           20

Table 14: Execution Time for 4 main steps in MySQL.

Execution Time in MySQL II

Fig 16: Execution Time for 4 main steps in MySQL.


Execution Time in Neo4j I

Column headers: Update Normal, Sub-Contract Detection, Update Result, Normalize result (three values are listed per dataset size)

Dataset Size  Execution Time (msec)
65,000        118    1542    2077
1,01,000      140    1707    2773
2,19,500      202    2534    2369
3,00,000      336    3442    5261
4,66,737      560    4149    5334

Table 15: Execution Time for 4 main steps in Neo4j

Execution Time in Neo4j II

Fig. 17: Execution Time for 4 main steps in Neo4j.


Vishleshan
Performance Comparison
Sub Contract Algorithm

Disk Space Usage in MySQL I


Tables

Dataset Size
65000

101000

219500

300000

466737

Dataset

4734976

6832128

13123584

18366464

27836416

OrganisedData

4734976

6832128

13123584

18366464

27836416

ResultMatrix

1589248

1589248

1589248

1589248

1589248

Table 15: Disk Space Usage in MySQL

Disk Space Usage in MySQL II

Fig 17: Disk Space Usage in MySQL

Disk Space Usage in Neo4j I

Graph Elements | Dataset Size: 65000 | 101000 | 219500 | 300000 | 466737
Nodes | 982212 | 1523732 | 3360798 | 4598454 | 7190330
Relationships | 153477921 | 183955761 | 285778449 | 375437997 | 490033038
Properties | 384189475 | 461537287 | 719874720 | 942665404 | 1238579332

Table 16: Disk Space Usage for graph elements in Neo4j.

Disk Space Usage in Neo4j II

Fig. 18: Disk Space Usage for graph elements in Neo4j.

Conclusion
Neo4j performs better when it comes to loading data.
Read operations in MySQL are comparatively faster for a single-node setup.
Neo4j gives much improved performance whenever relationships are of prime importance.
Write performance varied greatly in both cases: for smaller datasets MySQL performs better, whereas for larger datasets Neo4j performs better.

Limitations and Future Work

Limitations
Only one dataset was used, at different sizes.
Only single-node database setups were used.
Only two organizational mining metrics were used.

Future Work
Apply the algorithms to larger datasets.
Create a multi-node Neo4j setup and implement the algorithms on it.
Implement process enhancement and recommendation systems and study their impact.
Experiment with more relational and graph-oriented databases.

References

[1] Wil van der Aalst. Process Mining: Overview and Opportunities. ACM, 2012.
[2] P. Neubauer. Graph databases, NOSQL and Neo4j? www.infoq.com.
[3] I. Robinson, J. Webber, E. Eifrem. Graph Databases. www.books.google.com.
[4] Minseok Song, Wil M. P. van der Aalst. Towards comprehensive support for organizational mining. Elsevier, 2008.
[5] Carlos Ordonez. Programming the K-means clustering algorithm in SQL.
[6] C. Ordonez and P. Cereghini. SQLEM: fast clustering in SQL using the EM algorithm. International Conference on Management of Data.
[7] Fernando Berzal, Juan Carlos Cubero, Nicolas Marin, Jose Maria Serrano. TBAR: An efficient method for association rule mining in relational databases. Elsevier, 2001.
[8] K.-U. Sattler and O. Dunemann. SQL Database Primitives for Decision Tree Classifiers. Conference on Information and Knowledge Management, 2001.
[9] W. Wang, C. Wang, Y. Zhu, B. Shi, J. Pei, X. Yan. GraphMiner: a structural pattern mining system for large disk-based graph databases and its applications. ACM, 2005.
[10] C. Wang, W. Wang, Y. Zhu, B. Shi, J. Pei. Scalable mining of large disk-based graph databases. ACM, 2004.
[11] J. Huan, W. Wang, J. Prins. SPIN: mining maximal frequent subgraphs from graph databases. ACM, 2004.
[12] T. Ozaki, T. Ohkawa. Mining correlated subgraphs in graph databases. Advances in Knowledge Discovery and Data Mining, 2008.
[13] C. Vicknair, M. Macias, Z. Zhao, X. Nan, Y. Chen. A comparison of graph databases and a relational database: a data provenance perspective. ACM, 2010.
[14] R. C. McColl, R. Ediger, J. Poovey, D. Campbell. A performance evaluation of open-source graph databases. ACM, 2014.
[15] M. Ciglan, A. Averbuch, L. Hluchy. Benchmarking graph traversal operations over graph databases. IEEE, 2012.
[16] P. Macko, D. Margo, M. Seltzer. Performance introspection of graph databases. ACM, 2013.
[17] Why NoSQL? Couchbase.
[18] Scale-out vs. Scale-up. www.natishalom.typepad.com.
[19] Introduction to Graph Databases and Neo4j. www.neo4j.com
[20] From Relational to Neo4j. www.neo4j.com
[21] Cosine Similarity. www.Wikipedia.com
