www.software-analytics.in
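Vishleshan
Outline
1. Research Motivation and Aim
2. Related Work and Novel Research Contributions
3. Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS
4. Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented
5. Experimental Dataset
6. Performance Comparison
7. Conclusion
8. Limitations
9. References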
Research Motivation and Aim
Introduction to NoSQL
Why NoSQL?
The number of people accessing the internet worldwide has increased tremendously.
Most applications are hosted in the cloud and need to support users 24 hours a day, 365 days a year.
Figure taken from [17]
Data is captured in huge volumes and includes both structured and unstructured data.
The amount of data is growing rapidly, and so is its variety.
Figure taken from [17]
What is wrong with relational databases?
Nothing!
Relational databases take a one-size-fits-all approach to storage.
Relational databases are used when strong consistency is a must.
Relational databases can create problems when it is time to scale.
The explosion of social media sites such as Facebook and Twitter brought very large data needs.
These sites had to capture and process very large volumes of data in ways that were difficult to handle with a traditional RDBMS.
Traditional databases are designed to scale up, but these workloads required a database that can scale out.
When relational applications become successful, usage goes up; joins, which are inherent in an RDBMS, then become very slow.
Application developers find it difficult to get the dynamic scalability they need while maintaining the performance users demand.
We require a technology that scales out rather than up.
Scale up: add more processors and memory to a server.
Scale out: add more servers.
NoSQL Databases
Hence, NoSQL databases were introduced.
Row Oriented vs. Graph Oriented Database
Record No   Name             Address              City        State
01          Jeevan Joishi    Uniworld Apartment   Bangalore   Karnataka
02          Kunal Gupta      -                    Kanpur      Uttar Pradesh
03          Priyanka Verma   Sector-7             Jind        Haryana
04          Nidhi Agarwal    JJ colony            Bhiwani     Haryana
Figure taken from [19]
Figures taken from [20]
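To make the row-versus-graph contrast concrete, the minimal sketch below stores the four records of the table once as rows and once as a small property graph, then answers "who lives in Haryana?" both ways. It is an illustration in Python rather than the deck's implementation. The graph schema (Person, City and State nodes joined by LIVES_IN and IN_STATE edges) and all function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# The four records from the table: (Record No, Name, Address, City, State).
# Kunal Gupta's address is not given in the table, so it is None here.
ROWS = [
    ("01", "Jeevan Joishi", "Uniworld Apartment", "Bangalore", "Karnataka"),
    ("02", "Kunal Gupta", None, "Kanpur", "Uttar Pradesh"),
    ("03", "Priyanka Verma", "Sector-7", "Jind", "Haryana"),
    ("04", "Nidhi Agarwal", "JJ colony", "Bhiwani", "Haryana"),
]


def rows_in_state(rows, state):
    """Row-oriented lookup: scan every record and filter on the State column."""
    return sorted(name for _no, name, _addr, _city, st in rows if st == state)


def build_graph(rows):
    """Hypothetical schema: (Person)-[:LIVES_IN]->(City)-[:IN_STATE]->(State).

    Only incoming edges are stored, keyed by their target node, because the
    query below walks from a State node back to the people.
    """
    incoming = defaultdict(set)
    for _no, name, _addr, city, state in rows:
        incoming[("City", city)].add(("Person", name))   # LIVES_IN edge
        incoming[("State", state)].add(("City", city))   # IN_STATE edge
    return incoming


def graph_in_state(incoming, state):
    """Graph-oriented lookup: traverse State <- City <- Person edges."""
    people = set()
    for city in incoming.get(("State", state), ()):
        people.update(name for _label, name in incoming.get(city, ()))
    return sorted(people)


if __name__ == "__main__":
    graph = build_graph(ROWS)
    print(rows_in_state(ROWS, "Haryana"))
    print(graph_in_state(graph, "Haryana"))
```

The row version has to scan every record. The graph version touches only the nodes connected to the requested State node, and this locality is what graph databases exploit.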
Process Mining
Process mining is the analysis of a process using event log data.
One of its key aspects is studying the social structure of an organization from its event logs.
Process mining focuses on analysing a process using the data present in event logs:
Each event in an event log records the details of an activity.
Each event is associated with a case identifier (CaseID).
Each event has a timestamp.
Each event names the activity being performed.
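The event-log structure described above can be sketched in Python. Each event is a (case identifier, activity, actor, timestamp) record, and the events of each case are grouped into a trace ordered by timestamp. The record layout, the function name `group_into_traces` and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (case_id, activity, actor, timestamp).
EVENTS = [
    ("C1", "Register", "Nidhi", datetime(2014, 1, 7, 9, 0)),
    ("C2", "Register", "Kunal", datetime(2014, 1, 7, 9, 5)),
    ("C1", "Analyse", "Priyanka", datetime(2014, 1, 7, 10, 0)),
    ("C1", "Close", "Nidhi", datetime(2014, 1, 7, 11, 30)),
    ("C2", "Close", "Astha", datetime(2014, 1, 7, 9, 45)),
]


def group_into_traces(events):
    """Group events by case identifier and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, actor, ts in events:
        by_case[case_id].append((ts, activity, actor))
    # Sorting the (timestamp, activity, actor) tuples orders them by time first.
    return {case_id: [(activity, actor) for _ts, activity, actor in sorted(evs)]
            for case_id, evs in by_case.items()}


if __name__ == "__main__":
    for case_id, trace in group_into_traces(EVENTS).items():
        print(case_id, trace)
```

Organizational mining metrics such as the ones in this deck operate on these per-case traces rather than on the raw, interleaved log.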
There are three types of process mining techniques:
1. Process Discovery
2. Process Conformance
3. Process Enhancement
Example event log with the columns Case Identifier, Activity Identifier and Actor, involving the actors Nidhi, Kunal, Priyanka, Pooja and Astha.
           Nidhi   Kunal   Priyanka   Pooja   Astha
Nidhi      ---     0.32    0.00       0.63    0.00
Kunal      0.32    ---     0.00       0.00    0.70
Priyanka   0.00    0.00    ---        0.70    0.00
Pooja      0.63    0.00    0.70       ---     0.00
Astha      0.00    0.70    0.00       0.00    ---
Similar-Task Algorithm at a Glance
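The Similar-Task idea can be sketched in a few lines of Python. The sketch builds an actor-by-activity count matrix (an originator-task matrix) and compares the actors' rows with cosine similarity, the measure named in the CYPHER steps of this deck. It is an illustrative sketch, not the authors' code. The function names and the toy log are hypothetical, and because the log is made up, its values are unrelated to the matrix shown earlier.

```python
import math
from collections import Counter, defaultdict


def originator_task_matrix(events):
    """Count how often each actor performed each activity: {actor: Counter}."""
    matrix = defaultdict(Counter)
    for _case_id, activity, actor in events:
        matrix[actor][activity] += 1
    return matrix


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similar_task(events):
    """Similarity for every ordered pair of distinct actors."""
    matrix = originator_task_matrix(events)
    actors = sorted(matrix)
    return {(a, b): cosine(matrix[a], matrix[b])
            for a in actors for b in actors if a != b}


# Hypothetical toy log: (case_id, activity, actor).
LOG = [
    (1, "A", "Nidhi"), (1, "B", "Kunal"), (1, "C", "Nidhi"),
    (2, "A", "Nidhi"), (2, "B", "Priyanka"), (2, "C", "Priyanka"),
]

if __name__ == "__main__":
    for (a, b), sim in similar_task(LOG).items():
        print(f"{a:>8} ~ {b:<8} {sim:.2f}")
```

Actors who never share an activity get 0.00. The matrix is symmetric, which is why such result matrices mirror around the diagonal.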
Event log tables (columns: Activity Identifier, Actor; and Case Identifier, Activity Identifier, Actor)
Sub-contracting result matrices for the actors Nidhi, Kunal, Priyanka, Pooja and Astha: every entry is 0.00 except one value in the Nidhi row (0.25 in the first matrix, 1.00 in the second).
Sub-Contract Algorithm at a Glance
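One common way to write a sub-contracting measure in the organizational-mining literature is shown below. It is offered as an assumption about the metric behind these slides, not as the authors' exact definition. For a case c with events c_1, ..., c_{|c|}, actor p sub-contracts to actor q when the actors of three consecutive events are p, q, p. The count is normalized by the number of such three-event windows in the log L; this denominator plays the role of the "normal" in the implementation steps.

```latex
p \triangleright_L q \;=\;
\frac{\displaystyle\sum_{c \in L}\;\sum_{i=1}^{|c|-2}
      \mathbf{1}\big[\mathrm{actor}(c_i)=p \,\wedge\, \mathrm{actor}(c_{i+1})=q \,\wedge\, \mathrm{actor}(c_{i+2})=p\big]}
     {\displaystyle\sum_{c \in L} \max\big(|c|-2,\;0\big)},
\qquad p \neq q
```

For instance, a log with the cases ⟨Nidhi, Priyanka, Nidhi, Pooja⟩ and ⟨Kunal, Astha, Kunal⟩ has 2 + 1 = 3 windows. This gives Nidhi ▷ Priyanka = 1/3 and Kunal ▷ Astha = 1/3, and every other pair has a value of 0.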
Research Aim
To understand which application needs can be modelled in this new domain.
Related Work and Novel Research Contributions
Implementation of Mining Algorithms in Relational Databases
Implementation of Mining Algorithms in Graph Databases
Performance Comparison of Mining Algorithms in Relational and Graph Databases
Novel Research Contributions
Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS
Similar-Task Algorithm
Steps
The implementation of the Similar-Task algorithm in SQL can be divided into four broad tasks.
Inside the cursor loop, append each distinct team as a column of the result table.
Inside the cursor loop, assign each similarity value to the column of the matching team.
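The two cursor-loop steps amount to a dynamic pivot: first add one column per distinct team, then run one UPDATE per similarity value. The sketch below imitates this with Python's built-in `sqlite3` module instead of a MySQL stored procedure. The names `init_sim` (long-form rows t1, t2, sim) and `final_sim` echo the InitSim and FinalSim tables in the performance section, but their column layouts, the team names and the values are all assumptions.

```python
import sqlite3


def quote_ident(name):
    """Quote a team name so it can be used as a SQL column identifier."""
    return '"' + name.replace('"', '""') + '"'


def pivot_similarities(conn):
    """Pivot long-form rows (t1, t2, sim) of init_sim into final_sim.

    final_sim gets one row per team and one REAL column per team,
    mimicking the two cursor loops described on the slides.
    """
    cur = conn.cursor()
    teams = [row[0] for row in cur.execute(
        "SELECT t1 FROM init_sim UNION SELECT t2 FROM init_sim ORDER BY 1")]
    cur.execute("CREATE TABLE final_sim (team TEXT PRIMARY KEY)")
    cur.executemany("INSERT INTO final_sim (team) VALUES (?)", [(t,) for t in teams])
    # Loop 1: append each distinct team as a column of the result table.
    for team in teams:
        cur.execute(f"ALTER TABLE final_sim ADD COLUMN {quote_ident(team)} REAL DEFAULT 0.0")
    # Loop 2: write each similarity value into the column of the matching team.
    for t1, t2, sim in cur.execute("SELECT t1, t2, sim FROM init_sim").fetchall():
        cur.execute(f"UPDATE final_sim SET {quote_ident(t2)} = ? WHERE team = ?", (sim, t1))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE init_sim (t1 TEXT, t2 TEXT, sim REAL)")
    # Hypothetical symmetric similarity values between three teams.
    conn.executemany("INSERT INTO init_sim VALUES (?, ?, ?)", [
        ("TeamA", "TeamC", 0.32), ("TeamC", "TeamA", 0.32),
        ("TeamB", "TeamC", 0.70), ("TeamC", "TeamB", 0.70),
    ])
    pivot_similarities(conn)
    for row in conn.execute("SELECT * FROM final_sim ORDER BY team"):
        print(row)
```

Column names cannot be bound as query parameters, which is why the sketch builds the ALTER and UPDATE statements as strings and quotes the identifiers. The similarity values themselves are still passed as parameters.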
69/109
Vishleshan
Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS
Sub - Contract Algorithm.
Steps
Sub-Contract Algorithm implementation can be studied under four (4) broad
categories:
Create table to store results.
Find distinct case identifiers.
Update normal and find sub-contraction within each case.
Normalize the result.
Inside the cursor, append each distinct actor as part of the query.
Iterate through the cursor and, for each distinct case identifier, call the procedure ExecuteCase.
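The four steps and the per-case ExecuteCase call can be sketched with Python's `sqlite3` module standing in for MySQL. The sketch rests on several hypothetical assumptions:
- an `event_log(case_id, seq, actor)` table in which `seq` numbers the events 1, 2, 3, ... inside each case;
- sub-contracting detected as the actor pattern p, q, p over three consecutive events;
- the "normal" taken as the number of three-event windows in the log.

The function `execute_case` only mirrors the role of the ExecuteCase procedure; it is not its code.

```python
import sqlite3

# Hypothetical log rows (case_id, seq, actor); seq is contiguous within a case.
SAMPLE_EVENTS = [
    (1, 1, "Nidhi"), (1, 2, "Priyanka"), (1, 3, "Nidhi"), (1, 4, "Pooja"),
    (2, 1, "Kunal"), (2, 2, "Astha"), (2, 3, "Kunal"),
]


def execute_case(cur, case_id):
    """Stand-in for ExecuteCase: count p, q, p patterns in one case.

    Returns this case's contribution to the normal (its three-event windows).
    """
    cur.execute(
        "SELECT e1.actor, e2.actor FROM event_log e1 "
        "JOIN event_log e2 ON e2.case_id = e1.case_id AND e2.seq = e1.seq + 1 "
        "JOIN event_log e3 ON e3.case_id = e1.case_id AND e3.seq = e1.seq + 2 "
        "WHERE e1.case_id = ? AND e1.actor = e3.actor AND e1.actor <> e2.actor",
        (case_id,))
    for p, q in cur.fetchall():
        cur.execute("INSERT OR IGNORE INTO sub_contract (p, q, cnt) VALUES (?, ?, 0)", (p, q))
        cur.execute("UPDATE sub_contract SET cnt = cnt + 1 WHERE p = ? AND q = ?", (p, q))
    n = cur.execute("SELECT COUNT(*) FROM event_log WHERE case_id = ?", (case_id,)).fetchone()[0]
    return max(n - 2, 0)


def sub_contract(conn):
    cur = conn.cursor()
    # Step 1: create the table that stores the results.
    cur.execute("CREATE TABLE sub_contract (p TEXT, q TEXT, cnt INTEGER, strength REAL,"
                " PRIMARY KEY (p, q))")
    # Step 2: find the distinct case identifiers (the cursor on the slides).
    cases = [row[0] for row in cur.execute("SELECT DISTINCT case_id FROM event_log")]
    # Step 3: detect sub-contracting in each case and accumulate the normal.
    normal = sum(execute_case(cur, case_id) for case_id in cases)
    # Step 4: normalize the raw counts.
    if normal:
        cur.execute("UPDATE sub_contract SET strength = cnt * 1.0 / ?", (normal,))
    conn.commit()
    return {(p, q): s for p, q, s in cur.execute("SELECT p, q, strength FROM sub_contract")}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event_log (case_id INTEGER, seq INTEGER, actor TEXT)")
    conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", SAMPLE_EVENTS)
    print(sub_contract(conn))
```

Handling one case per call keeps each detection query small. The trade-off is one round of queries per case, and this per-case work is the kind of overhead that the Update Result timings in the performance section may reflect.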
Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented
Similar-Task Algorithm
Steps
The implementation of the Similar-Task algorithm in CYPHER consists mainly of two broad functions:
Load the data so that Actor and Activity nodes are unique.
Calculate the cosine similarity between actors.
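The two CYPHER functions can be imitated in plain Python to show the graph-side logic. Step 1 behaves like a MERGE: each actor and each activity becomes exactly one node, and repeated events only raise a count on the edge between them. Step 2 compares two actors through the activity nodes they share, using cosine similarity. The PERFORMED relationship name, its `count` property and the class below are assumptions. This is an in-memory imitation, not Neo4j or Cypher code.

```python
import math
from collections import defaultdict


class ActivityGraph:
    """Tiny in-memory property graph with unique Actor and Activity nodes."""

    def __init__(self):
        self.actors = set()                 # unique Actor nodes
        self.activities = set()             # unique Activity nodes
        self.performed = defaultdict(int)   # (actor, activity) -> PERFORMED count

    def merge_event(self, actor, activity):
        """Like MERGE: nodes and edge are created only once, then counted."""
        self.actors.add(actor)
        self.activities.add(activity)
        self.performed[(actor, activity)] += 1

    def neighbours(self, actor):
        """Activity nodes reached from an actor, with edge counts (linear scan)."""
        return {act: n for (a, act), n in self.performed.items() if a == actor}

    def cosine(self, a, b):
        """Cosine similarity of two actors over their PERFORMED edge counts."""
        na, nb = self.neighbours(a), self.neighbours(b)
        dot = sum(na[act] * nb[act] for act in na.keys() & nb.keys())
        norm = (math.sqrt(sum(v * v for v in na.values()))
                * math.sqrt(sum(v * v for v in nb.values())))
        return dot / norm if norm else 0.0


# Hypothetical (actor, activity) events.
SAMPLE = [("Nidhi", "A"), ("Nidhi", "A"), ("Nidhi", "C"),
          ("Kunal", "B"), ("Priyanka", "B"), ("Priyanka", "C")]

if __name__ == "__main__":
    g = ActivityGraph()
    for actor, activity in SAMPLE:
        g.merge_event(actor, activity)
    print(len(g.actors), len(g.activities))
    print(round(g.cosine("Nidhi", "Priyanka"), 2))
```

In a real graph database, the neighbour lookup follows an actor's own edges instead of scanning all of them, which is the main reason to keep Actor and Activity nodes unique.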
Sub-Contract Algorithm
Steps
The implementation of the Sub-Contract algorithm in CYPHER consists mainly of four broad functions:
Identify the sub-contracting actors within each case.
Collect the unique actor names and create a new node for each of them.
Set the sub-contracting strength between the unique actor nodes.
Calculate the normal and normalize the sub-contracting values.
Create a new UNIQUEACTOR node for each distinct actor name found.
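The four CYPHER functions can likewise be imitated in Python over a graph in which each case is a chain of event nodes. The functions map onto the sketch as follows:
1. Match the path e1 -NEXT-> e2 -NEXT-> e3 with the same actor at both ends.
2. Create one UNIQUEACTOR node per distinct name found.
3. Accumulate the strength on a SUBCONTRACTS edge between the unique nodes.
4. Divide by the normal.

The NEXT and SUBCONTRACTS names, the dictionary-based graph and the choice of the normal (the number of three-event windows) are hypothetical and for illustration only.

```python
from collections import defaultdict

# Hypothetical events (case_id, actor), assumed to be listed in time order.
SAMPLE_EVENTS = [
    (1, "Nidhi"), (1, "Priyanka"), (1, "Nidhi"), (1, "Pooja"),
    (2, "Kunal"), (2, "Astha"), (2, "Kunal"),
]


def build_case_chains(events):
    """One chain of event nodes per case; consecutive items stand for NEXT edges."""
    chains = defaultdict(list)
    for case_id, actor in events:
        chains[case_id].append({"actor": actor})
    return chains


def sub_contract_graph(events):
    """Return (normalized SUBCONTRACTS strengths, sorted UNIQUEACTOR names)."""
    chains = build_case_chains(events)
    unique_actors = {}               # name -> UNIQUEACTOR node (a dict here)
    counts = defaultdict(int)        # (p, q) -> raw SUBCONTRACTS count
    normal = 0                       # number of three-event windows in the log
    for chain in chains.values():
        for e1, e2, e3 in zip(chain, chain[1:], chain[2:]):
            normal += 1
            # Step 1: (e1)-[:NEXT]->(e2)-[:NEXT]->(e3), same actor at both ends.
            if e1["actor"] == e3["actor"] != e2["actor"]:
                # Step 2: one UNIQUEACTOR node per distinct name found.
                p = unique_actors.setdefault(e1["actor"], {"name": e1["actor"]})
                q = unique_actors.setdefault(e2["actor"], {"name": e2["actor"]})
                # Step 3: raise the strength on the edge between the unique nodes.
                counts[(p["name"], q["name"])] += 1
    # Step 4: normalize every strength by the normal.
    strengths = {pair: n / normal for pair, n in counts.items()} if normal else {}
    return strengths, sorted(unique_actors)


if __name__ == "__main__":
    strengths, actors = sub_contract_graph(SAMPLE_EVENTS)
    print(actors)
    print(strengths)
```

Pooja appears in the log but takes part in no p, q, p pattern, so, following step 2 as worded on the slides, no UNIQUEACTOR node is created for her.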
Experimental Dataset
We use the Business Process Intelligence 2014 (BPI 2014) dataset to conduct our experiments.
The log contains events from the incident and problem management system of Rabobank Group ICT.
It contains data about managing requests at Rabobank Group ICT.
It contains a total of 466737 records.
Dataset Details
Performance Comparison
Similar-Task Algorithm
Load Time
Dataset size   MySQL   Neo4j
65,000         2467    3413
1,01,000       2875    3362
2,19,500       5966    4354
3,00,000       5850    5877
4,66,737       7819    6875
Execution Time I
               Step-9
Dataset size   MySQL   Neo4j   MySQL   Neo4j
65,000         225     9616    2467    2403
1,01,000       372     11700   2875    2925
2,19,500       713     14655   5966    3664
3,00,000       903     29520   5850    7380
4,66,737       1403    48891   7819    12223
Execution Time II
Dataset size   65000     101000    219500     300000     466737
Dataset        3686400   5783552   11026432   15220736   21544960
OTMatrix       65536     65536     65536      81920      81920
InitSim        1589248   1589248   1589248    3686400    3686400
FinalSim       229376    262144    278528     491520     1589248
Dataset size    65000     101000   219500   300000    466737
Nodes           2820      2910     3075     3990      4215
Relationships   770040    414315   479663   8568809   983227
Properties      1033856   563873   651203   1155011   1323439
Sub-Contract Algorithm
Load Time
Dataset size   MySQL   Neo4j
65,000         6575    9567
1,01,000       8390    10476
2,19,500       14279   14873
3,00,000       26437   25435
4,66,737       43712   38234
Dataset size   Sub-Contract Detection / Update Result / Normalize Result
65,000         32   11712   8296   16
1,01,000       32   11782   8138   16
2,19,500       35   11713   7940   17
3,00,000       70   11736   8094   17
4,66,737       73   11747   7754   20
Dataset size   Sub-Contract Detection   Update Result   Normalize Result
65,000         118                      1542            2077
1,01,000       140                      1707            2773
2,19,500       202                      2534            2369
3,00,000       336                      3442            5261
4,66,737       560                      4149            5334
Dataset size    65000     101000    219500     300000     466737
Dataset         4734976   6832128   13123584   18366464   27836416
OrganisedData   4734976   6832128   13123584   18366464   27836416
ResultMatrix    1589248   1589248   1589248    1589248    1589248
Dataset size    65000       101000      219500      300000      466737
Nodes           982212      1523732     3360798     4598454     7190330
Relationships   153477921   183955761   285778449   375437997   490033038
Properties      384189475   461537287   719874720   942665404   1238579332
Conclusion
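Neo4j performs better when it comes to loading data, particularly at the larger dataset sizes. On the full log of 4,66,737 records, it loads faster than MySQL for both algorithms: 6875 vs. 7819 for Similar-Task and 38234 vs. 43712 for Sub-Contract.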
Limitations and Future Work
Limitations
Only different-sized subsets of a single dataset were used.
Only single-node setups of the databases were used.
Only two metrics for organizational mining were used.
Future Work
Apply the algorithms to larger datasets.
Create a multi-node Neo4j setup and implement the algorithms on it.
Implement process enhancement and recommendation systems and study their impact.
Experiment with more relational and graph-oriented databases.
References
W. van der Aalst. Process Mining: Overview and Opportunities. ACM, 2012.
P. Neubauer. Graph databases, NOSQL and Neo4j? www.infoq.com.
I. Robinson, J. Webber, E. Eifrem. Graph Databases. www.books.google.com.
M. Song, W. M. P. van der Aalst. Towards comprehensive support for organizational mining. Elsevier, 2008.
C. Ordonez. Programming the K-means clustering algorithm in SQL.
C. Ordonez, P. Cereghini. SQLEM: fast clustering in SQL using the EM algorithm. International Conference on Management of Data.
F. Berzal, J. C. Cubero, N. Marin, J. M. Serrano. TBAR: An efficient method for association rule mining in relational databases. Elsevier, 2001.
K.-U. Sattler, O. Dunemann. SQL Database Primitives for Decision Tree Classifiers. Conference on Information and Knowledge Management, 2001.
W. Wang, C. Wang, Y. Zhu, B. Shi, J. Pei, X. Yan. GraphMiner: a structural pattern mining system for large disk-based graph databases and its applications. ACM, 2005.
C. Wang, W. Wang, Y. Zhu, B. Shi, J. Pei. Scalable mining of large disk-based graph databases. ACM, 2004.
C. Vicknair, M. Macais, Z. Zhao, X. Nan, Y. Chen. A comparison of graph databases and a relational database: a data provenance perspective. ACM, 2010.
R. C. McColl, R. Ediger, J. Poovey, D. Campbell. A performance evaluation of open-source graph databases. ACM, 2014.
Why NOSQL? Couchbase.
Scale-out vs. Scale-up. www.natishalom.typepad.com.