
An Effective Framework for Skyline Queries using PCA
College Name
Dept. of Computer Science & Engineering Dept. of Computer Science & Engineering
RGPV University, Bhopal RGPV University, Bhopal
email email

Abstract— Skyline operators extend a database system so that users can filter out all the interesting points from a set of data points. The existing SKY-MR+ algorithm is an effective framework for skyline operators and queries that uses a quadtree-based histogram, but its execution time is inconsistent, especially for large datasets, and its relative speed-up decreases as the number of machines in the system increases. Hence an effective framework is implemented that reduces the execution time for large datasets and whose relative speed-up increases with the number of machines. The proposed methodology closely follows the architecture of the existing technique, but applies the Principal Component Analysis (PCA) technique to find the “Points in Region”. The methodology targets the efficient processing of skyline queries on various synthetic datasets. In a performance comparison between the existing work and the proposed PCA-based methodology, the latter achieves a higher relative speed-up and a considerable decrease in execution time across the number of sampled points and the number of dimensions.

I. INTRODUCTION

Skyline queries have been used in various multi-criteria decision support applications for the past twenty years. Given a dominance relationship over a dataset, a skyline query returns those objects that are not dominated by any other object. Skyline queries have been studied extensively in multidimensional spaces, in subspaces, in metric spaces, in dynamic spaces, in streaming environments and in time series data. Various algorithms for skyline query processing have been proposed, for example window-based, progressive, distributed, geometric-based, index-based, divide-and-conquer and dynamic programming algorithms. In addition, several variations have been proposed to solve application-specific problems, such as k-dominant skylines, top-k dominant queries, spatial skyline queries and others. Because the number of objects returned by a skyline query can grow large, there are also studies on the cardinality of skyline queries. This research indicates the importance of skyline queries and their variations in modern applications.

When exploring unfamiliar data, the skyline operator [1] can identify trade-offs among multiple (possibly conflicting) attributes. In our example the query retrieves the set of interesting restaurants, and it belongs to a novel type of query. This paper gives the theory of spatial skyline queries (SSQ) for the first time. Given data points P and query points Q in d-dimensional space, an SSQ returns those points of P that are not dominated by any other point of P when a set of special derived attributes is considered. For each data point, these attributes are its distances to the points of Q. An interesting variation is also studied in this paper, in which domination is determined with respect to both spatial and non-spatial attributes of P.

Spatial skyline queries are critical for many applications, including online map services and group navigation/planning. In trip planning, for example, hotels can be compared by their distances to fixed locations such as a conference venue, the beach and museums. The SSQ returns all the interesting hotels for lodging during a pleasure or business trip: those that no other hotel beats on distance to all of these locations.

In the crisis management domain, the residential buildings that must be evacuated first in the event of several explosions/fires are those which are in the spatial skyline with respect to the fire locations. The reason is that these places are either potentially trapped in the convex hull of the fires or located at the edges of the expanding fire. In defense and intelligence applications, consider the locations of soldiers penetrating into an enemy's camp as query locations and the enemy's guard stations as data points. The stations in the spatial skyline are those from which an attack might be initiated against the platoon of soldiers.

Since the introduction of the skyline operator by Börzsönyi et al. [2], several efficient algorithms have been proposed for the general skyline query. These algorithms utilize techniques such as divide-and-conquer [2], nearest neighbor search [5], sorting [3], and index structures [2, 8, 7] to answer general skyline queries. Several studies have also focused on skyline query processing in a variety of problem settings such as data streams [6] and data residing on mobile devices [4].
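The dominance relationship described in this section can be illustrated with a minimal sketch. It assumes that smaller values are better in every dimension (the paper does not fix a direction), uses a naive quadratic scan rather than any of the algorithms cited here, and the hotel data and the names `dominates` and `skyline` are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q, assuming smaller is better in every dimension."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to venue); lower is better for both.
hotels = [(50, 8.0), (80, 2.0), (60, 9.0), (120, 1.0), (90, 3.0)]
print(skyline(hotels))
```

Here (60, 9.0) is dropped because (50, 8.0) is both cheaper and closer, and (90, 3.0) is dropped because of (80, 2.0); the remaining three hotels are mutually incomparable.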
II. PROPOSED METHODOLOGY

The existing technique proposed the efficient parallel algorithm SKY-MR+ for processing skyline queries using MapReduce. It first builds a quadtree-based histogram for space partitioning, deciding whether to split each leaf node based on the benefit of splitting in terms of the estimated execution time. In addition, it applies the dominance power filtering method to prune non-skyline points in advance. Next, it partitions the data based on the regions divided by the quadtree and computes candidate skyline points for each partition using MapReduce. Finally, it checks whether each skyline candidate point is actually a skyline point in every partition using MapReduce. The authors also developed workload balancing methods to make the estimated execution times of all available machines similar. Their experiments compared SKY-MR+ with the state-of-the-art algorithms using MapReduce and confirmed the effectiveness as well as the scalability of SKY-MR+.

The proposed methodology is similar to the architecture followed in the existing technique, but applies the Principal Component Analysis technique to find the “Points in Region”.

PRINCIPAL COMPONENT ANALYSIS

The feature space is calculated in the following way. Given a set of centered observations $X_k$, $k = 1, \dots, M$, with $\sum_{k=1}^{M} X_k = 0$, the traditional way of formulating the covariance matrix using PCA is

\[ C = \frac{1}{M} \sum_{j=1}^{M} X_j X_j^T \]

Now the nonlinear feature space F must be defined. F is related to the input space by a possibly nonlinear map

\[ \Phi : R^N \to F \]

The covariance matrix in F can now be defined as

\[ C' = \frac{1}{M} \sum_{j=1}^{M} \Phi(X_j) \Phi(X_j)^T \]

We then determine each eigenvalue $\lambda$ and corresponding eigenvector V of C' which satisfy

\[ \lambda V = C' V \]

All solutions V with $\lambda \neq 0$ lie in the span of $\Phi(X_1), \dots, \Phi(X_M)$, so there also exist coefficients $\alpha_i$ such that

\[ V = \sum_{i=1}^{M} \alpha_i \Phi(X_i) \]

The M × M kernel matrix $K_{ij}$, where $i, j = 1, \dots, M$, is defined as

\[ K_{ij} = (\Phi(X_i) \cdot \Phi(X_j)) \]

where · denotes the dot product. Thus the KPCA problem is determining $\alpha$ to satisfy

\[ M \lambda K \alpha = K K \alpha \]

where $\alpha$ denotes the column vector with entries $\alpha_1, \alpha_2, \dots, \alpha_M$. To find the solutions of the previous equation, one solves

\[ M \lambda \alpha = K \alpha \]

PROPOSED ALGORITHM
1. Input (D, ϭ, m, ∂)
2. Given a dataset D of d dimensions and a sample size ϭ.
3. m denotes the number of machines, ∂ the threshold size.
4. The algorithm starts by sampling the dataset according to the sample size.
   S = Sampling(ϭ, D)
5. The sampled dataset is then passed as input to SKY-QTree+ together with the number of machines.
   Q = SKY-QTree+(S, m)
6. The SKY-QTree+ output is then load balanced using local load balancing over the number of machines and the sampled data, applying the skewness algorithm to minimize the chances of overloading.
   AL = LocalBalance(Q, S, m)
7. Broadcast Q and AL
8. Apply Principal Component Analysis to the input dataset and the broadcast Q and AL
   (LocalSL, VMax, FILTER, COUNT) = RunPCA(broadcast(Q, AL), D)
9. If LocalSL.totalSize < threshold (∂)
10. SL = ∀(LocalSL, VMax, FILTER)
11. Else
12. AG = GlobalBalance(Q, COUNT, m)
13. Broadcast Q, VMax, FILTER and AG
14. SL = RunPCA(∀+, LocalSL)
15. return SL

D: dataset of d dimensions
ϭ: sample size of the dataset
m: number of machines
∂: threshold value

[Flowchart of the proposed algorithm: S = Sampling(ϭ, D); Q = SKY-QTree+(S, m); AL = LocalBalance(Q, S, m) with the skewness algorithm; Broadcast Q and AL; (LocalSL, VMax, FILTER, COUNT) = RunPCA(broadcast(Q, AL), D); decision: LocalSL.totalSize < threshold (∂)]
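The eigenvalue problem that closes the PCA derivation in this section, in its standard kernel PCA form Mλα = Kα, can be checked numerically. The sketch below is only an illustration under stated assumptions: the map Φ is taken to be the identity (a linear kernel, so kernel PCA coincides with ordinary PCA), power iteration is swapped in as a simple eigen-solver, and the five observations and all function names are hypothetical rather than taken from the paper's implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def kernel_matrix(xs: List[Vector]) -> Matrix:
    """K[i][j] = Phi(x_i) . Phi(x_j); Phi is assumed to be the identity here."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in xs] for xi in xs]


def mat_vec(k: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(kij * vj for kij, vj in zip(row, v)) for row in k]


def top_eigenpair(k: Matrix, iters: int = 500) -> Tuple[float, Vector]:
    """Power iteration: returns (mu, alpha) with K alpha ~= mu * alpha."""
    alpha = [1.0 + i for i in range(len(k))]  # arbitrary start vector
    for _ in range(iters):
        w = mat_vec(k, alpha)
        norm = math.sqrt(sum(x * x for x in w))
        alpha = [x / norm for x in w]
    # Rayleigh quotient of the unit-length vector alpha.
    mu = sum(a * b for a, b in zip(alpha, mat_vec(k, alpha)))
    return mu, alpha


# Hypothetical centered observations: each coordinate sums to zero.
X = [[-2.0, -1.0], [-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 1.0]]
M = len(X)
K = kernel_matrix(X)
mu, alpha = top_eigenpair(K)

# Since M * lam * alpha = K * alpha, the covariance eigenvalue is mu / M.
lam = mu / M
residual = max(
    abs(left - right)
    for left, right in zip((M * lam * a for a in alpha), mat_vec(K, alpha))
)
print(f"lambda = {lam:.4f}, residual = {residual:.2e}")
```

With a linear kernel, `lam` should agree with the largest eigenvalue of the 2 × 2 covariance matrix C of the same data, which gives a quick sanity check on the derivation.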
[Flowchart of the proposed algorithm, continued: No/Yes branches of the threshold test; AG = GlobalBalance(Q, COUNT, m); Broadcast Q, VMax, FILTER and AG; SL = RunPCA(broadcast(Q, AL), D)]

III. EXPERIMENTAL SETUP
In this section, we present performance results and identify bottlenecks involved in processing large input datasets. To measure the performance of skyline queries, we use a system with an Intel Core i3 processor and 4 GB RAM. The operators run in a simulation environment on Java JDK 1.8 and NetBeans 7.4.

IV. RESULT ANALYSIS
The table below shows the execution time in seconds for k=10. The analysis covers 100 to 10000 sampled points; both the existing and the proposed algorithm are applied to these sampled points, and the proposed algorithm takes less execution time than the existing SKY-MR+ algorithm.

Execution Time (Sec) with k=10
# of Sampled Points    SKY-MR+    Proposed Work
100                    76         63
200                    75         65
400                    78         69
1000                   77         67
2000                   82         72
4000                   84         74
10000                  92         78
Table 1 Analysis of Execution Time with k=10

The table below shows the execution time in seconds for the ANTI dataset. The analysis covers 2 to 12 dimensions; both the existing and the proposed algorithm are applied over these dimensions, and the proposed algorithm takes less execution time than the existing SKY-MR+ algorithm.

Execution Time (Sec) on ANTI
# of Dimensions    SKY-MR+    Proposed Work
2                  100        80
4                  250        160
6                  380        300
8                  550        450
10                 1000       700
12                 5000       3500
Table 2 Analysis of Execution Time on ANTI

The table below shows the relative speed-up. The analysis covers 10 to 40 machines; both the existing and the proposed algorithm are applied over these machines, and the proposed algorithm achieves a higher relative speed-up than the existing SKY-MR+ algorithm.

Relative Speed up
# of Machines    SKY-MR+    Proposed Work
10               1          1.3
15               1.5        1.8
20               2          2.2
25               2.5        2.9
30               3          3.5
35               3.5        4.1
40               4          4.6
Table 3 Analysis of Relative Speed up

The figure below shows the execution time in seconds for k=10. The analysis covers 100 to 10000 sampled points; both the existing and the proposed algorithm are applied to these sampled points, and the proposed algorithm takes less execution time than the existing SKY-MR+ algorithm.
[Charts: "Comparison of Execution Time (Sec), k=10" (x-axis: No. of Sampled Points, 100 to 10000; y-axis: Execution Time (Sec)) and "Comparison of Relative Speed up" (x-axis: No. of Machines, 10 to 40; y-axis: Relative Speed up); series in both: SKY-MR+ and Proposed Work]
Figure 1 Comparison of Execution Time on k=10
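To put a number on the decrease in execution time for k=10, the relative reduction can be computed from the values reported in Table 1. This is a small illustrative calculation, not part of the original evaluation; the helper name `reduction_percent` is hypothetical, and the lists are transcribed from Table 1.

```python
# Execution times in seconds, transcribed from Table 1 (k=10).
sampled_points = [100, 200, 400, 1000, 2000, 4000, 10000]
sky_mr_plus = [76, 75, 78, 77, 82, 84, 92]
proposed = [63, 65, 69, 67, 72, 74, 78]


def reduction_percent(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


reductions = [reduction_percent(b, p) for b, p in zip(sky_mr_plus, proposed)]
for n, r in zip(sampled_points, reductions):
    print(f"{n:>6} sampled points: {r:.1f}% less execution time")
```

On these reported values the reduction stays between about 11.5% and 17.1% across all sample sizes.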
The figure below shows the execution time in seconds for the ANTI dataset. The analysis covers 2 to 12 dimensions; both the existing and the proposed algorithm are applied over these dimensions, and the proposed algorithm takes less execution time than the existing SKY-MR+ algorithm.

[Chart: "Comparison of Execution Time (Sec) on ANTI"; x-axis: No. of Dimensions, 2 to 12; y-axis: Execution Time (Sec), 0 to 6000; series: SKY-MR+ and Proposed Work]
Figure 2 Comparison of Execution time on ANTI

The figure below shows the relative speed-up. The analysis covers 10 to 40 machines; both the existing and the proposed algorithm are applied over these machines, and the proposed algorithm achieves a higher relative speed-up than the existing SKY-MR+ algorithm.

Figure 3 Comparison of Relative Speed up

V. CONCLUSION
The Skyline operator can be implemented directly in SQL using current SQL constructs; however, this has been shown to be very slow. Other algorithms have been proposed that make use of divide and conquer, indices, MapReduce and general-purpose computing on graphics cards. Skyline queries on data streams (i.e. continuous skyline queries) have been studied in the context of parallel query processing on multicores, owing to their wide diffusion in real-time decision making problems and data streaming analytics.

The proposed methodology is similar to the architecture followed in the existing technique, but applies the Principal Component Analysis technique to find the “Points in Region”.

The proposed methodology, implemented here for the efficient processing of skyline queries on various synthetic datasets, is based on Principal Component Analysis. In a performance comparison between the existing work and the proposed methodology, it provides a higher relative speed-up as well as a decrease in execution time across the number of sampled points and the number of dimensions.

REFERENCES
[1] S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In Proc. ICDE (2001), pp. 421–430.
[2] S. Börzsönyi, D. Kossmann, and K. Stocker. The Skyline Operator. In Proceedings of ICDE’01, pages 421–430, 2001.
[3] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang. Skyline with Presorting. In Proceedings of ICDE’03, pages 717–719. IEEE Computer Society, 2003.
[4] Z. Huang, C. S. Jensen, H. Lu, and B. C. Ooi. Skyline Queries Against Mobile Lightweight Devices in MANETs. In Proceedings of ICDE’06. IEEE Computer Society, 2006.
[5] D. Kossmann, F. Ramsak, and S. Rost. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries. In Proceedings of VLDB’02, pages 275–286, 2002.
[6] X. Lin, Y. Yuan, W. Wang, and H. Lu. Stabbing the Sky: Efficient Skyline Computation over Sliding Windows. In Proceedings of ICDE’05, pages 502–513. IEEE Computer Society, 2005.
[7] D. Papadias, Y. Tao, G. Fu, and B. Seeger. Progressive Skyline Computation in Database Systems. ACM Trans. Database Syst., 30(1):41–82, 2005.
[8] K.-L. Tan, P.-K. Eng, and B. C. Ooi. Efficient Progressive Skyline Computation. In Proceedings of VLDB’01, pages 301–310, 2001.
[9] Yoonjae Park, Jun-Ki Min and Kyuseok Shim, “Efficient Processing of Skyline Queries Using MapReduce”, IEEE Transactions on Knowledge and Data Engineering, 2016.
[10] Jan Chomicki, Parke Godfrey, Jarek Gryz, Dongming Liang, “Skyline with Presorting: Theory and Optimizations”, in ICDE, 2003, pp. 717–719.
[11] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang. Skyline with presorting. In Proceedings ICDE, pages 717–719, 2003.
[12] L. Zou, L. Chen, M. T. Özsu, and D. Zhao, “Dynamic skyline queries in large graphs,” in DASFAA, 2010.
[13] M. Sharifzadeh and C. Shahabi. The spatial skyline queries. In VLDB, 2006.
[14] K. Deng, et al. Multi-source skyline query processing in road networks. In ICDE, 2007.
[15] L. Chen and X. Lian. Dynamic skyline queries in metric spaces. In EDBT, 2008.
[16] D. Fuhry, et al. Efficient skyline computation in metric space. In TR-KSU-CS-2008-02, Department of Computer Science, Kent State University, 2008.
[17] Duong Van Hieu, Sucha Smanchat, and Phayung Meesad, “MapReduce Join Strategies for Key-Value Storage”, IEEE, 2014.
[18] Zhi-qiong Wang, Shi-kai Jin, Ke Gong, “Energy-Efficient Skycube Query Processing in Wireless Sensor Networks”, TELKOMNIKA, Vol. 11, No. 10, October 2013, pp. 6240–6249, ISSN: 2302-4046.
[19] I. Bartolini, P. Ciaccia, and M. Patella, “Efficient sort-based skyline evaluation,” ACM Trans. Database Syst., vol. 33, no. 4, p. 31, 2008.
[20] Donald Kossmann, Frank Ramsak, Steffen Rost, “Shooting Stars in the Sky: An Online Algorithm for Skyline Queries”, Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
