PCA
College Name
Dept. of Computer Science & Engineering
RGPV University, Bhopal
email
Abstract: Skyline operators extend a database system so that all the interesting points can be filtered out of a set of data points. The existing SKY-MR+ algorithm is a very effective framework for skyline operators and queries that uses a quadtree-based histogram, but its execution time is inconsistent, especially for large datasets, and its relative speed-up decreases as the number of machines in the system increases. Hence an effective framework is implemented that reduces the execution time for large datasets and whose relative speed-up increases with the number of machines. The proposed methodology is very similar to the architecture followed in the existing technique, but Principal Component Analysis is applied to find the "Points in Region". The methodology proposed in this paper targets the efficient processing of skyline queries on various synthetic datasets. The implementation is based on Principal Component Analysis, and a performance comparison between the existing work and the proposed methodology shows improved relative speed-up along with a considerable decrease in execution time with respect to the number of sampled points and the number of dimensions.

I. INTRODUCTION

Skyline queries have been used in various multi-criteria decision support applications for the past twenty years. For a given dominance relationship in a dataset, a skyline query returns those objects that are not dominated by any other object. Skyline queries have been studied extensively in multidimensional spaces, in subspaces, in metric spaces, in dynamic spaces, in streaming environments, and in time series data. Various algorithms for skyline query processing have been proposed, for example window-based, progressive, distributed, geometric-based, index-based, divide-and-conquer, and dynamic programming algorithms. In addition, several variations have been proposed to solve application-specific problems, such as k-dominant skylines, top-k dominant queries, spatial skyline queries, and others. Because the number of objects returned by a skyline query can grow very large, there are also ongoing studies on the cardinality of skyline queries. This research indicates the importance of skyline queries and their variations in modern applications.

When exploring unfamiliar data, the skyline operator [1] can identify trade-offs among multiple (possibly conflicting) attributes. In our example, the query retrieves the set of interesting restaurants, and it belongs to a novel type of query. This paper gives the theory of spatial skyline queries (SSQ) for the first time. Given a set of data points P and a set of query points Q in a d-dimensional space, an SSQ returns those points of P that are not dominated by any other point of P when a set of special derived attributes is considered. For each data point, these attributes are its distances to the points of Q. An interesting variation, in which domination is determined with respect to both the spatial and the non-spatial attributes of P, is also studied in this paper.

Spatial skyline queries are critical for many applications, including online map services and group navigation/planning. In trip planning, hotels are evaluated against fixed locations such as a conference venue, a beach, and museums. The SSQ returns all the interesting hotels for lodging during a pleasure or business trip, that is, the hotels for which no other hotel is closer to all of these locations.

In the crisis management domain, the residential buildings that must be evacuated first in the event of several explosions/fires are those in the spatial skyline with respect to the fire locations. The reason is that these places are either potentially trapped in the convex hull of the fires or located at the edges of the expanding fire. In defense and intelligence applications, consider the locations of soldiers penetrating into an enemy's camps as query locations and the enemy's guard stations as data points. The stations in the spatial skyline are those from which an attack might be initiated against the platoon of soldiers.

Since the introduction of the skyline operator by Börzsönyi et al. [2], several efficient algorithms have been proposed for the general skyline query. These algorithms utilize techniques such as divide-and-conquer [2], nearest neighbor search [5], sorting [3], and index structures [2, 8, 7] to answer general skyline queries. Several studies have also focused on skyline query processing in a variety of problem settings, such as data streams [6] and data residing on mobile devices [4].
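The dominance relationship and the spatial skyline described in the introduction can be illustrated with a minimal sketch. It assumes that smaller values are better on every attribute and uses a naive quadratic scan, not SKY-MR+ or the proposed method. The function names `dominates`, `skyline`, and `spatial_skyline`, and the sample hotel data, are hypothetical and chosen only for illustration.

```python
from math import dist
from typing import Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: list[Point]) -> list[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def spatial_skyline(data: list[Point], queries: list[Point]) -> list[Point]:
    """Spatial skyline: each data point is compared through its derived
    attributes, namely its Euclidean distances to the query points."""
    derived = [tuple(dist(p, q) for q in queries) for p in data]
    return [
        p
        for p, dp in zip(data, derived)
        if not any(dominates(dq, dp) for dq in derived)
    ]


# Hypothetical hotels as (price, distance to venue); (90, 5) is dominated by (60, 3).
hotels = [(50, 8), (60, 3), (80, 2), (90, 5)]
print(skyline(hotels))  # [(50, 8), (60, 3), (80, 2)]

# Hypothetical data points and query points in the plane.
print(spatial_skyline([(0, 0), (1, 1), (5, 5)], [(0, 0), (2, 2)]))  # [(0, 0), (1, 1)]
```

The quadratic scan is only meant to make the definition concrete; the partitioning and filtering techniques cited in the text exist to avoid comparing every pair of points.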
II. PROPOSED METHODOLOGY

The existing technique is the efficient parallel algorithm SKY-MR+ for processing skyline queries using MapReduce. Its authors first build a quadtree-based histogram for space partitioning, deciding judiciously whether to split each leaf node based on the benefit of splitting in terms of the estimated execution time. In addition, they apply the dominance power filtering method to prune non-skyline points effectively in advance. Next, they partition the data based on the regions divided by the quadtree and compute candidate skyline points for each partition using MapReduce. Finally, they check whether each skyline candidate point is actually a skyline point in every partition using MapReduce. They also develop workload balancing methods to make the estimated execution times of all available machines similar. They ran experiments comparing SKY-MR+ with the state-of-the-art algorithms using MapReduce and confirmed the effectiveness as well as the scalability of SKY-MR+.

The proposed methodology is similar to the architecture followed in the existing technique, but Principal Component Analysis is applied to find the "Points in Region".

PRINCIPAL COMPONENT ANALYSIS

The feature space is calculated in the following way. Given a set of centered observations $X_k$, $k = 1, \dots, M$, with $\sum_{i=1}^{M} X_i = 0$, the traditional way of formulating the covariance matrix using PCA is

$$C = \frac{1}{M} \sum_{j=1}^{M} X_j X_j^{T}$$

Now the nonlinear feature space F must be defined. F is related to the input space by a possibly nonlinear map

$$\Phi : \mathbb{R}^{N} \to F$$

The covariance matrix in F can now be defined as

$$\bar{C} = \frac{1}{M} \sum_{j=1}^{M} \Phi(X_j) \Phi(X_j)^{T}$$

The eigenvectors of $\bar{C}$ lie in the span of $\Phi(X_1), \dots, \Phi(X_M)$ and can therefore be written as $V = \sum_{i=1}^{M} \alpha_i \Phi(X_i)$, where $\alpha$ denotes the column vector with entries $\alpha_1, \alpha_2, \dots, \alpha_M$. To find the solutions of the previous equation, one solves

$$M \lambda \alpha = K \alpha$$

where $\lambda$ is an eigenvalue of $\bar{C}$ and $K$ is the $M \times M$ matrix with entries $K_{ij} = \Phi(X_i) \cdot \Phi(X_j)$.

PROPOSED ALGORITHM
1. Input (D, ϭ, m, ∂)
2. Given a dataset 'D' of 'd' dimensions and a sample size 'ϭ'.
3. 'm' denotes the number of machines and '∂' the threshold size.
4. The algorithm starts by sampling the dataset based on the sample size.
S = Sampling(ϭ, D)
5. The sampled dataset is then passed as input to SKY-QTree+ together with the number of machines.
Q = SKY-QTree+(S, m)
6. The SKY-QTree+ result is then load balanced using local load balancing over the number of machines and the sampled data, including a skewness algorithm to minimize the chances of overloading.
AL = LocalBalance(Q, S, m)
7. Broadcast Q and AL
8. Apply Principal Component Analysis on the input dataset and the broadcast Q and AL
(LocalSL, VMax, FILTER, COUNT) = RunPCA(broadcast(Q, AL), D)
9. If LocalSL.totalSize < threshold (∂)
10. SL = ∀(LocalSL, VMax, FILTER)
11. Else
12. AG = GlobalBalance(Q, COUNT, m)
13. Broadcast Q, VMax, FILTER and AG
14. SL = RunPCA(∀+, LocalSL)
15. return SL
D: dataset of d dimensions
ϭ: sample size of dataset
m: number of machines
∂: threshold value
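The linear covariance formulation from the Principal Component Analysis subsection can be illustrated with a small, dependency-free sketch. It centers the observations, builds C = (1/M) Σ X_j X_jᵀ, and estimates the leading principal direction by power iteration. This is an illustration of plain PCA only, not the paper's RunPCA step or its kernel variant; the function names `covariance_matrix` and `leading_component`, the iteration count, and the sample data are assumptions.

```python
def covariance_matrix(observations):
    """Center the observations and return C = (1/M) * sum_j x_j x_j^T
    as a list of rows."""
    m = len(observations)
    n = len(observations[0])
    mean = [sum(x[i] for x in observations) / m for i in range(n)]
    centered = [[x[i] - mean[i] for i in range(n)] for x in observations]
    return [
        [sum(x[i] * x[j] for x in centered) / m for j in range(n)]
        for i in range(n)
    ]


def leading_component(c, iterations=200):
    """Estimate the eigenvector of c with the largest eigenvalue by power
    iteration. Assumption: the start vector is not orthogonal to it."""
    n = len(c)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # zero covariance: no direction to recover
            return v
        v = [x / norm for x in w]
    return v


# Hypothetical observations lying on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
c = covariance_matrix(data)
v = leading_component(c)
print(v[1] / v[0])  # ratio of the components, close to 2 for this data
```

Projecting each point onto the leading components gives a lower-dimensional representation; how the proposed method uses that projection to find the "Points in Region" is not specified in the text, so it is not modeled here.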
The table below shows the execution time in seconds on the ANTI dataset. The analysis covers dimensions from 2 to 12; both the existing and the proposed algorithm are applied over these dimensions, and the proposed algorithm gives a lower execution time than the existing SKY-MR+ algorithm.

Execution Time (Sec) on ANTI
# of Dimensions    SKY-MR+    Proposed Work
2                  100        80
4                  250        160
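As a worked reading of the two rows shown above, the following sketch computes the percentage reduction in execution time of the proposed work relative to SKY-MR+. The helper name `reduction_percent` is hypothetical, and only the values visible in the table are used.

```python
def reduction_percent(baseline: float, proposed: float) -> float:
    """Percentage by which the proposed time is lower than the baseline time."""
    return 100.0 * (baseline - proposed) / baseline


# (SKY-MR+ seconds, Proposed Work seconds) per number of dimensions, from the table.
execution_time = {2: (100, 80), 4: (250, 160)}

for dims, (sky_mr, proposed) in execution_time.items():
    print(dims, reduction_percent(sky_mr, proposed))
# 2 20.0
# 4 36.0
```

For these two rows the reduction grows from 20% at 2 dimensions to 36% at 4 dimensions.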
[Figure 1: Comparison of Execution Time on k=10. X-axis: No. of Sampled Points (100 to 10000); series: SKY-MR+ and Proposed Work.]

[Figure 3: Comparison of Relative Speed up. X-axis: No. of Machines (10 to 40); y-axis: Relative Speed up; series: SKY-MR+ and Proposed Work.]

The figure below shows the execution time in seconds on the ANTI dataset. The analysis covers dimensions from 2 to 12; both the existing and the proposed algorithm are applied over these dimensions, and the proposed algorithm gives a lower execution time than the existing SKY-MR+ algorithm.

[Figure: Comparison of Execution Time (Sec) on ANTI. Y-axis: Execution Time (Sec).]

V. CONCLUSION

The Skyline operator can be implemented directly in SQL using current SQL constructs; however, this has been shown to be very slow. Other algorithms have been proposed that make use of divide and conquer, indices, MapReduce, and general-purpose computing on graphics cards. Skyline queries on data streams (i.e., continuous skyline queries) have been studied in the context of parallel query processing on multicores, owing to their wide diffusion in real-time decision-making problems and data streaming analytics.

The proposed methodology implemented is similar to the