Академический Документы
Профессиональный Документы
Культура Документы
Page 1
scalable execution and better support for OLAP tools set. For example, find the average balance for the
and analytic applications. A syntax proposal has been first month of each year. Details of these func-
made to ANSI jointly with IBM[OraIBM99]. tions are given in the Appendix.
Page 2
SELECT region, salesperson, FROM V1, V2
SUM(sales) s_sales WHERE V1.yr = V2.yr
FROM salesTable AND V1.s_sales = V2.m_sales
GROUP region, salesperson
ORDER BY region, SUM(sales) DESC
WHERE RANK(region, s_sales) <= 3; Q4 Give the Top-10 products for states which contrib-
ute more than 25% of regional sales.
The data from the in-line view is passed "in order"
to the user defined PL/SQL function (RANK). The This query requires not only access to 2 different
operation of assigning ranks inherently becomes levels of aggregation, but also filtering based on rank
sequential, whereas we could have easily parallelized values.
by partitioning by regional level and keeping only the CREATE VIEW V4
AS SELECT region, state, SUM(sales)
Top-3 values within each region. s_sales
FROM salesTable
GROUP BY region, state
Q2 Compute the moving sum of sales of each product
CREATE VIEW V5
within the last month AS SELECT region, SUM(sales) s_sales
FROM salesTable
This query would require an expensive non-equi GROUP BY region;
self-join where each row is joined with rows 1 month
SELECT region, state, product,
before it: RANK(region, state, X.s_sales) rank
SELECT s1.product, SUM(s2.sales) as m_avg FROM (
FROM salesTable s1, salesTable s2 SELECT region, state, product,
WHERE s1.product = s2.product AND SUM(sales) s_sales
s2.date <= s1.date AND FROM salesTable, V4, V5
s2.date >= ADD_MONTHS(s1.date, -1) WHERE salesTable.region = V4.region
GROUP BY s1.product, s1.date; AND salesTable.state = V4.state
AND V4.region = V5.region
Clearly, the query execution time would grow AND V4.s_sales > 0.25 * V5.s_sales
GROUP BY region,state, product
linearly with the size of window.
ORDER BY region,state,SUM(sales) DESC)
X
WHERE
Q3 For each year, find products with maximum annual RANK(region, state, X.s_sales) <= 10;
Page 3
CREATE VIEW V3 clause ROWS BETWEEN 1 PRECEDING and 1
AS SELECT product, month(date) mn,
SUM(sales) s_sales
FOLLOWING defines a “physical” window of size
FROM salesTable three rows i.e. a window centered at the current row
GROUP BY product, month(date); and includes one previous and one following row.
SELECT product, X.mn, X.s_sales - Physical windows, however, are not suitable for
Y.s_sales dif_sales sparse data as not every value may be present in the
FROM V3 X LEFT OUTER JOIN V3 Y
ON (X.product = Y.product result set. We therefore allow the definition of logical
AND X.mn = Y.mn + 1); windows. Consider the <aggregate group clause>
RANGE INTERVAL ’1’ MONTH PRECEDING. It
The query assumes dense data. The self-join would
defines a "logical" window including all rows within
be unnecessary if there was a function to look at
the last month and the current month. Similarly, the
some other row in the data set.
clause RANGE INTERVAL ’1’ MONTH
The queries in the above examples require
PRECEDING AND ’1’ MONTH FOLLOWING
partitioning the data into some groups, ordering data
defines a "logical" window including all rows with
within each group, defining a window for each row
the last month, current month and next month. A
(applicable to some functions), and specifying an
"physical" window is determined strictly based on
aggregate/function that is applied on rows in the
the number of rows. However, a "logical" window is
window. Our challenge in the language design is to
obtained by applying the condition in the <aggregate
have a unified framework which encompasses all
group clause> to the <order by> expression. A
these classes of functions. The resulting syntax for
logical window can have only one expression in the
these analytic functions used in queries 1-5 is:
ORDER BY clause.
For example, to determine the end points of a
function(<arguments>) OVER
([<partition by clause>] window defined by the aggregate group clause
[<order by clause> (ORDER BY date DESC RANGE INTERVAL ’1’
[aggregate group clause]])
MONTH PRECEDING), we take the value of date in
<partition by > and <order by> clauses in the the current row and subtract ’1’ MONTH from it. We
function definition determine the partitioning and include all rows whose value falls with the last month
ordering columns of the query. The <partition by> and current month.
clause divides the dataset into groups which are then After a window has been defined for every row of
ordered internally based on the <order by> clause. the input, the function is applied to that set.
Note that the <order by> clause of the function does For more details on the syntax, please refer to
not necessarily guarantee the final ordering of the ANSI SQL submission [OraIBM99]
result set. The final ordering of the result set is
dictated by the <order by> clause of the query block. 4 Computational Model
The <aggregate group clause> selects a subset of
Analytic functions are evaluated after all other
rows from this ordered set by defining the end points
query clauses, except distinct and order-by, have
for each row. The row for which we are defining the
been evaluated. In the following query,
end points is called as the current row. For example,
the aggregate group clause ROWS 1 PRECEDING
defines a “physical” window of size two rows i.e. the
previous row and the current row.3 Similarly, the (3) A physical window is tagged with the
keyword ROWS.
Page 4
SELECT region, product, SUM(sales) s1,
RANK() OVER (PARTITION BY region
ORDER BY SUM(sales) DESC) s2 Table 1: Ranking Functions
FROM salesTable
GROUP BY region, product
ORDER BY region, s2; Region SalesPerson sum_sales rank
We will now illustrate the analytical calculations All aggregate functions (sum, count, max etc.) are
under each category where we demonstrate the extended with the window syntax to perform
flexibility of the aggregate group clause. computations over sliding window of rows within a
partition. The aggregate group clause is used to
5 Families of Analytical Functions precisely define this window. ROWS or RANGE
keyword specifies the type of window and
5.1 Ranking Functions specification BETWEEN x PRECEDING and y
These functions assign ranks to rows based on FOLLOWING determines the size of the window.
some ordering criteria. Functions in this category are For example, query Q2 would be represented by:
rank, dense_rank, row_number, ntile, percent_rank,
and cume_dist. For example, rewriting query Q1 as: SELECT product, date,
SUM(sales) sum_sales,
SELECT * FROM
SUM(SUM(sales)) OVER
(SELECT region, salesperson,
(PARTITION BY product ORDER BY date
SUM(sales) sum_sales,
RANGE INTERVAL ’1’ MONTH PRECED-
RANK() OVER
ING) mavg
(PARTITION BY region
FROM salesTable
ORDER BY sum(sales) DESC) rank
GROUP BY product, date;
FROM salesTable
GROUP BY region, salesperson)
WHERE rank <= 3; Aggregate function Sum(sales) is computed over
the group (product, date). Then the data is partitioned
may result in the following data: on product and within each partition ordered by date.
Next, the aggregate group clause “RANGE
Page 5
INTERVAL ’1’ MONTH PRECEDING” is applied and aggregate group clauses, reporting function
to the ordered set for each row, creating windows semantics are obtained. For example, query Q3 can
holding all rows within the last one month of the be expressed as:
value in the current row. For each row, the average
sales is computed over its corresponding window. SELECT *
FROM (SELECT year(date), product,
While ROWS specified windows in terms of SUM(sales) s_sales,
physical offsets, RANGE specifies a logical window. MAX(SUM(sales)) OVER
With the ROWS option, x PRECEDING would (PARTITION BY product)
m_sales
imply x rows preceding the current row while with FROM salesTable
the RANGE option, it implies rows in the logical GROUP BY year(date), product)
WHERE s_sales = m_sales;
range [current value - x, current value] assuming
ascending order. For example, consider the following The above query avoids the view definitions and
two functions: joins that the traditional approach would have
required. This is achieved by reporting aggregates at
AVG(sales) OVER different levels inside an in-line view and performing
(ORDER BY date
ROWS 1 PRECEDING comparisons in the outer view as shown above. A
AND 1 FOLLOWING) phy_avg, similar approach has been proposed in [CR96]. The
AVG(sales) OVER
(ORDER BY date region product s_sales m_sales
RANGE INTERVAL ’1’ day PRECEDING
AND INTERVAL ’1’ day FOLLOWING)
log_avg East P1 1500 2000
Page 6
5.4 Lag/Lead Functions 3 functions by sorting the base data on:
Lag/Lead functions provide an access to any row at • (region, state) which satisfies
a certain offset from the current row in the ordered requirements for functions (1) and (2).
dataset. These functions can be used to compare
• (region, sales DESC) which satisfies
measures across different rows without requiring
function (3).
expensive self-joins. For example, SQL for query Q5
would be: In all, we require at least 2 sorts.
SELECT product, month(date), The analytic functions are evaluated after the
SUM(sales) sum_sales, computation of "group by" clause but prior to the
SUM(sales) - LAG(SUM(sales), 1) OVER
(PARTITION BY product "order by" clause in the main query block. The
ORDER BY month(date)) diff sorting is also required for "group by"5 and "order
FROM salesTable
GROUP BY product, month(date); by" clauses. The ordering done for "group by" could
be used by analytic functions and ordering done for
Lag (Lead) allow access to rows at an offset before analytic functions could be used by "order by" clause
(after) the current row in an ordered sequence. Note of the query. e.g
that Lag/Lead functions are most suitable for dense
data where physical offsets correspond to logical SELECT SUM(sales) s1,
units. SUM(SUM(sales)) OVER
(PARTITION BY region
ORDER BY state) s2,
6 Optimized Execution of Analytic RANK() OVER (ORDER BY SUM(sales)) r1
FROM sales_table
Functions GROUP BY region, state
ORDER BY SUM(sales);
In this section, we present optimizations4
employed in the implementation of the analytic This query can be computed using 2 sorts:
functions in Oracle 8i Release 2. • (region, state) to compute the group
by aggregate s1 and analytic function s2
6.1 Minimization of number of sorts
• (SUM(sales)) to compute analytic
A given SQL query can contain multiple analytic
function r1 and query "order by" clause.
functions, each requiring its own ordering
specification, e.g: The number of sorts could significantly affect the
SELECT region, state, execution time of a query. The problem we are trying
SUM(sales) OVER (PARTITION BY region)
sum_region, to solve is to find a minimal number of sorts which
SUM(sales) OVER (PARTITION BY region, can satisfy a SQL query block.
state) sum_region_state,
RANK() OVER (PARTITION BY region ORDER
BY sales DESC) rank An ordering Group (OG) is a set of analytic
FROM salesTable;
functions which can be satisfied by a single sort. A
Here, each of the 3 functions requires different minimal set of OGs covering all functions is
sortings of the base data. But, we can evaluate all the important for efficient execution of the query.
For computation of this set, we need to consider
Page 7
only the expressions in the PARTITION BY and Here, a, b, x, y, z are different expressions
ORDER BY clauses of the analytic functions. which in various combinations dictate the ordering
Computation of an analytic function requires requirements of a given analytic function. For
ordering (sorting) the data on <p1>, <p2>,..., simplicity, we do not discuss the ASC/DESC options
<pn>,<o1>, <o2>,...<om>, where in the algorithm, but these can be incorporated easily.
With each OG, we also store "commute_index"
• the PARTITION BY expressions <p1>, <p2>, which is the set of (prefix) columns common to all
...<pn> can be commuted in any order, the functions in the OG. This can be useful in parallel
execution to decide the partitioning of data. For
• the ORDER BY expressions <o1>, <o2>, ..,
example, an OG (x, y, z) for functions 1 and 2 above
<om> should follow the PARTITION BY
would have commute_index = 2 implying that the
expressions and appear in the specified order.
data can be partitioned on columns (x, y) and each
We would represent the query GROUP BY as a partition would require ordering on (x, y, z).
dummy analytic function with the PARTITION BY
expressions containing the query GROUP BY The algorithm works as follows:
expressions. This can be done as GROUP by can be
6.1.1 Stage 1
computed by ordering on any commutation of
GROUP BY expressions and the same property is A function with the east number of partition by
satisfied by an analytic function with only expressions is most restrictive in terms of ordering
PARTITION by columns. requirements. For example, function (2) requires
Similarly, the query ORDER BY is represented as ordering starting with (x, y) in any permutation
another dummy analytic function having the same and can possibly be used to satisfy ordering
ORDER BY expressions. This is only for the requirement for function (1) if the third expression is
purposes of computing the minimal orderings. (z). In case of 2 functions with same number of
The algorithm involves 2 stages : P() expressions, a function with greater number of
O() expressions is more restrictive. So, we select
1. Find a set of OGs which cover all
the functions.
such functions and construct OGs around them. This
is used to prune the set of OGs.
2. Select a minimal set of OGs from
the set found in (1) such that all So, we sort the functions in ascending order of
the functions are covered. number of P() expressions as primary key and
descending order of number of ordering expressions
We illustrate the algorithm by an example where as secondary key.
we find a minimal set of OGs for a set of analytic
1. O(a, b)
functions. Consider 5 functions with the following
2. P(a)
partition by P() and order by O() clauses:
3. P(y, x)
1. P(x, y, z)
4. P(x, y, z)
2. P(y, x)
5. P(x, y, a, b) O(z)
3. P(a)
4. O(a, b) From this set, we select a set of functions such that
5. P(x, y, a, b) O(z) OGs containing those functions would cover all
functions. This is done by traversing the above list in
Page 8
order and picking any function which would not • (y, x, z) satisfies functions {1,2},
belong to any OG considered so far.
• (y, x, a, b, z) satisfies functions {2,5},
Page 9
only top 3 salespersons per region. This not only sum(sales) desc) and rank and rep_sum_sales would
reduces the chances of sort spilling to disk, but also be computed together. In this scenario, rank
minimizes the number of sort runs and hence merge predicates can not be pushed into sort as it filter out
passes. This results in an asymptotic complexity of rows necessary to compute rep_sum_sales. The rules
O(n*log N), where n is the number of rows in the deciding when it is legal to push rank predicates are:
dataset and N is the “top-N” value. Functions that are Consider a rank function R = rank() over
pushed into sort currently are RANK, (PARTITION BY x ORDER BY y) and suppose
DENSE_RANK and ROW_NUMBER. there is a predicate Cr on that function in the outer
The rules governing as to when rank functions query block. Without loss of generality we assume
with predicates should be evaluated and when it is that x and y are single columns (extension to the
legal to push rank predicates into the sort are non- multi column keys is obvious). We assume that the
trivial. This is because, in the presence of other predicate is of the form “R {<, <=, =} <constant>” as
analytic functions, this predicate pushdown should then sort filters out records early.
not filter rows needed later. The following example Suppose further that the query block has other
shows why the execution order is important: analytic functions F1, ..., Fn which are in the same
SELECT * ordering group OG as R. Recall that R, F1, ..., Fn can
FROM (SELECT region, salesperson,
SUM(sales) sum_sales,
be evaluated with a single sort. Let Pr, Or represent
RANK() OVER (PARTITION BY region PARTITION BY and ORDER BY keys of R
ORDER BY sum(sales) DESC) rank, respectively and let Pr||Or denote concatenation of Pr
SUM(SUM(sales))OVER
(PARTITION BY region) rep_sum_sales and Or.
FROM salesTable The rules below are sufficient conditions to push
GROUP BY region, salesperson)
WHERE rank <= 3; predicate Cr into the sort for the ordering group OG.
They implement the intuition that filtering on a rank
The execution sequence where in step 2 we filter predicate must not remove rows that are required to
out ranks: compute other functions in the same ordering group.
1. order data on (region, sum(sales)
desc)
1. All reporting functions in the OG
2. compute ranks and filter out must have same or finer granularity
salespersons other than top 3 PARTITION BY clause than Pr||Or.
3. compute sum(sum(sales)) over Thus Cr will not filter out rows
(PARTITION BY region) inside partitions of these
reporting functions. Instead it
may filter out the entire
is incorrect as the reporting aggregate function partitions which is correct. For
“sum” needs to access all rows per region to compute example, predicate Cr can be pushed
rep_sum_sales. Observe that computing with functions like “sum(m) over
(PARTITION BY x, y)” but it can not
rep_sum_sales first and then rank with filtering be pushed with “sum(m) over
would result in correct semantics. We follow a simple (PARTITION BY x)”.
rule that schedules the evaluation rank functions with 2. All ranking functions in the OG
top N predicates in the end. Also, the ordering group must either have finer granularity
optimization described in section 3.1.1 puts rank and PARTITION BY clause than Pr, or
have same PARTITION BY clause but
other analytic functions in the same ordering group - same or finer ORDER BY clause than
which means data will be ordered on (region, Or. The justification is as above.
Page 10
For example, predicate Cr can be on a same or coarser granularity than other functions
pushed with “dense_rank() over in the group, then Cr will be filtering groups which
(PARTITION BY x, y ORDER BY z)” or “
rank() over (PARTITION BY x ORDER include entire groups of other functions. For
BY y, z)”, but can not be pushed example: a predicate on ‘sum(m) over (PARTITION
with “rank() over (ORDER BY x, y)”. BY X)’ can be pushed into the sort if other analytic
3. All window functions in the OG with functions are ‘sum(m) over (PARTITION BY x, y)’,
the ROWS option must have same or or ‘rank() over (PARTITION BY x, ORDER BY y)’,
finer granularity PARTITION BY
clause than Pr and their windows etc.
should not extend to rows past the
6.3 Parallel Execution
current row. Observe that Cr is of
the form R {<, <=, =} <constant>, The basic building block of Oracle’s SQL
thus it doesn’t filter rows above execution plan and engine is an iterator. A iterator is
some <constant> rank, thus window
functions can extend before the an object-oriented mechanism for manipulating rows
current row. If they extended past of data. Associated with each iterator is a start, fetch,
it, the predicated could have and close method. Examples of iterator include table
filter out the needed rows. For
scans, joins, count, union. Iterator nodes are
example, predicate Cr can be pushed
in case of functions like “sum(m) combined together in a tree that implement the query
over (PARTITION BY x ORDER BY y logic and the decisions (e.g. join order, join method,
ROWS 1 PRECEDING”, or “sum(m) over
access method) made by the optimizer. At execution
(PARTITION BY x, y ORDER BY z ROWS
BETWEEN 10 PRECEDING AND CURRENT time, the serial plan is evaluated in a demand-driven
ROW)”. It can not be pushed for fashion, by fetching all rows from the root node of
functions like “sum(m) over the iterator tree.
(PARTITION BY x ORDER BY y ROWS 10
FOLLOWING)”. For more on the iterator model of SQL execution
please refer to [GG93].
4. All window functions in the OG with
RANGE option must have same or
finer granularity PARTITION BY For the parallel plan, consider an execution model
clause than Gr and their windows with iterators as nodes, edges representing flow and
must not extend to rows past the
current row. Justification is as repartitioning of rows up a tree and horizontal data
above. That is, pushing can happen partitioning to distribute load amongst parallel
with functions like “sum(m) over processes at each level. Each edge incurs a
(PARTITION BY x ORDER BY y RANGE 1
communication cost depending on the parallel
PRECEDING)” but it can not happen
with functions like “sum(m) over hardware architecture on which the tree is executed.
(PARTITION BY x ORDER BY y RANGE 1 Consider an iterator tree for the example query Q1
FOLLOWING)”.
in Section 3:
Page 11
communicating them. The factor ’k’ is tunable based
on how reduction in communication cost offsets the
rank iterator additional computation of RANK in the GROUP-BY
rank filter
node. Of course, the final RANK computation is
done in the top RANK node as before. The resultant
execution plan looks as follows.
group-by
rank iterator
table scan
and filter
Page 12
Table 2: Query execution times
Approach 1: Approach 2:
Query id With Analytic Without Analytic Simulation method for the queries
function (in sec) functions (in sec)
Page 13
above mentioned approaches. This could lead to functionality (TopN), but none of the window
increased fan-out of non-equi joins (as in Q2). Query functions and partitioning capabilities presented in
Q2 was run with varying window sizes (we used the paper. For any ROLAP engine which is served by
physical windows here so that we know the exact a back-end RDBMS, the SQL needed to support such
size of window while trying to compare the queries would be very complex and quite inefficient.
performance). The window sizes were small enough Sybase[Syb120] allows selection on non-GROUP
for the whole window to fit in memory. In some other BY columns and hence, some of the reporting
tests, we observed that even when the window aggregates queries mentioned in this paper. The SQL
doesn’t fit into memory the execution times extensions in [CR96] using grouping variables allow
asymptotically approach the same value (within 15% succinct representation of queries involving repeated
of running times for those with small window sizes). aggregation over the same groups and can be
As can be seen in the graph, increase in window size executed efficiently. The extensions proposed in this
leads to degradation in execution times without using paper are similar to those of [CR96] with respect to
window functions. allowing multiple passes over the data. The proposal
We also studied the performance degradation by goes a step farther by allowing aggregates on ordered
increasing the number of analytic functions which sets of data.
could lead to increase in number of self-joins (as in
Q5). Query Q5 was run by increasing the number of 9 Conclusion
LAG/LEAD functions. Increased number of LAG/
We have presented a unified framework for
LEAD functions correspond to greater number of
analytic functions in SQL which operate on
self-joins in the second approach. View V3 used in
partitioned and ordered data. These include five types
this query was materialized on disk for purposes of
of functions - ranking, windowing, reporting
self-join. We can observe much higher negative
functions, lag/lead functions and inverse distribution
impact on query performance due to self-joins from
functions. We have also discussed algorithms for
the graph.
minimizing the number of sorts required by the
Analytic Functions and shown optimizations like
8 Related Work pushing rank predicates into the sort. Finally, we
have provided experimental evidence showing that
SQL extensions with limited analytical our optimized implementation provides substantial
functionality exists in other commercial RDBMSs. performance gains over existing implementation of
Redbrick RISQL [Red60] provides functions for same functionality in SQL. These efforts are an
computing moving aggregates, ranking and initial step towards supporting a rich and efficient set
ratiotoreport. However, moving aggregates can be of OLAP queries in the Oracle RDBMS server.
computed only on physical windows (determined by
number of rows included in the set) which is not 10 References
suitable for sparse data. Further, the data set can only
[1] [Red60] Informix Redbrick Decision Server 6.0
be partitioned once using the RESET BY clause
[2] [TerR30] NCR Teradata V2R3.0.3
since it is tied to the ORDER BY clause of the query.
[3] [MS70] Microsoft SQL Server 7.0
Teradata [TerR30] also provides support of physical
[4] [Syb120] Sybase Adaptive Server Enterprise 12.0
window functions and ranking functions. Microsoft
[5] [CR96] Damianos Chatziantoniou, Kenneth Ross.
SQL Server [MS70] does support limited ranking Querying Multiple Features of Groups in Relational
Page 14
Databases. In Proceedings of the 22nd VLDB distribution model and returns an element from the
conference Mumbai (India), 1996. ordered set which has the smallest discrete percentile
[6] [OraIBM99] Fred Zemke, Krishna Kulkarni, Andy value that is larger than or equal to the percentile
Witkowski, Bob Lyle. Proposal for OLAP functions
for ANSI-NCTS value given as an argument of the function. For
[7] [GG93] Goetz Graefe: Query Evaluation Techniques example, query Q7 can be easily expressed using
for Large Databases. Computing Surveys 25(2): 73- PERCENTILE_DISC as:
170 (1993). SELECT
PERCENTILE_DISC(0.5) WITHIN GROUP
11 APPENDIX (More Analytic Functions) (ORDER BY sales) median_sales,
FROM salesTable
GROUP BY region, product;
11.1 Motivating Examples
Here are other queries which are difficult and PERCENTILE_CONT function assumes a
inefficient to express in SQL. continuous distribution model and computes the
result by doing linear interpolation of the two
Q6 For each product, find total sales on the first day a elements whose continuous percentile values enclose
sale is recorded the percentile value given as the argument of the
This can be done by first computing the first day of function. Since the returned value is a computed
sales for each product and then, joining it back to the value, only numeric or date values are allowed in the
salesTable on (product, date). ORDER BY clause.
Page 15