Relational database management systems often include a
query optimizer which attempts to determine the most efficient way to execute a given query. Query optimizers enumerate possible
query plans, estimate their cost, and pick the plan with the lowest estimated cost. If queries are represented by operators from relational algebra, the query optimizer can enumerate possible query plans by rewriting the initial query using the algebraic properties of these operators.
Queries can be represented as a tree, where
• the internal nodes are operators,
• the leaves are relations,
• the subtrees are subexpressions.
The primary goal of the query optimizer is to transform expression trees into equivalent expression trees in which the average size of the relations yielded by subexpressions is smaller than it was before the optimization. The secondary goal is to form common subexpressions within a single query or, if more than one query is being evaluated at the same time, across all of those queries. The rationale behind the second goal is that it is enough to compute a common subexpression once and reuse the result in every query that contains it. The following rules can be used in such transformations.
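The tree representation and the search for common subexpressions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Relation` and `Op`, the textual encoding of conditions, and the helper `common_subexpressions` are all hypothetical and not taken from any particular database system.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    """Leaf of the expression tree: a named base relation."""
    name: str


@dataclass(frozen=True)
class Op:
    """Internal node: an operator applied to one or more subtrees."""
    kind: str                      # e.g. "select", "project", "product"
    arg: str                       # condition or attribute list, kept as text
    children: Tuple["Node", ...]


Node = Union[Relation, Op]


def subexpressions(node):
    """Yield every subtree (subexpression) of node, root first."""
    yield node
    if isinstance(node, Op):
        for child in node.children:
            yield from subexpressions(child)


def common_subexpressions(*queries):
    """Return the operator subtrees that occur more than once overall."""
    seen, repeated = set(), set()
    for query in queries:
        for sub in subexpressions(query):
            if isinstance(sub, Op):
                if sub in seen:
                    repeated.add(sub)
                seen.add(sub)
    return repeated


R, P = Relation("R"), Relation("P")
sel = Op("select", "a > 1", (R,))
q1 = Op("product", "", (sel, P))    # sigma_{a>1}(R) x P
q2 = Op("project", "a", (sel,))     # pi_a(sigma_{a>1}(R))
shared = common_subexpressions(q1, q2)
print(shared == {sel})
```

Because the nodes are frozen dataclasses, structurally equal subtrees compare and hash equal, which is what lets the shared selection be detected across the two queries.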
=== Selection ===
Rules about selection operators play the most important role in query optimization. Selection is an operator that very effectively decreases the number of rows in its operand, so if the selections in an expression tree are moved towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.
==== Basic selection properties ====
Selection is idempotent (multiple applications of the same selection have no additional effect beyond the first one), and commutative (the order in which selections are applied has no effect on the eventual result).
• \sigma_{A}(R)=\sigma_{A}\sigma_{A}(R)
• \sigma_{A}\sigma_{B}(R)=\sigma_{B}\sigma_{A}(R)
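Both properties can be checked on a small example. The sketch below assumes a relation is modelled as a Python set of tuples, each tuple being a frozenset of (attribute, value) pairs; the sample data and the conditions `A` and `B` are made up for illustration.

```python
def select(pred, rel):
    """sigma_pred(rel): keep the tuples of rel that satisfy pred."""
    return {t for t in rel if pred(dict(t))}


# A relation is a set of tuples; each tuple is a frozenset of
# (attribute, value) pairs so that it is hashable.
R = {frozenset(row.items()) for row in [
    {"id": 1, "age": 25},
    {"id": 2, "age": 40},
    {"id": 3, "age": 61},
]}

A = lambda t: t["age"] > 30   # hypothetical condition A
B = lambda t: t["id"] < 3     # hypothetical condition B

idempotent = select(A, select(A, R)) == select(A, R)
commutative = select(A, select(B, R)) == select(B, select(A, R))
print(idempotent, commutative)
```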
==== Breaking up selections with complex conditions ====
A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with those same individual conditions, and a selection whose condition is a disjunction is equivalent to a union of selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that the component selections may be moved or optimized separately.
• \sigma_{A \land B}(R)=\sigma_{A}(\sigma_{B}(R))=\sigma_{B}(\sigma_{A}(R))
• \sigma_{A \lor B}(R)=\sigma_{A}(R)\cup\sigma_{B}(R)
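The two identities can be illustrated with a short Python check. As before, this is a sketch that assumes relations are sets of frozenset tuples; the data and conditions are hypothetical.

```python
def select(pred, rel):
    """sigma_pred(rel): keep the tuples of rel that satisfy pred."""
    return {t for t in rel if pred(dict(t))}


R = {frozenset(row.items()) for row in [
    {"id": 1, "age": 25},
    {"id": 2, "age": 40},
    {"id": 3, "age": 61},
    {"id": 4, "age": 17},
]}

A = lambda t: t["age"] > 30
B = lambda t: t["id"] < 3
conj = lambda t: A(t) and B(t)   # condition "A and B"
disj = lambda t: A(t) or B(t)    # condition "A or B"

# Conjunction: one merged selection vs. a sequence of two selections.
merged_conj = select(conj, R)
split_conj = select(A, select(B, R))

# Disjunction: one merged selection vs. a union of two selections.
merged_disj = select(disj, R)
union_disj = select(A, R) | select(B, R)

print(merged_conj == split_conj, merged_disj == union_disj)
```

Note that the union form scans `R` twice, so an optimizer would typically apply that identity only when the separate selections can be moved or evaluated more cheaply.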
==== Selection and cross product ====
Cross product is the costliest operator to evaluate. If the input relations have N and M rows, the result will contain NM rows. Therefore, it is important to decrease the size of both operands before applying the cross product operator.

This can be done effectively if the cross product is followed by a selection operator, e.g. \sigma_{A}(R \times P). Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules.

In the above case, the condition A is broken up into conditions B, C and D using the split rules about complex selection conditions, so that A = B \wedge C \wedge D, where B contains attributes only from R, C contains attributes only from P, and D contains the part of A that contains attributes from both R and P. Note that any of B, C or D may be empty. Then the following holds:
\sigma_{A}(R \times P) = \sigma_{B \wedge C \wedge D}(R \times P) = \sigma_{D}(\sigma_{B}(R) \times \sigma_{C}(P))
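The effect of this rule on the size of the intermediate result can be seen in a small sketch. It assumes relations are sets of frozenset tuples and that R and P have disjoint attribute names; the attribute names and the conditions `B`, `C` and `D` are invented for the example.

```python
def rel(rows):
    """Build a relation (set of hashable tuples) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}


def select(pred, relation):
    """sigma_pred(relation)."""
    return {t for t in relation if pred(dict(t))}


def product(r, p):
    """Cross product; assumes r and p have disjoint attribute names."""
    return {tr | tp for tr in r for tp in p}


R = rel([{"r_id": i, "r_val": i * 10} for i in range(1, 6)])      # 5 rows
P = rel([{"p_id": i, "p_ref": i % 3 + 1} for i in range(1, 5)])   # 4 rows

B = lambda t: t["r_val"] >= 20          # mentions only attributes of R
C = lambda t: t["p_id"] != 4            # mentions only attributes of P
D = lambda t: t["r_id"] == t["p_ref"]   # mentions attributes of both
A = lambda t: B(t) and C(t) and D(t)    # A = B and C and D

# Unoptimized plan: build the full product, then filter.
naive_intermediate = product(R, P)
naive = select(A, naive_intermediate)

# Optimized plan: filter each operand first, then apply D on top.
pushed_intermediate = product(select(B, R), select(C, P))
pushed = select(D, pushed_intermediate)

print(naive == pushed, len(naive_intermediate), len(pushed_intermediate))
```

Both plans return the same tuples, but the pushed-down plan materializes a smaller cross product because each operand is filtered before the product is formed.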
==== Selection and set operators ====
Selection is distributive over the set difference, intersection, and union operators. The following three rules are used to push selection below set operations in the expression tree. For the set difference and the intersection operators, it is possible to apply the selection operator to just one of the operands following the transformation (for set difference, the left operand). This can be beneficial where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits of using a smaller relation as an operand.
• \sigma_{A}(R\setminus P)=\sigma_{A}(R)\setminus \sigma_{A}(P) =\sigma_{A}(R)\setminus P
• \sigma_{A}(R\cup P)=\sigma_{A}(R)\cup\sigma_{A}(P)
• \sigma_{A}(R\cap P)=\sigma_{A}(R)\cap\sigma_{A}(P)=\sigma_{A}(R)\cap P=R\cap \sigma_{A}(P)
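These rules, including the one-operand variants, can be checked with Python's built-in set operators. The sketch assumes single-attribute relations modelled as sets of frozenset tuples; the data and the condition `A` are hypothetical. It also shows that the one-operand shortcut does not carry over to union.

```python
def rel(rows):
    """Build a relation (set of hashable tuples) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}


def select(pred, relation):
    """sigma_pred(relation)."""
    return {t for t in relation if pred(dict(t))}


R = rel([{"x": n} for n in range(0, 8)])    # x = 0..7
P = rel([{"x": n} for n in range(4, 12)])   # x = 4..11
A = lambda t: t["x"] % 2 == 0               # hypothetical condition

diff_ok = (select(A, R - P)
           == select(A, R) - select(A, P)
           == select(A, R) - P)
union_ok = select(A, R | P) == select(A, R) | select(A, P)
inter_ok = (select(A, R & P)
            == select(A, R) & select(A, P)
            == select(A, R) & P
            == R & select(A, P))

# For union, selecting on only one operand changes the result,
# because unfiltered tuples of P leak through.
union_one_sided = select(A, R | P) == select(A, R) | P

print(diff_ok, union_ok, inter_ok, union_one_sided)
```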
==== Selection and projection ====
Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of duplicates resulting from omitted fields).
\pi_{a_1, \ldots ,a_n}(\sigma_A( R )) = \sigma_A(\pi_{a_1, \ldots,a_n}( R ))\,\,\text{if the fields in}\,A \subseteq \{a_1,\ldots,a_n\}
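A small sketch can show both the equivalence and the reduction in the number of tuples tested. It assumes relations are sets of frozenset tuples, so projection eliminates duplicates automatically; the sample relation and the condition `A` are made up.

```python
def rel(rows):
    """Build a relation (set of hashable tuples) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}


def select(pred, relation):
    """sigma_pred(relation)."""
    return {t for t in relation if pred(dict(t))}


def project(attrs, relation):
    """pi_attrs(relation); duplicates vanish because the result is a set."""
    return {frozenset((k, v) for k, v in t if k in attrs) for t in relation}


R = rel([
    {"name": "ann", "dept": "x"},
    {"name": "bob", "dept": "x"},
    {"name": "cy", "dept": "y"},
])
A = lambda t: t["dept"] == "x"   # references only the projected field

select_first = project({"dept"}, select(A, R))
project_first = select(A, project({"dept"}, R))

tested_select_first = len(R)                       # tuples A is applied to
tested_project_first = len(project({"dept"}, R))   # fewer after projection

print(select_first == project_first,
      tested_select_first, tested_project_first)
```

If `A` referenced `name` instead, the projected tuples would no longer carry that field and the selection could not be evaluated after the projection, which is why the subset condition is required.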
=== Projection ===
==== Basic projection properties ====
Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection.
\begin{align}&\pi_{a_1, \ldots , a_n}(\pi_{b_1,\ldots , b_m}(R)) = \pi_{a_1, \ldots , a_n}(R)\\[4pt] &\,\,\,\text{where}\,\,\{a_1, \ldots , a_n\} \subseteq \{b_1, \ldots , b_m\}\end{align}
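As a quick illustration, the cascade of projections can be compared with the outermost projection alone. The sketch assumes relations are sets of frozenset tuples; the attribute names are arbitrary.

```python
def rel(rows):
    """Build a relation (set of hashable tuples) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}


def project(attrs, relation):
    """pi_attrs(relation); duplicates vanish because the result is a set."""
    return {frozenset((k, v) for k, v in t if k in attrs) for t in relation}


R = rel([
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 5, "c": 6},
    {"a": 7, "b": 8, "c": 9},
])

# {a} is a subset of {a, b}, so the inner projection is redundant.
cascade = project({"a"}, project({"a", "b"}, R))
direct = project({"a"}, R)

print(cascade == direct)
```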
==== Projection and set operators ====
Projection is distributive over set union.
\pi_{a_1, \ldots, a_n}(R \cup P) = \pi_{a_1, \ldots, a_n}(R) \cup \pi_{a_1, \ldots, a_n}(P)

Projection does not distribute over intersection and set difference. Counterexamples are given by:
\pi_A(\{ \langle A=a, B=b \rangle \} \cap \{ \langle A=a, B=b' \rangle \}) = \emptyset
\pi_A(\{ \langle A=a, B=b \rangle \}) \cap \pi_A(\{ \langle A=a, B=b' \rangle \}) = \{ \langle A=a \rangle \}
and
\pi_A(\{ \langle A=a, B=b \rangle \} \setminus \{ \langle A=a, B=b' \rangle \}) = \{ \langle A=a\rangle \}
\pi_A(\{ \langle A=a, B=b \rangle \}) \setminus \pi_A(\{ \langle A=a, B=b' \rangle \}) = \emptyset
where b is assumed to be distinct from b'.
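The counterexamples above can be reproduced directly. The sketch below assumes relations are sets of frozenset tuples and uses the strings "a", "b" and "b'" as stand-ins for the values in the formulas.

```python
def rel(rows):
    """Build a relation (set of hashable tuples) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}


def project(attrs, relation):
    """pi_attrs(relation); duplicates vanish because the result is a set."""
    return {frozenset((k, v) for k, v in t if k in attrs) for t in relation}


r1 = rel([{"A": "a", "B": "b"}])
r2 = rel([{"A": "a", "B": "b'"}])   # b' is distinct from b

# Union: projecting before or after gives the same relation.
union_ok = (project({"A"}, r1 | r2)
            == project({"A"}, r1) | project({"A"}, r2))

# Intersection: the two sides differ.
inter_lhs = project({"A"}, r1 & r2)
inter_rhs = project({"A"}, r1) & project({"A"}, r2)

# Set difference: the two sides differ as well.
diff_lhs = project({"A"}, r1 - r2)
diff_rhs = project({"A"}, r1) - project({"A"}, r2)

print(union_ok, inter_lhs == inter_rhs, diff_lhs == diff_rhs)
```

In both failing cases the projection discards the attribute B, which was the only thing distinguishing the two tuples.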
=== Rename ===
==== Basic rename properties ====
Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames adjacent so that they can be collapsed.
• \rho_{a / b}(\rho_{b / c}(R)) = \rho_{a / c}(R)
• \rho_{a / b}(\rho_{c / d}(R)) = \rho_{c / d}(\rho_{a / b}(R))
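Both rename rules can be tried out on small relations. The sketch assumes, as the first identity implies, that \rho_{a / b} renames attribute b to a; the helper `rename(new, old, relation)` and the sample data are hypothetical.

```python
def rel(rows):
    """Build a relation (set of hashable tuples) from a list of dicts."""
    return {frozenset(row.items()) for row in rows}


def rename(new, old, relation):
    """rho_{new/old}(relation): rename attribute old to new."""
    return {frozenset((new if k == old else k, v) for k, v in t)
            for t in relation}


# Collapsing: c -> b followed by b -> a is the same as c -> a.
R = rel([{"c": 1, "d": 2}, {"c": 3, "d": 4}])
collapsed = rename("a", "b", rename("b", "c", R)) == rename("a", "c", R)

# Reordering: renames with no attributes in common commute.
S = rel([{"b": 1, "d": 2}, {"b": 3, "d": 4}])
reordered = (rename("a", "b", rename("c", "d", S))
             == rename("c", "d", rename("a", "b", S)))

print(collapsed, reordered)
```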
==== Rename and set operators ====
Rename is distributive over set difference, union, and intersection.
• \rho_{a / b}(R \setminus P) = \rho_{a / b}(R) \setminus \rho_{a / b}(P)
• \rho_{a / b}(R \cup P) = \rho_{a / b}(R) \cup \rho_{a / b}(P)
• \rho_{a / b}(R \cap P) = \rho_{a / b}(R) \cap \rho_{a / b}(P)
=== Product and union ===
Cartesian product is distributive over union.
• (A \times B) \cup (A \times C) = A \times (B \cup C)

== Implementations ==