eXtremeSQL uses a rule-based query optimizer to support goals of
1) predictable and fast execution of queries, and
2) maximum developer flexibility to tune the queries and indirectly specify the optimal execution plans by arranging the tables and filters according to simple rules.
The eXtremeSQL query optimization is based on a set of rules mostly using indexes, and avoiding sequential table scans whenever possible.
The optimizer can push down predicates and can process any combination of ascending or descending indexes, order by clauses, comparison operators (<=, >, …) in search constrains, etc.
Some operations are handled without using indexes; for example
BETWEEN
expressions are not processed through the index, but instead the engine does a lookup based on one of the boundaries of the range, and then simply checks the validity of the other boundary.Instead of using cost-based optimization, a smarter algorithm is used to choose the right index for a query. There are various criteria; the engine attempts to pick the index that supports the required order and replaces the maximum number of predicates in the query while keeping the index length to a minimum.
For a simple example: assume the query clause “
x=1 AND y=2 AND z=3
”. Here the optimizer looks for an index that covers as many of the conditions as possible. So if there are indexestree<x>, tree<y>, tree<z,x>
, thentree<z,x>
index is chosen. But if there is also an indextree<x,y,a,b,c>
, the optimizer would not pick it because it covers the same number of conditions (two), but includes more key components, hence supposedly its selectivity is less.Using the
explain
operator before aselect
statement displays the execution plan. The abbreviations in uppercase indicate the optimization algorithm (described in the table below); the indentations indicate the nesting level and an integer number in square brackets indicates a “data source id” (this is an internal identifier of a "data source").There are several ways to make the optimizer change the plan and there are some limitations. For example:
- Change the order in which the tables appear in
JOINs
. It pays to put a bigger table or a table with the predicate with higher selectivity up-front. The worst-case scenario for the nested look is when the larger table appears at the end of the query expression.- Change the order of conjuncts (the parts of the query joined with an
AND
). Currently the optimizer, with all other conditions being equal, attempts to use an index for the far “left” key. However, if there is a compound index that covers multiple conjunct’s parts, that index is used. The optimizer also takes into account the existing index types and matches them into the query predicates. If ordering is required, (order by) atree
index must be used, etc.,- It is sometimes beneficial to explicitly prohibit the use of an index for a given key. Specifically this is applicable when the application benefits from using another index or no index at all (sequential scan). To "turn off” an index a “dummy" expression can be added. For example:
XSQL>explain select * from TTT where a=1; XSQL>create table TTT(a int using index); Plan ------------------------------------------------------------- INDEX-SCAN[1] using index TTT.a(a) Selected records: 1 XSQL>explain select * from TTT where a+0=1; XSQL>create table TTT(a int using index); Plan ------------------------------------------------------------- FILTER (Eq (IntAdd TTT.a 0) 1) .SEQ_SCAN[1] of table TTT Selected records: 1(Note that the optimizer does not currently support constant propagation (i.e. it does not substitute constant values with calculated constant expressions). So don’t use expressions like (2+2) in queries.)
- The optimizer is unable to convert (flatten) subqueries into normal joins. That must be done manually, for example rewrite:
select c1 from t1 where c1 in (select c1 from t2);as
select t1.c1 from t1, t2 where t1.c1=t2.c2;
- If a long calculation (a function) is included into the column list (for example a
sequence
aggregate) and the query includes asort
with the limit clause, it is advisable to separate the function into a subquery. The optimizer is currently unable to do it, so it must be done manually.- It is possible to use a
shuffle join
instead of ahash join
. UnlikeHashJoin
,ShuffleJoin
does not materialize the entire inner table in memory. Instead theShuffleJoin
partitions both inner and outer table into N files based on a hash value of the join keys. The number of shuffle files is specified through thenShuffleFiles
field of the SqlOptimizerParameters. The default value is 128. The number of partitions can be also changed for the current session using thesetShuffleFiles(n)
function, which returns the previous value of this parameter. The memory footprint of theShuffleJoin
can be estimated as(sizeof(OuterTable) + sizeof(InnerTable)) / N
. Note that if the number of partitions N is high (such asN > 1000
), the system can run out of file descriptors. Currently the decision of whether to use theshuffle join
instead of thehash join
is up to the application. The keywordshuffle
is added to SQL, which can be used together with the standard join qualifiers (left, outer, natural
). For example:select * from A shuffle join B on A.id = B.id;
Relational Operators and Execution Plans
The table below describes some of the implementation considerations for relational operators and explains the pros and cons of each operator:
Relational operation “explain” plan abbreviation Used by the optimizer when Pros Cons Sequential table scan SEQ-SCAN
No indexes are available for the query Records are accessed in the sequential order (in the order in which they were written to the database) Large per record access overhead. There is no separate tablespace Index scan INDEX-SCAN
The search predicate includes key fields; The result set needs to be sorted by the key Fast Generally speaking random access to storage to access. That can be mitigated by presorting tables in the order of the index through the batch insert Index scan within the range (both left and right boundaries) INDEX-RANGE-SCAN
When the BETWEEN clause is used; A pair of comparison operators are used for the key This is more efficient than the combination of the INDEXSCAN and FILTER, because the scan is interrupted just as soon as possible Doesn't support compound keys Merging results of several index scans INDEX-MERGE
The key is used in the “IN” operator, or in multiple “OR” predicates Fast It is efficient only for indexes with high selectivity Nested loop join NESTED-LOOP-JOIN
Is used when neither the hash nor the index joins are possible Very versatile — handles any join condition including cross joins (all records where each row from the first table is combined with each row from the second table Very slow Index join INDEX-JOIN
An index exists for the inner table join key The fastest join algorithm Bad locality of references (hash is not ordered in any way) Hash join HASH-JOIN
An index join is not applicable and the join condition contains only equality comparisons It is faster than the merge (sort) join Requires a lot of memory as the hash table for the join is built in memory. Therefore could run out of quota (if defined) Filter FILTER
The query contains a WHEN clause and there are no indexes available Versatile Slow Project SELECT
To select a subset of columns (fields) Reduces the size of the results set (selected data set). Sometimes dramatically Requires to populate temporary records instead of using direct references to objects leading to memory and performance overhead Hash aggregate HASH-AGGREGATE
Simple aggregation with the GROUP BY The fastest way of aggregation with a group by Builds hash table in memory — sense the memory overhead and may run out of quota Grand aggregate GRAND-AGGREGATE
Simple aggregation without the GROUP BY The fastest way of aggregation without group by None Sort aggregate SORT-AGGREGATE
Aggregation that utilizes sort operation by the group keys Versatile— handles arbitrary expressions in aggregates Memory overhead: requires to materialize the result set and sort it there. Can run out of quota Sorting ORDER-BY
Used when the query order cannot be satisfied with the index scan Versatile: handles an arbitrary order clause Memory overhead: sorting requires to materialize the result set. Can run out of quota Top-N sort ORDER-BY.LIMIT
Used when the query includes an ORDER-BY clause followed by a LIMIT clause Retains only N records in memory Can be slower than the SORT +LIMIT combination for a large N Intersect two result sets INTERSECT
INTERSECT clause is used None except to provide functionality The input result sets have to be sorted Eliminates duplicates DISTINCT
DISTINCT clause is used None except to provide functionality The input data set has to be sorted Union two result sets UNION
UNION clause is used None except to provide functionality The size of the result is unknown in advance Combine two result sets UNION-ALL
UNION ALL clause is used Fast, no memory overhead The size of the result is unknown in advance Subtract two result sets EXCEPT
EXCEPT clause is used None except to provide functionality The input data sets have to be sorted Sort, then group by columns marked as distinct and retain only the first column for each group DISTINCT-GROUP
This is a special eXtremeDB SQL extension Compact, simple and efficient way of getting first/last records in the group This is to substitute a standard SQL WINDOW clause Transpose sequences to rows (vertical to horizontal representation) FLATTEN
Select with the FLATTENED qualifier Allows applying standard SQL operators to vertical data Significantly increase the result set size Limit the result set: specify the offset and the maximum number of records in the result set LIMIT
The LIMIT clause is present Reduce size of the result set No fast way to skip records Materialize result set in memory MATERIALIZE
Subquery which doesn't depend on the outer query Avoid redundant calculation by buffering the result set of the intermediate query Large memory footprint Following are some definitions that can help understand how the optimizer determines the execution plan for a query:
Indexable Expression
An expression is an indexable expression if:
- it is a column of the table being inspected (while performing a table join) and there is an index defined in the database schema for this field or for a set of fields (compound key) where this field is the first.
- the expression is a reference access expression (
pointer_field.referenced_expression
) wherepointer_field
is a reference field of the table being inspected andreferenced_expression
is an indexable expression and there is an index defined in the database schema forpointer_field
. In this case, eXtremeSQL will first perform an index search to select records matchingreferenced_expression
and then perform an index search in this table using the index forpointer_field
, locating records in whichpointer_field
refers to one of the records in this selection.For example, suppose we have two tables, Supplier and Order, and the query:
select * from Order where supplier.location.city=’New York’;This query can be executed using an index search if:
If these conditions are true, then eXtremeSQL will first perform an index search in the table Supplier, selecting suppliers located in New York, and for each selected record search its
AUTOID
in the index for fieldsupplier
in the table Order.Note that when defining a table with SQL DDL, in order for the optimizer to utilize index-based optimizations, the index fields must be declared
not null
. For example:XSQL>create table t (i int not null, j int not null); XSQL>create index idx on t (i, j);Known Value
An expression is a known value if it is a literal or a query parameter, and it is a field of a table with a smaller table number than the table being inspected.
For example, consider two tables:
Create table Person (pid integer primary key, name string using index); Create table Hobby (pid person key, description string);To select all hobbies of a Person, there are two possible formulations of the SQL:
GOOD example:
Select * from Person p, Hobby h where p.name=’John Smith’ and p.pid = h.pid;BAD example:
Select * from Hobby h, Person p where p.name=’John Smith’ and p.pid = h.pid;
From the information given earlier we know that the conjuncts will be ordered as follows:
GOOD EXAMPLE BAD EXAMPLE p.name = “John Smith”
p.pid = h.pid
p.pid = h.pid
p.name = “John Smith”
In the GOOD example, an index search for
p.name = “John Smith”
will be evaluated first, then an index join to select its Hobby (or hobbies).In the BAD example, a sequential scan of Hobby will be processed and for each record find the related Person using indexed join, and then check that the name of this person is "John Smith".
Distinct
When the
DISTINCT
qualifier is in the query, then after applying all possible optimizations, all selected tuples are sorted by all columns and duplicates are removed. If theORDER BY
orGROUP BY
clause is present in the statement together with theDISTINCT
qualifier, then the fields in theORDER BY
orGROUP BY
are compared first during sorting, so the sort operation is performed only once.eXtremeSQL uses the Quicksort algorithm for sorting records. eXtremeSQL first extracts the sort keys into a separate array (or part of the key in case of strings), then sorts this array, and finally refines the order by performing a comparison of all columns mentioned in the
ORDER BY
list.Order By
When a
tree
index is used to select records, selection is automatically sorted by the key corresponding to this index. If there is an explicitORDER BY
clause in this statement, sorting records by this key, then eXtremeSQL does not perform extra sorting, since the records selected are already in the requested order. The direction (ascending or descending) of theORDER BY
clause should be the same as the direction specified for this field in the index.Subquery
eXtremeSQL optimizes the execution of subqueries by checking the dependencies of the subquery expression. The result returned by the subquery execution is saved and only recalculated if the subquery expression refers to the fields from the enclosing scope.
Show Plan
To view the query execution plan, invoke the
trace(true)
method of the SqlEngine class. Then, during execution of each statement you will see a dump of the query and execution of indexed and sequential searches. Please note that eXtremeSQL doesn't store query execution plans and the decision whether to apply an index or not is taken when the query is executed. In other words, eXtremeSQL has no precompiled queries. Making decisions about applying indexes during query execution time allows taking into account the actual values of searched operands.For example, use an index search in query:
db.executeQuery("select * from T where name like %s", "John%");
and use a sequential search in query:
db.executeQuery("select * from T where name like %s", "%John%");
A precompiled execution plan would have insufficient information to make this judgment.