eXtremeSQL uses a rule-based query optimizer to support goals of
1) predictable and fast execution of queries, and
2) maximum developer flexibility to tune the queries and indirectly specify the optimal execution plans by arranging the tables and filters according to simple rules.
The eXtremeSQL query optimization is based on a set of rules mostly using indexes, and avoiding sequential table scans whenever possible.
The optimizer can push down predicates and can process any combination of ascending or descending indexes, order by clauses, comparison operators (<=, >, …) in search constrains, etc.
Some operations are handled without using indexes; for example
BETWEENexpressions are not processed through the index, but instead the engine does a lookup based on one of the boundaries of the range, and then simply checks the validity of the other boundary.Instead of using cost-based optimization, a smarter algorithm is used to choose the right index for a query. There are various criteria; the engine attempts to pick the index that supports the required order and replaces the maximum number of predicates in the query while keeping the index length to a minimum.
For a simple example: assume the query clause “
x=1 AND y=2 AND z=3”. Here the optimizer looks for an index that covers as many of the conditions as possible. So if there are indexestree<x>, tree<y>, tree<z,x>, thentree<z,x>index is chosen. But if there is also an indextree<x,y,a,b,c>, the optimizer would not pick it because it covers the same number of conditions (two), but includes more key components, hence supposedly its selectivity is less.Using the
explainoperator before aselectstatement displays the execution plan. The abbreviations in uppercase indicate the optimization algorithm (described in the table below); the indentations indicate the nesting level and an integer number in square brackets indicates a “data source id” (this is an internal identifier of a "data source").There are several ways to make the optimizer change the plan and there are some limitations. For example:
- Change the order in which the tables appear in
JOINs. It pays to put a bigger table or a table with the predicate with higher selectivity up-front. The worst-case scenario for the nested look is when the larger table appears at the end of the query expression.- Change the order of conjuncts (the parts of the query joined with an
AND). Currently the optimizer, with all other conditions being equal, attempts to use an index for the far “left” key. However, if there is a compound index that covers multiple conjunct’s parts, that index is used. The optimizer also takes into account the existing index types and matches them into the query predicates. If ordering is required, (order by) atreeindex must be used, etc.,- It is sometimes beneficial to explicitly prohibit the use of an index for a given key. Specifically this is applicable when the application benefits from using another index or no index at all (sequential scan). To "turn off” an index a “dummy" expression can be added. For example:
XSQL>explain select * from TTT where a=1; XSQL>create table TTT(a int using index); Plan ------------------------------------------------------------- INDEX-SCAN[1] using index TTT.a(a) Selected records: 1 XSQL>explain select * from TTT where a+0=1; XSQL>create table TTT(a int using index); Plan ------------------------------------------------------------- FILTER (Eq (IntAdd TTT.a 0) 1) .SEQ_SCAN[1] of table TTT Selected records: 1(Note that the optimizer does not currently support constant propagation (i.e. it does not substitute constant values with calculated constant expressions). So don’t use expressions like (2+2) in queries.)
- The optimizer is unable to convert (flatten) subqueries into normal joins. That must be done manually, for example rewrite:
select c1 from t1 where c1 in (select c1 from t2);as
select t1.c1 from t1, t2 where t1.c1=t2.c2;
- If a long calculation (a function) is included into the column list (for example a
sequenceaggregate) and the query includes asortwith the limit clause, it is advisable to separate the function into a subquery. The optimizer is currently unable to do it, so it must be done manually.- It is possible to use a
shuffle joininstead of ahash join. UnlikeHashJoin,ShuffleJoindoes not materialize the entire inner table in memory. Instead theShuffleJoinpartitions both inner and outer table into N files based on a hash value of the join keys. The number of shuffle files is specified through thenShuffleFilesfield of the SqlOptimizerParameters. The default value is 128. The number of partitions can be also changed for the current session using thesetShuffleFiles(n)function, which returns the previous value of this parameter. The memory footprint of theShuffleJoincan be estimated as(sizeof(OuterTable) + sizeof(InnerTable)) / N. Note that if the number of partitions N is high (such asN > 1000), the system can run out of file descriptors. Currently the decision of whether to use theshuffle joininstead of thehash joinis up to the application. The keywordshuffleis added to SQL, which can be used together with the standard join qualifiers (left, outer, natural). For example:select * from A shuffle join B on A.id = B.id;
Relational Operators and Execution Plans
The table below describes some of the implementation considerations for relational operators and explains the pros and cons of each operator:
Relational operation “explain” plan abbreviation Used by the optimizer when Pros Cons Sequential table scan SEQ-SCANNo indexes are available for the query Records are accessed in the sequential order (in the order in which they were written to the database) Large per record access overhead. There is no separate tablespace Index scan INDEX-SCANThe search predicate includes key fields; The result set needs to be sorted by the key Fast Generally speaking random access to storage to access. That can be mitigated by presorting tables in the order of the index through the batch insert Index scan within the range (both left and right boundaries) INDEX-RANGE-SCANWhen the BETWEEN clause is used; A pair of comparison operators are used for the key This is more efficient than the combination of the INDEXSCAN and FILTER, because the scan is interrupted just as soon as possible Doesn't support compound keys Merging results of several index scans INDEX-MERGEThe key is used in the “IN” operator, or in multiple “OR” predicates Fast It is efficient only for indexes with high selectivity Nested loop join NESTED-LOOP-JOINIs used when neither the hash nor the index joins are possible Very versatile — handles any join condition including cross joins (all records where each row from the first table is combined with each row from the second table Very slow Index join INDEX-JOINAn index exists for the inner table join key The fastest join algorithm Bad locality of references (hash is not ordered in any way) Hash join HASH-JOINAn index join is not applicable and the join condition contains only equality comparisons It is faster than the merge (sort) join Requires a lot of memory as the hash table for the join is built in memory. Therefore could run out of quota (if defined) Filter FILTERThe query contains a WHEN clause and there are no indexes available Versatile Slow Project SELECTTo select a subset of columns (fields) Reduces the size of the results set (selected data set). Sometimes dramatically Requires to populate temporary records instead of using direct references to objects leading to memory and performance overhead Hash aggregate HASH-AGGREGATESimple aggregation with the GROUP BY The fastest way of aggregation with a group by Builds hash table in memory — sense the memory overhead and may run out of quota Grand aggregate GRAND-AGGREGATESimple aggregation without the GROUP BY The fastest way of aggregation without group by None Sort aggregate SORT-AGGREGATEAggregation that utilizes sort operation by the group keys Versatile— handles arbitrary expressions in aggregates Memory overhead: requires to materialize the result set and sort it there. Can run out of quota Sorting ORDER-BYUsed when the query order cannot be satisfied with the index scan Versatile: handles an arbitrary order clause Memory overhead: sorting requires to materialize the result set. Can run out of quota Top-N sort ORDER-BY.LIMITUsed when the query includes an ORDER-BY clause followed by a LIMIT clause Retains only N records in memory Can be slower than the SORT +LIMIT combination for a large N Intersect two result sets INTERSECTINTERSECT clause is used None except to provide functionality The input result sets have to be sorted Eliminates duplicates DISTINCTDISTINCT clause is used None except to provide functionality The input data set has to be sorted Union two result sets UNIONUNION clause is used None except to provide functionality The size of the result is unknown in advance Combine two result sets UNION-ALLUNION ALL clause is used Fast, no memory overhead The size of the result is unknown in advance Subtract two result sets EXCEPTEXCEPT clause is used None except to provide functionality The input data sets have to be sorted Sort, then group by columns marked as distinct and retain only the first column for each group DISTINCT-GROUPThis is a special eXtremeDB SQL extension Compact, simple and efficient way of getting first/last records in the group This is to substitute a standard SQL WINDOW clause Transpose sequences to rows (vertical to horizontal representation) FLATTENSelect with the FLATTENED qualifier Allows applying standard SQL operators to vertical data Significantly increase the result set size Limit the result set: specify the offset and the maximum number of records in the result set LIMITThe LIMIT clause is present Reduce size of the result set No fast way to skip records Materialize result set in memory MATERIALIZESubquery which doesn't depend on the outer query Avoid redundant calculation by buffering the result set of the intermediate query Large memory footprint Following are some definitions that can help understand how the optimizer determines the execution plan for a query:
Indexable Expression
An expression is an indexable expression if:
- it is a column of the table being inspected (while performing a table join) and there is an index defined in the database schema for this field or for a set of fields (compound key) where this field is the first.
- the expression is a reference access expression (
pointer_field.referenced_expression) wherepointer_fieldis a reference field of the table being inspected andreferenced_expressionis an indexable expression and there is an index defined in the database schema forpointer_field. In this case, eXtremeSQL will first perform an index search to select records matchingreferenced_expressionand then perform an index search in this table using the index forpointer_field, locating records in whichpointer_fieldrefers to one of the records in this selection.For example, suppose we have two tables, Supplier and Order, and the query:
select * from Order where supplier.location.city=’New York’;This query can be executed using an index search if:
If these conditions are true, then eXtremeSQL will first perform an index search in the table Supplier, selecting suppliers located in New York, and for each selected record search its
AUTOIDin the index for fieldsupplierin the table Order.Note that when defining a table with SQL DDL, in order for the optimizer to utilize index-based optimizations, the index fields must be declared
not null. For example:XSQL>create table t (i int not null, j int not null); XSQL>create index idx on t (i, j);Known Value
An expression is a known value if it is a literal or a query parameter, and it is a field of a table with a smaller table number than the table being inspected.
For example, consider two tables:
Create table Person (pid integer primary key, name string using index); Create table Hobby (pid person key, description string);To select all hobbies of a Person, there are two possible formulations of the SQL:
GOOD example:
Select * from Person p, Hobby h where p.name=’John Smith’ and p.pid = h.pid;BAD example:
Select * from Hobby h, Person p where p.name=’John Smith’ and p.pid = h.pid;
From the information given earlier we know that the conjuncts will be ordered as follows:
GOOD EXAMPLE BAD EXAMPLE p.name = “John Smith”p.pid = h.pidp.pid = h.pidp.name = “John Smith”In the GOOD example, an index search for
p.name = “John Smith”will be evaluated first, then an index join to select its Hobby (or hobbies).In the BAD example, a sequential scan of Hobby will be processed and for each record find the related Person using indexed join, and then check that the name of this person is "John Smith".
Distinct
When the
DISTINCTqualifier is in the query, then after applying all possible optimizations, all selected tuples are sorted by all columns and duplicates are removed. If theORDER BYorGROUP BYclause is present in the statement together with theDISTINCTqualifier, then the fields in theORDER BYorGROUP BYare compared first during sorting, so the sort operation is performed only once.eXtremeSQL uses the Quicksort algorithm for sorting records. eXtremeSQL first extracts the sort keys into a separate array (or part of the key in case of strings), then sorts this array, and finally refines the order by performing a comparison of all columns mentioned in the
ORDER BYlist.Order By
When a
treeindex is used to select records, selection is automatically sorted by the key corresponding to this index. If there is an explicitORDER BYclause in this statement, sorting records by this key, then eXtremeSQL does not perform extra sorting, since the records selected are already in the requested order. The direction (ascending or descending) of theORDER BYclause should be the same as the direction specified for this field in the index.Subquery
eXtremeSQL optimizes the execution of subqueries by checking the dependencies of the subquery expression. The result returned by the subquery execution is saved and only recalculated if the subquery expression refers to the fields from the enclosing scope.
Show Plan
To view the query execution plan, invoke the
trace(true)method of the SqlEngine class. Then, during execution of each statement you will see a dump of the query and execution of indexed and sequential searches. Please note that eXtremeSQL doesn't store query execution plans and the decision whether to apply an index or not is taken when the query is executed. In other words, eXtremeSQL has no precompiled queries. Making decisions about applying indexes during query execution time allows taking into account the actual values of searched operands.For example, use an index search in query:
db.executeQuery("select * from T where name like %s", "John%");
and use a sequential search in query:
db.executeQuery("select * from T where name like %s", "%John%");
A precompiled execution plan would have insufficient information to make this judgment.