ToC path:

eXtremeSQL Optimizer

eXtremeSQL uses a rule-based query optimizer to support goals of

1) predictable and fast execution of queries, and

2) maximum developer flexibility to tune the queries and indirectly specify the optimal execution plans by arranging the tables and filters according to simple rules.

The eXtremeSQL query optimization is based on a set of rules mostly using indexes, and avoiding sequential table scans whenever possible.

The optimizer can push down predicates and can process any combination of ascending or descending indexes, order by clauses, comparison operators (<=, >, …) in search constrains, etc.

Some operations are handled without using indexes; for example BETWEEN expressions are not processed through the index, but instead the engine does a lookup based on one of the boundaries of the range, and then simply checks the validity of the other boundary.

Instead of using cost-based optimization, a smarter algorithm is used to choose the right index for a query. There are various criteria; the engine attempts to pick the index that supports the required order and replaces the maximum number of predicates in the query while keeping the index length to a minimum.

For a simple example: assume the query clause “x=1 AND y=2 AND z=3”. Here the optimizer looks for an index that covers as many of the conditions as possible. So if there are indexes tree<x>, tree<y>, tree<z,x>, then tree<z,x> index is chosen. But if there is also an index tree<x,y,a,b,c>, the optimizer would not pick it because it covers the same number of conditions (two), but includes more key components, hence supposedly its selectivity is less.

Using the explain operator before a select statement displays the execution plan. The abbreviations in uppercase indicate the optimization algorithm (described in the table below); the indentations indicate the nesting level and an integer number in square brackets indicates a “data source id” (this is an internal identifier of a "data source").

There are several ways to make the optimizer change the plan and there are some limitations. For example:

Change the order in which the tables appear in JOINs. It pays to put a bigger table or a table with the predicate with higher selectivity up-front. The worst-case scenario for the nested look is when the larger table appears at the end of the query expression.

Change the order of conjuncts (the parts of the query joined with an AND). Currently the optimizer, with all other conditions being equal, attempts to use an index for the far “left” key. However, if there is a compound index that covers multiple conjunct’s parts, that index is used. The optimizer also takes into account the existing index types and matches them into the query predicates. If ordering is required, (order by) a tree index must be used, etc.,

It is sometimes beneficial to explicitly prohibit the use of an index for a given key. Specifically this is applicable when the application benefits from using another index or no index at all (sequential scan). To "turn off” an index a “dummy" expression can be added. For example:
     
    XSQL>explain select * from TTT where a=1;
    XSQL>create table TTT(a int using index);
    Plan
    -------------------------------------------------------------
    INDEX-SCAN[1] using index TTT.a(a)
     
    Selected records: 1
    XSQL>explain select * from TTT where a+0=1;
    XSQL>create table TTT(a int using index);
    Plan
    -------------------------------------------------------------
    FILTER (Eq (IntAdd TTT.a 0) 1)
    .SEQ_SCAN[1] of table TTT
     
    Selected records: 1
     
(Note that the optimizer does not currently support constant propagation (i.e. it does not substitute constant values with calculated constant expressions). So don’t use expressions like (2+2) in queries.)

The optimizer is unable to convert (flatten) subqueries into normal joins. That must be done manually, for example rewrite:
     
    select c1 from t1 where c1 in (select c1 from t2);
     
as
     
    select t1.c1 from t1, t2 where t1.c1=t2.c2;
     
If a long calculation (a function) is included into the column list (for example a sequence aggregate) and the query includes a sort with the limit clause, it is advisable to separate the function into a subquery. The optimizer is currently unable to do it, so it must be done manually.

It is possible to use a shuffle join instead of a hash join. Unlike HashJoin, ShuffleJoin does not materialize the entire inner table in memory. Instead the ShuffleJoin partitions both inner and outer table into N files based on a hash value of the join keys. The number of shuffle files is specified through the nShuffleFiles field of the SqlOptimizerParameters. The default value is 128. The number of partitions can be also changed for the current session using the setShuffleFiles(n) function, which returns the previous value of this parameter. The memory footprint of the ShuffleJoin can be estimated as (sizeof(OuterTable) + sizeof(InnerTable)) / N. Note that if the number of partitions N is high (such as N > 1000), the system can run out of file descriptors. Currently the decision of whether to use the shuffle join instead of the hash join is up to the application. The keyword shuffle is added to SQL, which can be used together with the standard join qualifiers (left, outer, natural). For example:
     
    select * from A shuffle join B on A.id = B.id;
     
Relational Operators and Execution Plans

The table below describes some of the implementation considerations for relational operators and explains the pros and cons of each operator:

Relational operation “explain” plan abbreviation Used by the optimizer when Pros Cons

Sequential table scan SEQ-SCAN No indexes are available for the query Records are accessed in the sequential order (in the order in which they were written to the database) Large per record access overhead. There is no separate tablespace

Index scan INDEX-SCAN The search predicate includes key fields; The result set needs to be sorted by the key Fast Generally speaking random access to storage to access. That can be mitigated by presorting tables in the order of the index through the batch insert

Index scan within the range (both left and right boundaries) INDEX-RANGE-SCAN When the BETWEEN clause is used; A pair of comparison operators are used for the key This is more efficient than the combination of the INDEXSCAN and FILTER, because the scan is interrupted just as soon as possible Doesn't support compound keys

Merging results of several index scans INDEX-MERGE The key is used in the “IN” operator, or in multiple “OR” predicates Fast It is efficient only for indexes with high selectivity

Nested loop join NESTED-LOOP-JOIN Is used when neither the hash nor the index joins are possible Very versatile — handles any join condition including cross joins (all records where each row from the first table is combined with each row from the second table Very slow

Index join INDEX-JOIN An index exists for the inner table join key The fastest join algorithm Bad locality of references (hash is not ordered in any way)

Hash join HASH-JOIN An index join is not applicable and the join condition contains only equality comparisons It is faster than the merge (sort) join Requires a lot of memory as the hash table for the join is built in memory. Therefore could run out of quota (if defined)

Filter FILTER The query contains a WHEN clause and there are no indexes available Versatile Slow

Project SELECT To select a subset of columns (fields) Reduces the size of the results set (selected data set). Sometimes dramatically Requires to populate temporary records instead of using direct references to objects leading to memory and performance overhead

Hash aggregate HASH-AGGREGATE Simple aggregation with the GROUP BY The fastest way of aggregation with a group by Builds hash table in memory — sense the memory overhead and may run out of quota

Grand aggregate GRAND-AGGREGATE Simple aggregation without the GROUP BY The fastest way of aggregation without group by None

Sort aggregate SORT-AGGREGATE Aggregation that utilizes sort operation by the group keys Versatile— handles arbitrary expressions in aggregates Memory overhead: requires to materialize the result set and sort it there. Can run out of quota

Sorting ORDER-BY Used when the query order cannot be satisfied with the index scan Versatile: handles an arbitrary order clause Memory overhead: sorting requires to materialize the result set. Can run out of quota

Top-N sort ORDER-BY.LIMIT Used when the query includes an ORDER-BY clause followed by a LIMIT clause Retains only N records in memory Can be slower than the SORT +LIMIT combination for a large N

Intersect two result sets INTERSECT INTERSECT clause is used None except to provide functionality The input result sets have to be sorted

Eliminates duplicates DISTINCT DISTINCT clause is used None except to provide functionality The input data set has to be sorted

Union two result sets UNION UNION clause is used None except to provide functionality The size of the result is unknown in advance

Combine two result sets UNION-ALL UNION ALL clause is used Fast, no memory overhead The size of the result is unknown in advance

Subtract two result sets EXCEPT EXCEPT clause is used None except to provide functionality The input data sets have to be sorted

Sort, then group by columns marked as distinct and retain only the first column for each group DISTINCT-GROUP This is a special eXtremeDB SQL extension Compact, simple and efficient way of getting first/last records in the group This is to substitute a standard SQL WINDOW clause

Transpose sequences to rows (vertical to horizontal representation) FLATTEN Select with the FLATTENED qualifier Allows applying standard SQL operators to vertical data Significantly increase the result set size

Limit the result set: specify the offset and the maximum number of records in the result set LIMIT The LIMIT clause is present Reduce size of the result set No fast way to skip records

Materialize result set in memory MATERIALIZE Subquery which doesn't depend on the outer query Avoid redundant calculation by buffering the result set of the intermediate query Large memory footprint

Following are some definitions that can help understand how the optimizer determines the execution plan for a query:

Indexable Expression

An expression is an indexable expression if:

it is a column of the table being inspected (while performing a table join) and there is an index defined in the database schema for this field or for a set of fields (compound key) where this field is the first.

the expression is a reference access expression (pointer_field.referenced_expression) where pointer_field is a reference field of the table being inspected and referenced_expression is an indexable expression and there is an index defined in the database schema for pointer_field. In this case, eXtremeSQL will first perform an index search to select records matching referenced_expression and then perform an index search in this table using the index for pointer_field, locating records in which pointer_field refers to one of the records in this selection.
For example, suppose we have two tables, Supplier and Order, and the query:
     
    select * from Order where supplier.location.city=’New York’;
     
This query can be executed using an index search if:

there is an index for the field location.city in the Supplier table.

there is an index for the field supplier in the table Order (where supplier is a reference field having type autoid_t and it references the autoid of a Supplier record).

AUTOID is maintained for the table Supplier.
If these conditions are true, then eXtremeSQL will first perform an index search in the table Supplier, selecting suppliers located in New York, and for each selected record search its AUTOID in the index for field supplier in the table Order.

Note that when defining a table with SQL DDL, in order for the optimizer to utilize index-based optimizations, the index fields must be declared not null. For example:
     
    XSQL>create table t (i int not null, j int not null);
    XSQL>create index idx on t (i, j);
     
Known Value

An expression is a known value if it is a literal or a query parameter, and it is a field of a table with a smaller table number than the table being inspected.

For example, consider two tables:
     
    Create table Person (pid integer primary key, name string using index);
    Create table Hobby (pid person key, description string);
     
To select all hobbies of a Person, there are two possible formulations of the SQL:

GOOD example:

     
    Select * from Person p, Hobby h where p.name=’John Smith’ and p.pid = h.pid;

BAD example:

     
    Select * from Hobby h, Person p where p.name=’John Smith’ and p.pid = h.pid;

From the information given earlier we know that the conjuncts will be ordered as follows:

GOOD EXAMPLE BAD EXAMPLE

p.name = “John Smith” p.pid = h.pid

p.pid = h.pid p.name = “John Smith”

In the GOOD example, an index search for p.name = “John Smith” will be evaluated first, then an index join to select its Hobby (or hobbies).

In the BAD example, a sequential scan of Hobby will be processed and for each record find the related Person using indexed join, and then check that the name of this person is "John Smith".

Distinct

When the DISTINCT qualifier is in the query, then after applying all possible optimizations, all selected tuples are sorted by all columns and duplicates are removed. If the ORDER BY or GROUP BY clause is present in the statement together with the DISTINCT qualifier, then the fields in the ORDER BY or GROUP BY are compared first during sorting, so the sort operation is performed only once.

eXtremeSQL uses the Quicksort algorithm for sorting records. eXtremeSQL first extracts the sort keys into a separate array (or part of the key in case of strings), then sorts this array, and finally refines the order by performing a comparison of all columns mentioned in the ORDER BY list.

Order By

When a tree index is used to select records, selection is automatically sorted by the key corresponding to this index. If there is an explicit ORDER BY clause in this statement, sorting records by this key, then eXtremeSQL does not perform extra sorting, since the records selected are already in the requested order. The direction (ascending or descending) of the ORDER BY clause should be the same as the direction specified for this field in the index.

Subquery

eXtremeSQL optimizes the execution of subqueries by checking the dependencies of the subquery expression. The result returned by the subquery execution is saved and only recalculated if the subquery expression refers to the fields from the enclosing scope.

Show Plan

To view the query execution plan, invoke the trace(true) method of the SqlEngine class. Then, during execution of each statement you will see a dump of the query and execution of indexed and sequential searches. Please note that eXtremeSQL doesn't store query execution plans and the decision whether to apply an index or not is taken when the query is executed. In other words, eXtremeSQL has no precompiled queries. Making decisions about applying indexes during query execution time allows taking into account the actual values of searched operands.

For example, use an index search in query:

 
    db.executeQuery("select * from T where name like %s", "John%");

and use a sequential search in query:

 
    db.executeQuery("select * from T where name like %s", "%John%");

A precompiled execution plan would have insufficient information to make this judgment.

Relational operation	“explain” plan abbreviation	Used by the optimizer when	Pros	Cons
Sequential table scan	`SEQ-SCAN`	No indexes are available for the query	Records are accessed in the sequential order (in the order in which they were written to the database)	Large per record access overhead. There is no separate tablespace
Index scan	`INDEX-SCAN`	The search predicate includes key fields; The result set needs to be sorted by the key	Fast	Generally speaking random access to storage to access. That can be mitigated by presorting tables in the order of the index through the batch insert
Index scan within the range (both left and right boundaries)	`INDEX-RANGE-SCAN`	When the BETWEEN clause is used; A pair of comparison operators are used for the key	This is more efficient than the combination of the INDEXSCAN and FILTER, because the scan is interrupted just as soon as possible	Doesn't support compound keys
Merging results of several index scans	`INDEX-MERGE`	The key is used in the “IN” operator, or in multiple “OR” predicates	Fast	It is efficient only for indexes with high selectivity
Nested loop join	`NESTED-LOOP-JOIN`	Is used when neither the hash nor the index joins are possible	Very versatile — handles any join condition including cross joins (all records where each row from the first table is combined with each row from the second table	Very slow
Index join	`INDEX-JOIN`	An index exists for the inner table join key	The fastest join algorithm	Bad locality of references (hash is not ordered in any way)
Hash join	`HASH-JOIN`	An index join is not applicable and the join condition contains only equality comparisons	It is faster than the merge (sort) join	Requires a lot of memory as the hash table for the join is built in memory. Therefore could run out of quota (if defined)
Filter	`FILTER`	The query contains a WHEN clause and there are no indexes available	Versatile	Slow
Project	`SELECT`	To select a subset of columns (fields)	Reduces the size of the results set (selected data set). Sometimes dramatically	Requires to populate temporary records instead of using direct references to objects leading to memory and performance overhead
Hash aggregate	`HASH-AGGREGATE`	Simple aggregation with the GROUP BY	The fastest way of aggregation with a group by	Builds hash table in memory — sense the memory overhead and may run out of quota
Grand aggregate	`GRAND-AGGREGATE`	Simple aggregation without the GROUP BY	The fastest way of aggregation without group by	None
Sort aggregate	`SORT-AGGREGATE`	Aggregation that utilizes sort operation by the group keys	Versatile— handles arbitrary expressions in aggregates	Memory overhead: requires to materialize the result set and sort it there. Can run out of quota
Sorting	`ORDER-BY`	Used when the query order cannot be satisfied with the index scan	Versatile: handles an arbitrary order clause	Memory overhead: sorting requires to materialize the result set. Can run out of quota
Top-N sort	`ORDER-BY.LIMIT`	Used when the query includes an ORDER-BY clause followed by a LIMIT clause	Retains only N records in memory	Can be slower than the SORT +LIMIT combination for a large N
Intersect two result sets	`INTERSECT`	INTERSECT clause is used	None except to provide functionality	The input result sets have to be sorted
Eliminates duplicates	`DISTINCT`	DISTINCT clause is used	None except to provide functionality	The input data set has to be sorted
Union two result sets	`UNION`	UNION clause is used	None except to provide functionality	The size of the result is unknown in advance
Combine two result sets	`UNION-ALL`	UNION ALL clause is used	Fast, no memory overhead	The size of the result is unknown in advance
Subtract two result sets	`EXCEPT`	EXCEPT clause is used	None except to provide functionality	The input data sets have to be sorted
Sort, then group by columns marked as distinct and retain only the first column for each group	`DISTINCT-GROUP`	This is a special eXtremeDB SQL extension	Compact, simple and efficient way of getting first/last records in the group	This is to substitute a standard SQL WINDOW clause
Transpose sequences to rows (vertical to horizontal representation)	`FLATTEN`	Select with the FLATTENED qualifier	Allows applying standard SQL operators to vertical data	Significantly increase the result set size
Limit the result set: specify the offset and the maximum number of records in the result set	`LIMIT`	The LIMIT clause is present	Reduce size of the result set	No fast way to skip records
Materialize result set in memory	`MATERIALIZE`	Subquery which doesn't depend on the outer query	Avoid redundant calculation by buffering the result set of the intermediate query	Large memory footprint

GOOD EXAMPLE	BAD EXAMPLE
`p.name = “John Smith”`	`p.pid = h.pid`
`p.pid = h.pid`	`p.name = “John Smith”`