The Sql Engine allows user-defined functions (UDF) to perform custom aggregation, i.e. functions used with the SQL
group by
operator. These user-defined functions can be used like SQL functions,
sum()
count()
,max()
etc. (Please see page eXtremeSQL User Defined Functions for further details.)For an overview see page eXtremeSQL Functions
To use user-defined aggregating functions the following steps must be performed:
- Use the SQL
set
statement to set parameterimplicit_aggregates
to enable using aggregating functions:set implicit_aggregates=true;
- Write the UDF code in C++, Lua or Python that has exactly 2 arguments:
state
andops
of typearray(T)
that returns anarray(T)
whereT
is function type. (Parameterstate
is expected to be an internal state variable to accumulate temporary results andops
is an array of function arguments.- Call the UDF in the following sequence:
- First call with NULL as the
state
value andops
is non-NULL. Here the UDF is expected to create astate
array and useops
as the first set of values. Thestate
array is returned as function result.- Next call the UDF with a non-NULL
state
value and non-NULL value forops
to process the next set ofops
values.- When dataset has been processed, call the UDF with NULL for
ops
to cause the function to perform final calculations and return its result as an array.Note that even if the UDF is expected to take a scalar argument and return single scalar value, the aggregating UDF must pass array arguments and return an array.
Lua Example
The UDF could be created with Lua code like the following:
create function my_avg(state array(int), ops array(int)) returns array(int) in 'lua' as ' require("test_udf") return my_avg_impl(state, ops) end ';The function
my_avg_body()
could be implemented as follows:function my_avg_impl(state, ops) if ops ~= nil then -- Aggregating part v = ops[1] if state == nil then state = {v, 1} else state[1] = state[1] + v state[2] = state[2] + 1 end return state else -- Finalizing part local avg = {state[1] / state[2]} return avg end endConsider table Employee initialized with the following data:
create table Employee( id integer, dept_no integer, salary integer); insert into Employee values(1, 1 90000); insert into Employee values(2, 1 100000); insert into Employee values(3, 1 110000); insert into Employee values(4, 2 100000); insert into Employee values(5, 2 120000); insert into Employee values(6, 2 130000);Then the UDF would be invoked with SQL statements like the following:
set implicit_aggregates=true; select dept_no, my_avg([null]) from Employee group by dept_no; select dept_no, my_avg([salary]) from Employee group by dept_no; select dept_no, my_avg([null]) from Employee group by dept_no;and would return the following result set:
dept_no #2 -------------------------------------------------------- 1 [100000] 2 [120000]C++ Example
For a C++ UDF, the SQL function registration might look like the following:
create function bx_wt_avg(state array(double), ops array(double)) returns array(double) as 'api-sql-09-functions-aggregate', 'bx_wt_avg';The corresponding C++ function prototype could be as follows:
The actual function implementation can be found in the SDK sample source file "
samples/native/sql/api/sql-09-functions/aggregate.udf.cpp
".Then the UDF would be invoked with SQL statements like the following:
create table t(x double,y double); insert into t values (1,10),(2,20),(3,30); select bx_wt_avg([x,y]) from t;and would return the following result set:
#1 -------------------------------------------------------- [2.33333333333333, 0.014, 140, 60, 3, 6.95335580371252e-310, 360]