Analytics Functions Example 11: Hash Aggregate

Hash Aggregate Average

Compute the average close value for groups of trades having volume values grouped by millions of trades per day (for the first two weeks of March 2013):

     
    SELECT symbol, 
        seq_search(day, 20130304, 20130315) as Mar_2013,
        volume@Mar_2013 as "volume", close@Mar_2013 as "close",
        seq_hash_agg_avg(close@Mar_2013, volume@Mar_2013/1000000) as avg_vol,
        seq_hash_group_by(avg_vol) as "group_by"
    FROM Quote WHERE symbol='IBM';
     
    symbol
    Mar_2013{}
    volume{}
    close{}
    avg_vol{}
    group_by{}
    ------------------------------------------------------------------------------
    IBM
    {20130304, 20130305, 20130306, 20130307, 20130308,
    20130311, 20130312, 20130313, 20130314, 20130315}
    {3693365, 3807706, 3594899, 3884317, 3700786,
    3049701, 3592096, 3356946, 5505584, 7936544}
    {205.190002, 206.529999, 208.380005, 209.419998, 210.380005,
    210.080002, 210.550003, 212.059998, 215.800003, 214.919998}
    {214.919998, 209.073751, 215.800003}
    {7, 3, 5}
     

Note that the "hash_aggregate" functions calculate aggregate values for each group determined by an arbitrary sequence (not necessarily sorted, such as close above) using a hash table. So to interpret the result sets from these functions a “group_by” sequence is required. The function seq_hash_group_by returns the sequence of values used for grouping in order to make the correspondence between the values in the result sequence(s) and the groups.

In the example above, the close prices are divided into groups associated with volume values where the volume is in ranges of (0-999,999), (1,000,000-1,999,999), (2,000,000-2,999,999), etc. To illustrate we analyze the output as follows:

(205.190002 + 206.529999 + 208.380005 + 209.419998 + 210.380005 + 210.080002 + 210.550003 + 212.059998) / 8 = 209.073751

Note also that the single occurrence of volume in the 5M range corresponds to the single avg_vol value 215.800003, and the single occurrence of volume in the 7M range corresponds to the single avg_vol value 214.919998.

To run this example

A sample script to demonstrate this select statement using xSQL can be run from the samples/xsql/scripts/financial directory with the following command:

     
    x 11