Hash Aggregate Grouping

Hash Aggregate Grouping C Functions

The Hash Aggregate C API functions group aggregates based on hash functions. So, unlike the mco_seq_group_agg_TYPE() APIs, these functions do not require that the grouping sequence be ordered. Also unlike all other sequence functions, this group of functions allocates dynamic memory for the hash table. If the n_groups parameter is 0, then the internal database runtime parameter mco_seq_hash_init_size is used to specify the size of the hash table and it is automatically extended when the threshold determined by mco_seq_hash_load_factor is reached. This dynamic hash table memory should be explicitly released by calling mco_seq_free_hash().

The result of all Hash Aggregate functions is returned in the result_agg or result_count sequence as the calculated aggregate or count for each group. The input sequence is split into groups based on the group_by sequence. The argument result_group_by returns the sequence of values used for grouping in order for the application to make the correspondence between the values in the result sequence and the groups.

For example, consider the following code snippet demonstrating how to use a hash aggregate function to calculate the average Close price for Volume values in ranges of 0..9, 10..19, 20..29, etc.:

 
    {
        mco_trans_h trans;
        mco_cursor_t quote_cursor;
        Quote quote;
        mco_seq_iterator_t close_iterator, volume_iterator, volume_div_10_iterator; 
        mco_seq_iterator_t group_iterator,  avg_iterator;
        MCO_RET rc;
        ...
         
        for (rc = mco_cursor_first(trans, &quote_cursor); 
            rc != MCO_S_CURSOR_END; 
            rc = mco_cursor_next(trans, &quote_cursor)) 
        {
            Quote_from_cursor(trans, &quote_cursor, &quote);
            Quote_close_iterator(&quote, &close_iterator);
            Quote_volume_iterator(&quote, &volume_iterator);
             
            rc = mco_seq_const_uint4(&const_iterator, 10);
            rc = mco_seq_div_uint4(&volume_div_10_iterator, &volume_iterator, &const_iterator);
            rc = mco_seq_hash_agg_avg_float(&avg_iterator, &group_iterator,
            
                    &close_iterator, &volume_div_10_iterator, 0);
            
            ...
            mco_seq_free_hash(&avg_iterator);
        }
        ...
    }
     

Suppose that we have Volume and Close sequences with the following thirteen values:

 
    Volume: {335, 808, 45, 402, 732, 48, 805, 450, 385, 420, 367, 69, 77}
            
     
    Close: {62.830, 45.790, 74.730, 53.460, 67.870, 50.370, 80.320, 
            29.940, 25.920, 37.280, 80.920, 57.730, 43.850}
     

To split the Volume sequence into groups associated with ranges of 0..9, 10..19, 20..29, etc., we created the constant sequence iterator with values of 10 and use it as input to the mco_seq_div_uint4() API. This produces the following sequence values which are returned in the result_group_by sequence (group_iterator in this example) :

 
    {80, 73, 7, 38, 4, 40, 6, 45, 33, 42, 36}
            
     

Note that these values do not correspond to the positions of the input sequence. In fact the highlighted value 80 is the result of dividing the second and seventh elements of the Volume sequence. So the Volume values that fall into the 800-810 range (i.e. 808 and 805) are the first group.

For this input, the resulting average Close sequence returned in the result_agg sequence (avg_iterator in this example) by function mco_seq_hash_agg_avg_float() will have the following values:

 
    {63.055, 67.870, 43.850, 25.920, 62.550, 53.460, 57.730, 29.940, 62.830, 37.280, 80.920}
     

To associate the values in the resulting averages with the corresponding group of average Close values and verify the computation, note the following:

Hash Aggregate Grouping SQL Functions

The SQL API provides function seq_hash_group_by() to produce the grouping values iterator used as input to the seq_hash_agg_*() functions.