A powerful feature of eXtremeDB is the
sequencedata type. Apart from the memory and processing efficiency enabled by the possibility to store data in this "columnar" format, sequences are particularly suited to vector-based statistical functions. Typically, these SQL statistical functions are used to perform analytical operations on “time series”; i.e. a group of related sequence fields.The following examples will perform some statistical functions via queries on a simple database class:
class Quote { char<MAX_SYMBOL_LEN> symbol; sequence<date asc> day; sequence<float> low; sequence<float> high; sequence<float> open; sequence<float> close; sequence<uint4> volume; unique tree<symbol> by_sym; };The data for these examples is taken from historical values from 2013 for IBM. This sample data and a group of script and shell command files that perform the select statements are included in directory
eXtremeDB/samples/native/sql/xsql/scripts/financial. To create table QuoteIBM and load the sample data, open xSQL in this directory with the following command:xsql -i -size 100m -f IBM-q1-2013.sqlSequence Interval
The first example will extract data belonging to a specified interval (January 2013). The ordered sequence
dayis used to select the interval and then this interval is projected to the other (unordered) sequencesopenandclosewith the following query:SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013, open@Jan_2013 as "open", close@Jan_2013 as "close" FROM Quote WHERE symbol='IBM'; symbol Jan_2013 open close -------------------------------------------------- IBM 20130102 194.089996 196.350006 IBM 20130103 195.669998 195.270004 IBM 20130104 194.190002 193.990005 IBM 20130107 193.399994 193.139999 IBM 20130108 192.919998 192.869995 IBM 20130109 193.479996 192.320007 IBM 20130110 192.649994 192.880005 IBM 20130111 194.149994 194.449997 IBM 20130114 192.820007 192.619995 IBM 20130115 191.309998 192.500000 IBM 20130116 192.000000 192.589996 IBM 20130117 193.850006 193.649994 IBM 20130118 194.029999 194.470001 IBM 20130122 194.360001 196.080002 IBM 20130123 203.500000 204.720001 IBM 20130124 203.910004 204.419998 IBM 20130125 204.449997 204.970001 IBM 20130128 204.850006 204.929993 IBM 20130129 204.339996 203.899994 IBM 20130130 203.690002 203.520004 IBM 20130131 203.320007 203.070007 Selected records: 21Note that this query can be run from the script file
x3.sql:XSQL>script x3.sqlNote the use of key word
flattenedto transform the sequence fields into a table so that the result set appears as individual rows for each set of corresponding sequence elements with the scalar fieldsymbolvalue repeated on each row. (Also note that the bounds specified forseq_search()are “inclusive”.)Arithmetic on Sequences
We can perform arithmetic operations on sequence elements. In the following query the two sequences
highandloware averaged by adding each element of high to the corresponding element of low and dividing by 2 to produce a result sequence:SELECT flattened symbol, seq_search(day, 20130101, 20130131) as Jan_2013, high@Jan_2013 as "high", low@Jan_2013 as "low", (high@Jan_2013 + low@Jan_2013) / 2 as "average" FROM Quote WHERE symbol='IBM'; symbol Jan_2013 high low average ------------------------------------------------------------------ IBM 20130102 196.350006 193.800003 195.075012 IBM 20130103 196.289993 194.440002 195.364990 IBM 20130104 194.460007 192.779999 193.619995 IBM 20130107 193.779999 192.339996 193.059998 IBM 20130108 193.300003 191.610001 192.455002 IBM 20130109 193.490005 191.649994 192.570007 IBM 20130110 192.960007 191.279999 192.119995 IBM 20130111 195.000000 192.899994 193.949997 IBM 20130114 193.279999 191.750000 192.514999 IBM 20130115 192.729996 190.389999 191.559998 IBM 20130116 193.179993 191.350006 192.264999 IBM 20130117 194.460007 193.240005 193.850006 IBM 20130118 195.000000 193.800003 194.399994 IBM 20130122 196.080002 194.009995 195.044998 IBM 20130123 208.580002 203.360001 205.970001 IBM 20130124 205.059998 203.080002 204.070007 IBM 20130125 205.179993 204.130005 204.654999 IBM 20130128 206.220001 204.289993 205.255005 IBM 20130129 205.729996 203.639999 204.684998 IBM 20130130 204.880005 203.190002 204.035004 IBM 20130131 204.470001 202.960007 203.714996 Selected records: 21And this query can be run from the script file
x4.sql.Filtering Sequence Elements
The following query will extract the days of February 2013 when the
closevalue was greater thanopen:SELECT symbol, seq_search(day, 20130201, 20130228) as Feb_2013, close@Feb_2013 as Feb_Close, open@Feb_2013 as Feb_Open, seq_filter(Feb_Close > Feb_Open, Feb_2013) as "up_Feb", seq_filter(Feb_Close > Feb_Open, Feb_Open) as "up_Open", seq_filter(Feb_Close > Feb_Open, Feb_Close) as "up_Close" FROM Quote WHERE symbol='IBM'; symbol Feb_2013{} Feb_Close{} Feb_Open{} up_Feb{} up_Open{} up_Close{} ------------------------------------------------------------------------------ IBM {20130201, 20130204, 20130205, 20130206, 20130207, 20130208, 20130211, 20130212, 20130213, 20130214, 20130215, 20130219, 20130220, 20130221, 20130222, 20130225, 20130226, 20130227, 20130228} {205.179993, 203.789993, 202.789993, 201.020004, 199.740005, 201.679993, 200.160004, 200.039993, 200.089996, 199.649994, 200.979996, 200.320007, 199.309998, 198.330002, 201.089996, 197.509995, 199.139999, 202.330002, 200.830002} {204.649994, 204.190002, 204.309998, 200.389999, 200.619995, 199.970001, 200.979996, 200.009995, 200.649994, 199.729996, 199.979996, 200.600006, 200.619995, 198.630005, 199.229996, 201.669998, 198.630005, 198.889999, 202.179993} {20130201, 20130206, 20130208, 20130212, 20130215, 20130222, 20130226, 20130227} {204.649994, 200.389999, 199.970001, 200.009995, 199.979996, 199.229996, 198.630005, 198.889999} {205.179993, 201.020004, 201.679993, 200.039993, 200.979996, 201.089996, 199.139999, 202.330002} Selected records: 1The
up_Febsequence contains the dates when the close value was greater than the open for February of 2013. To illustrate we have highlighted thedatevalue for February 8, 2013 in theup_Febsequence which corresponds with the 3rd element of theup_Openandup_Closesequences and that satisfy the filter condition; i.e. 201.68 > 199.97.Note that the repeated calls of function
seq_filter()are necessary to produce the filteredopenandclosesequences which make the output more easily understandable, as the Feb_2013,Feb_OpenandFeb_Closesequences (which cannot be suppressed) somewhat obscure the essential information.Please view the Analytics Function Library page for many more examples and further explanation of these powerful analytic functions.