Using Sequences in Python

As explained in the User's Guide page, a sequence is an unbounded array of eXtremeDB-supported scalar data elements. The Python SequenceIterator class is used to manage database fields of type sequence. As sequences are effectively vectors of values, they are accessed through iterators. Where the C type mco_seq_iterator_h is used in C applications, the Python class SequenceIterator serves the equivalent role in Python applications. The SequenceIterator class also provides a powerful set of Analytics Methods for performing mathematical and statistical operations on sequences.

Inserting and Updating Sequences in Python

Normally sequence data is inserted using the SequenceIterator append() method. For example consider the following schema definition:

 
    #define uint4 unsigned<4>
     
    class Quote {
        char<16>            symbol;
        sequence<uint4 asc> day;
        sequence<float>     price;
         
        tree<symbol> by_symbol;
    };
     

With this class definition, the Quote sequence fields day and price can be populated with code like the following:

 
    con.startTransaction()
    tick = generate_random_quote()
    quote = con.new("Quote")
    quote.symbol = tick.symbol
    quote.day.append(tick.day)
    quote.price.append(tick.price)
    con.commit()
     

It may sometimes be necessary to insert values into an ordered time series. Values can be inserted into an existing sequence using the insert() method. For example, the following code snippet searches for the Quote object with symbol "IBM", then inserts day and price values:

 
    con.startTransaction()
    for quote in cursor:
        dayit = quote.day.search(20130101, exdb.SeqIteratorBoundary.MCO_SEQ_BOUNDARY_INCLUSIVE, 
                        20130101, exdb.SeqIteratorBoundary.MCO_SEQ_BOUNDARY_INCLUSIVE)
        dayit = append(20130102)
        quote.day.next()
        quote.day.insert(dayit)
        
         
    con.commit()
     

Sequence Interface to NumPy

If Python has the numPy module installed, it is possible to load and save data from sequences directly to a numPy array. (Note that due to the nature of sequence data in eXtremeDB, only 1-dimensional numPy arrays are supported.)

Appending a numPy Array to a Sequence.

The SequenceIteratorappend() method can accept a numpy.ndarray as an argument. For example:

 
    >>> ac = numpy.arange(10,  dtype='uint32')
    >>> ac
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint32)
    >>> quote.volume.append(ac)
     

Both numPy and eXtremeDBsequences use C-Style binary array representation for efficiency. So the data types for sequence and nparray must match. This is why dtype=uint32 is the argument specified.

The append() method recognizes the following argument types:

 
    o.price.append(1)
     

This will simply append a value to the end of the sequence. (This is the slowest way to update a sequence.)

 
    o.price.append([1,2,3,4,5])
     

Here the 5 values in the list are appended to the sequence.

 
    o.price.append( (1,2,3,4,5) )
     

Again the 5 values in the list are appended to the sequence.

The Python standard library supports a homogenous typed array with elements of the same size. In fact, this is the wrapper over C-style array, and it can be used to fill a sequence with data as follows:

     
    import array
    a = array.array('f', [1,2,3,4])
    o.price.append(a)
     

From the database point of view, this is the fastest way of inserting data, but keep in mind the following considerations:

1. The data type for array.array must exactly match the data type for the database sequence. Type mappings between eXtremeDB database types, Python and C types are shown in the following table:

Sequence type array.array typecode C type
signed<1> 'b' signed char
unsigned<1> 'B' unsigned char
signed<2> 'h' signed short
unsigned<2> 'H' unsigned short
signed<4> 'i' signed int
unsigned<4> 'I' unsigned int
signed<8> 'l' signed long long
unsigned<8> 'L' unsigned long long
float 'f' float
double 'd' double
char<n> - -

(Note, that the char<n> type is not supported in array.array.)

2. Speed. Although this is the fastest way to insert data into a database, the creation of array.array incurs its own overhead. Tests show that the longer the array being used to insert, the greater the improvement in performance. (It's feasible to use array.array with data chunks of much more that a 100 items.) Also, the Python interpreter is quite slow filling array.array with values. A real advantage can be achieved using other methods for filling that array, like: array.fromfile(f, n)or array.fromstring(s). (Please consult the Python documentation for these methods.)

Retrieving Sequence Data to an nparray

If a sequence is of moderate size, it is possible to represent the entire sequence as a numPy array using the SequenceIterator method tonparray(). Its signature is:

 
    SequenceIterator.tonparray([size_hint])
     

For a materializedsequence, the eXtremeDB runtime knows exactly what the sequence size is, so the required memory for the resulting array is allocated immediately. For a non-materialized sequence, it is not possible to know in advance the size of the resulting array. In this case, memory is allocated by chunks as the temporary result is built. This is not very efficient, so if the application knows the expected array size, this can be passed to tonparray() as the size_hint argument.

 

Sequences in Python Embedded SQL Applications

The eXtremeDB Python wrapper extends the Python DB API Specification by allowing operations with sequences. A sequence may appear as a result of a select statement, or, as a parameter in insert or update statements. For example, the following statement could be sent to the SQL engine:

 
    cursor.execute("SELECT stamp, low, high FROM Quote WHERE symbol = ?", ('AAA',))
     

The result of this query can be as follows:

 
    row = cursor.fetchone()
    print "Row is:", row
    Row is: (<exdb.SequenceIterator object at 0x1008f6610>, <exdb.SequenceIterator object at 0x1008f7610>, 
        <exdb.SequenceIterator object at 0x1008f8610>)
     
     

Note that the resulting row contains SequenceIterator objects as row values. Regular iterator methods can be applied to this iterator. Note that, except in the case of a query result, the returned iterator will be “not materialized”.

To insert data into a sequence, pass data into the SQL statement as a parameter using an array.array . For example:

 
    cursor.execute('INSERT INTO Quote(symbol, stamp, open, close, high, low, volume) VALUES (?,?,?,?,?,?,?)',
        ('AAA', array.array('I', timestamps), array.array('f', opens), array.array('f', closes), 
        array.array('f', highs), array.array('f', lows), array.array('I', volumes)))
         

Note that array.array is strictly typed, so its type must match the sequence column type. If it does not match, the resulting behavior is unpredictable. Passing a Python list or tuple as a parameter value is not supported. Please consult the mappings table above for the correspondences between sequence types and array.array types.