eXtremeSQL Shuffle Join

eXtremeSQL implements the classic map-reduce "shuffle join" algorithm when performing a join between tables. For example, consider two tables that need to be joined with a statement like the following:

 
    select * from A join B on A.id = B.id;

First, the outer table A is scattered into chunks by hashing the join key, with an operation like hash(A.id) mod nChunks. If nChunks is 3, this produces three chunks A1, A2, and A3. Then the inner table B is scattered in the same way, producing chunks B1, B2, and B3.
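As a rough illustration (not eXtremeSQL's actual code), the chunk assignment can be sketched in Python; the helper name chunk_of and the use of Python's built-in hash are assumptions:

    # Illustrative sketch: assign a row to a chunk by hashing its join key.
    def chunk_of(join_key, n_chunks=3):
        return hash(join_key) % n_chunks

    # Rows of A and B with the same id hash to the same chunk index,
    # so matching rows always land in the same pair (Ai, Bi).
    assert chunk_of(42) == chunk_of(42)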

Now it is possible to perform the join using the pairs (A1, B1), (A2, B2), and (A3, B3).

This method allows joining two tables that cannot fit in memory. The value of nChunks is chosen so that the data in each chunk file Bi fits in memory. Each chunk is stored in its own file. One pass through table A and one pass through table B append the rows to the corresponding chunk files, which consumes almost no memory. Then each pair of "chunk files" is traversed: the inner chunk Bi is loaded into an in-memory hash table, and the outer chunk Ai is scanned, probing the hash table with in-memory lookups.
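A minimal sketch of this local shuffle join, written in plain Python with pickle files and dictionaries purely for illustration (the names scatter, read_chunk, and shuffle_join are assumptions, not eXtremeSQL internals):

    import os, pickle, tempfile

    N_CHUNKS = 3    # chosen so that each inner chunk Bi fits in memory

    def scatter(rows, prefix, tmpdir):
        """One pass over a table: append each (id, payload) row to its chunk file."""
        files = [open(os.path.join(tmpdir, prefix + str(i)), "ab") for i in range(N_CHUNKS)]
        for row in rows:
            pickle.dump(row, files[hash(row[0]) % N_CHUNKS])
        for f in files:
            f.close()

    def read_chunk(prefix, i, tmpdir):
        """Stream rows back from one chunk file."""
        with open(os.path.join(tmpdir, prefix + str(i)), "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    def shuffle_join(a_rows, b_rows):
        """Join A and B on id without holding either full table in memory."""
        with tempfile.TemporaryDirectory() as tmpdir:
            scatter(a_rows, "A", tmpdir)              # one pass over A
            scatter(b_rows, "B", tmpdir)              # one pass over B
            for i in range(N_CHUNKS):
                # Load the inner chunk Bi into an in-memory hash table ...
                table = {}
                for b_id, b_val in read_chunk("B", i, tmpdir):
                    table.setdefault(b_id, []).append(b_val)
                # ... then scan the outer chunk Ai, probing the hash table.
                for a_id, a_val in read_chunk("A", i, tmpdir):
                    for b_val in table.get(a_id, []):
                        yield (a_id, a_val, b_val)

For example, list(shuffle_join([(1, "a1")], [(1, "b1"), (2, "b2")])) yields [(1, "a1", "b1")]: only the chunk pair containing id 1 has to be in memory at once.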

Distributed Shuffle Join

The distributed shuffle join is based on the same idea, but here the data is scattered across several nodes, and each node stores just one pair of chunks (Ai, Bi). (It is assumed that data is partitioned between nodes in such a way that each partition fits in memory.)
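As an illustrative sketch (the routing rule hash(id) % n_nodes and the function names are assumptions), the chunk index simply doubles as the node index:

    def node_for(join_key, n_nodes):
        """In the distributed case, the chunk index doubles as the node index."""
        return hash(join_key) % n_nodes

    def route(rows, n_nodes):
        """Group (id, payload) rows of a table by the node that should hold them."""
        per_node = {i: [] for i in range(n_nodes)}
        for row in rows:
            per_node[node_for(row[0], n_nodes)].append(row)
        return per_node    # per_node[i] is sent to node i, which joins (Ai, Bi) locally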

Unlike the local shuffle join, the distributed shuffle join requires the nodes to exchange data, so connections must be established between every pair of nodes in the cluster. (This is required only for the distributed shuffle join; all other distributed queries need communication only between the coordinator and the worker nodes.)
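For illustration only, the difference in connectivity can be sketched as follows (the node names and helper functions are assumptions, not part of eXtremeSQL):

    def star_links(coordinator, workers):
        """Coordinator-to-worker links: enough for ordinary distributed queries."""
        return {(coordinator, w) for w in workers}

    def mesh_links(nodes):
        """Every pair of nodes: needed so each node can ship chunk data to
        every other node during a distributed shuffle join."""
        return {(a, b) for a in nodes for b in nodes if a < b}

    workers = ["node1", "node2", "node3", "node4"]
    print(len(star_links("coordinator", workers)))   # 4 links
    print(len(mesh_links(workers)))                  # 6 links, i.e. n*(n-1)/2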