Aggregator performs grouping and aggregation through two interfaces, Aggregate and GroupBy. Predefined implementations cover the standard aggregates (max, min, sum, avg, count, etc.), and user-defined aggregates can be added as well. The GroupBy interface is implemented by the developer (anonymous classes are convenient for this) and defines the aggregation for a query, i.e. how the input data is split into groups and which values are aggregated within each group.
Aggregator uses a map to associate aggregate states with groups, and this map is returned as the result of the aggregation. The map can be ordered or unordered (i.e. a TreeMap or a HashMap); an ordered map returns results in ascending order of the group-by values. For example, consider the table:
class Quote
{
    @Indexable
    public long date;
    public float open;
    public float close;
    public float low;
    public float high;
    public int volume;
}
Now, to execute a query such as "select the standard deviation of the difference between the low and high prices for IBM for each month since 1990", we could write code like the following:
// Requires java.util.Date and java.util.Map.
Cursor<Quote> cursor = new Cursor<Quote>(con, Quote.class, "date");
// The deprecated Date(year, month, day) constructor takes the year as an offset from 1900, so 90 means 1990.
if (cursor.search(Operation.GreaterOrEquals, (new Date(90, 0, 1)).getTime()))
{
    Map<Object,Aggregator.Aggregate> result = Aggregator.<Quote>aggregate(cursor,
        new Aggregator.GroupBy<Quote>()
        {
            public Aggregator.Aggregate getAggregate()
            {
                return new Aggregator.DevAggregate();
            }
            public Object getKey(Quote quote)
            {
                // getMonth() returns 0-11, so the same month of different years falls into one group.
                return (new Date(quote.date)).getMonth();
            }
            public Object getValue(Quote quote)
            {
                return quote.high - quote.low;
            }
            public Aggregator.FilterResult filter(Quote quote)
            {
                return Aggregator.FilterResult.Use;
            }
        }, true); // orderByKey = true
    for (Map.Entry<Object,Aggregator.Aggregate> pair : result.entrySet())
    {
        System.out.println("Group " + pair.getKey() + "->" + pair.getValue().result());
    }
}
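The idea of a map that associates each group with an aggregate state can be sketched in plain Java, independently of the library. This is only an illustration under stated assumptions: the class `GroupBySketch`, the method `sumByKey` and the sample rows are hypothetical and are not part of the Aggregator API, and a plain running sum stands in for the aggregate state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-alone illustration; none of these names belong to the Aggregator API.
class GroupBySketch {
    // Sums one numeric value per group. orderByKey selects a TreeMap
    // (ascending keys) or a HashMap (no ordering guarantee).
    static <T, K extends Comparable<K>> Map<K, Double> sumByKey(
            Iterable<T> items, Function<T, K> key, Function<T, Double> value,
            boolean orderByKey) {
        Map<K, Double> groups;
        if (orderByKey) {
            groups = new TreeMap<>();
        } else {
            groups = new HashMap<>();
        }
        for (T item : items) {
            // Start the group with its first value, otherwise add to the running sum.
            groups.merge(key.apply(item), value.apply(item), Double::sum);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Each row is {month, high - low}; the numbers are made up.
        List<double[]> rows = Arrays.asList(
                new double[] {2, 1.5}, new double[] {0, 0.5},
                new double[] {2, 2.0}, new double[] {0, 1.0});
        Map<Integer, Double> sums = sumByKey(rows, r -> (int) r[0], r -> r[1], true);
        for (Map.Entry<Integer, Double> e : sums.entrySet()) {
            System.out.println("Group " + e.getKey() + "->" + e.getValue());
        }
    }
}
```

With the ordered map, group 0 is listed before group 2 regardless of the order in which the rows arrive; with `orderByKey` set to false the iteration order is unspecified.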
public class Aggregator
{
    ...
    public enum FilterResult
    {
        Use,
        Skip,
        Stop
    };
    public interface Aggregate<T> {…}
    public interface GroupBy<T> {…}
    public static <T> Map<Object,Aggregate> ... {…}
    public static void merge(Map<Object,Aggregate> dst, Map<Object,Aggregate> src) {…}
    public static class TopAggregate implements Aggregate<Comparable> {…}
    public static class MaxAggregate implements Aggregate<Comparable> {…}
    public static class MinAggregate implements Aggregate<Comparable> {…}
    public static class RealSumAggregate implements Aggregate<Number> {…}
    public static class IntegerSumAggregate implements Aggregate<Number> {…}
    public static class AvgAggregate implements Aggregate<Number> {…}
    public static class PrdAggregate implements Aggregate<Number> {…}
    public static class VarAggregate implements Aggregate<Number> {…}
    public static class DevAggregate extends VarAggregate {…}
    public static class CountAggregate implements Aggregate {…}
    public static class DistinctCountAggregate implements Aggregate {…}
    public static class RepeatCountAggregate implements Aggregate {…}
    public static class ApproxDistinctCountAggregate implements Aggregate {…}
    public static class FirstAggregate implements Aggregate {…}
    public static class LastAggregate implements Aggregate {…}
    public static class CompoundAggregate implements Aggregate {…}
};
| Member | Description |
|---|---|
| enum FilterResult | Enumerated constants used to control filtering of query results: Use, Skip, Stop |
| Aggregate<T> | Implemented by all standard aggregates; can also be used to define custom aggregates |
| GroupBy<T> | Used to specify the aggregation operation |
| aggregate(Iterable<T> iterable, GroupBy<T> groupBy) | Performs the aggregation. Returns: a map with the results of aggregation as <group-by, aggregate-value> pairs |
| aggregate(Iterable<T> iterable, GroupBy<T> groupBy, boolean orderByKey) | Performs the aggregation. Returns: a map with the results of aggregation as <group-by, aggregate-value> pairs |
| merge(Map<Object,Aggregate> dst, Map<Object,Aggregate> src) | Merges two aggregation results. This method combines the state of the aggregates in dst with the aggregate states in src, i.e. dst = merge(dst, src); the method itself returns void, so the combined state is left in dst |
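The dst = merge(dst, src) semantics can be sketched with plain maps, for instance when two partial results computed over different parts of the data have to be combined. This is a minimal sketch under stated assumptions, not the library's implementation: `MergeSketch` and `SumState` are hypothetical names, and `SumState` is a stand-in for an aggregate state because the methods of the real Aggregate interface are not listed in this section.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone illustration; SumState stands in for an aggregate state.
class MergeSketch {
    static final class SumState {
        double sum;
        SumState(double sum) { this.sum = sum; }
        double result() { return sum; }
    }

    // Folds src into dst in place: groups present in both maps are combined,
    // groups found only in src are copied so that src stays untouched.
    static void merge(Map<Object, SumState> dst, Map<Object, SumState> src) {
        for (Map.Entry<Object, SumState> e : src.entrySet()) {
            SumState state = dst.get(e.getKey());
            if (state == null) {
                dst.put(e.getKey(), new SumState(e.getValue().sum));
            } else {
                state.sum += e.getValue().sum;
            }
        }
    }

    public static void main(String[] args) {
        // Two partial results keyed by month; the numbers are made up.
        Map<Object, SumState> first = new TreeMap<>();
        first.put(0, new SumState(1.5));
        first.put(1, new SumState(2.0));
        Map<Object, SumState> second = new TreeMap<>();
        second.put(1, new SumState(0.5));
        second.put(2, new SumState(4.0));

        merge(first, second);
        for (Map.Entry<Object, SumState> e : first.entrySet()) {
            System.out.println("Group " + e.getKey() + "->" + e.getValue().result());
        }
    }
}
```

After the call, `first` holds three groups (0, 1 and 2), and group 1 carries the combined sum of both inputs.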
| Embedded Classes | Description |
|---|---|
| TopAggregate | Aggregate returning the top N values |
| MaxAggregate | The maximum aggregate |
| MinAggregate | The minimum aggregate |
| RealSumAggregate | The sum aggregate for real values |
| IntegerSumAggregate | The sum aggregate for integer values |
| AvgAggregate | The average aggregate |
| PrdAggregate | The product aggregate |
| VarAggregate | The variance aggregate |
| DevAggregate | The standard deviation aggregate |
| CountAggregate | The "count all" aggregate |
| DistinctCountAggregate | The "distinct count" aggregate (note that this aggregate can have a large memory footprint) |
| RepeatCountAggregate | Counts the number of items repeated N or more times |
| ApproxDistinctCountAggregate | The approximate "distinct count" aggregate (note that the result is not precise) |
| FirstAggregate | The first group element aggregate |
| LastAggregate | The last group element aggregate |
| CompoundAggregate | The compound aggregate: a combination of several aggregates (note that this can calculate more than one aggregate in a single traversal) |
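To illustrate what a variance or standard deviation aggregate has to keep as per-group state, the following sketch accumulates values in one pass using Welford's online algorithm. It is an assumption-laden illustration, not the code behind VarAggregate or DevAggregate: the class `VarianceSketch` and its methods are hypothetical, and the sketch computes the population variance (dividing by n), whereas this section does not say which definition the library uses.

```java
// Hypothetical stand-alone illustration of a one-pass variance state (Welford's algorithm).
class VarianceSketch {
    private long count;   // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    // Folds one value into the state without storing the values themselves.
    void accumulate(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Population variance (assumption: divide by n, not n - 1).
    double variance() {
        return count == 0 ? Double.NaN : m2 / count;
    }

    // The standard deviation is the square root of the variance.
    double deviation() {
        return Math.sqrt(variance());
    }

    public static void main(String[] args) {
        VarianceSketch state = new VarianceSketch();
        // Made-up high - low spreads for one group.
        for (double spread : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            state.accumulate(spread);
        }
        System.out.println("variance=" + state.variance() + " deviation=" + state.deviation());
    }
}
```

For the sample values the mean is 5, so the population variance is 4 and the deviation 2, up to floating-point rounding. Deriving the deviation from the variance state mirrors the way DevAggregate extends VarAggregate in the class synopsis.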