Aggregator is used to perform grouping and aggregation. Both are done through two interfaces: Aggregate and GroupBy. Predefined Aggregate implementations are provided for the standard aggregates (max, min, sum, avg, count, etc.), and user-defined aggregates can also be created. The GroupBy interface is implemented by the developer (anonymous classes are convenient for this) and defines the aggregation for a query. It specifies how to split the input data into groups, which value of each item to aggregate, and which aggregate to calculate for each group.
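The pattern described above can be illustrated without a database. The following self-contained sketch does not use the real Aggregator. MiniAggregator, SumAggregate and the accumulate() method are hypothetical stand-ins. The method names getAggregate, getKey, getValue, filter and result, and the Use/Skip/Stop constants, follow the example in this section. The handling of Skip (ignore one item) and Stop (end the traversal) is an assumption based on the constant names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the real Aggregator API. The names mirror this section,
// but accumulate() and the loop below are assumptions, not the library's code.
final class MiniAggregator {
    enum FilterResult { Use, Skip, Stop }

    interface Aggregate {
        void accumulate(Object value); // assumed: fold one value into the group's state
        Object result();               // final value of the aggregate
    }

    interface GroupBy<T> {
        Aggregate getAggregate();      // a new, empty aggregate for a new group
        Object getKey(T item);         // the group-by value
        Object getValue(T item);       // the value fed to the aggregate
        FilterResult filter(T item);   // Use, Skip or Stop
    }

    // Sums Number values as doubles.
    static final class SumAggregate implements Aggregate {
        private double sum;
        public void accumulate(Object value) { sum += ((Number) value).doubleValue(); }
        public Object result() { return sum; }
    }

    static <T> Map<Object, Aggregate> aggregate(Iterable<T> items, GroupBy<T> groupBy, boolean orderByKey) {
        // Ordered map: groups come back in ascending key order (keys must be Comparable).
        Map<Object, Aggregate> groups = orderByKey
                ? new TreeMap<Object, Aggregate>()
                : new HashMap<Object, Aggregate>();
        for (T item : items) {
            FilterResult f = groupBy.filter(item);
            if (f == FilterResult.Stop) break;    // end the whole traversal
            if (f == FilterResult.Skip) continue; // ignore this item only
            Aggregate agg = groups.computeIfAbsent(groupBy.getKey(item), k -> groupBy.getAggregate());
            agg.accumulate(groupBy.getValue(item));
        }
        return groups;
    }
}
```

Suppose a GroupBy keys words by length, returns 1 as the value, skips one-letter words, and stops at the word "STOP". It then produces a count of words for each length. With orderByKey set to true, the lengths come back in ascending order.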
Aggregator uses a map to associate aggregate states with groups, and this map is returned as the result of aggregation. Aggregator can use an ordered or an unordered map (i.e. a TreeMap or a HashMap). An ordered map returns results in ascending order of the group-by values. For example, consider the table:
class Quote {
    @Indexable public long date;  // timestamp in milliseconds (java.util.Date time); indexed
    public float open;
    public float close;
    public float low;
    public float high;
    public int volume;
}
Now, to execute a query such as "select the standard deviation of the difference between the high and low prices of IBM for each month since 1990", we could write code like the following (Quote has no symbol field, so the example assumes the table holds only IBM quotes):
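import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Map;

Cursor<Quote> cursor = new Cursor<Quote>(con, Quote.class, "date");
// Position the cursor at the first quote dated on or after January 1, 1990
long start = new GregorianCalendar(1990, Calendar.JANUARY, 1).getTimeInMillis();
if (cursor.search(Operation.GreaterOrEquals, start)) {
    Map<Object,Aggregator.Aggregate> result = Aggregator.<Quote>aggregate(cursor, new Aggregator.GroupBy<Quote>() {
        // A new standard deviation aggregate is created for each group
        public Aggregator.Aggregate getAggregate() { return new Aggregator.DevAggregate(); }
        // Group key: month of the year (0 = January ... 11 = December)
        public Object getKey(Quote quote) { return new Date(quote.date).getMonth(); }
        // Value fed to the aggregate: the day's price range
        public Object getValue(Quote quote) { return quote.high - quote.low; }
        // Include every quote
        public Aggregator.FilterResult filter(Quote quote) { return Aggregator.FilterResult.Use; }
    }, true); // true: ordered map, so groups are listed in ascending month order
    for (Map.Entry<Object,Aggregator.Aggregate> pair : result.entrySet()) {
        System.out.println("Group " + pair.getKey() + "->" + pair.getValue().result());
    }
}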
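In the example above, DevAggregate computes one standard deviation per month. The internals of that class are not documented in this section. The sketch below shows one standard way such an aggregate can keep its state in a single pass without storing the group's values: Welford's online algorithm. RunningDev is a hypothetical name. Using the population variance (dividing by n rather than n - 1) is also an assumption.

```java
// Hypothetical one-pass variance / standard deviation accumulator (Welford's algorithm).
// Whether the library's VarAggregate reports population or sample variance is not
// stated; this sketch uses the population variance.
final class RunningDev {
    private long n;      // number of values seen
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the updated mean
    }

    double variance() { return n == 0 ? Double.NaN : m2 / n; }

    double stdDev() { return Math.sqrt(variance()); }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the sketch reports a variance of 4 and a standard deviation of 2. In the cursor example, each month's aggregate would play the role of one RunningDev fed with quote.high - quote.low.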
public class Aggregator {
    ...
    public enum FilterResult { Use, Skip, Stop }
    public interface Aggregate<T> {…}
    public interface GroupBy<T> {…}
    public static <T> Map<Object,Aggregate> ... {…}
    public static void merge(Map<Object,Aggregate> dst, Map<Object,Aggregate> src) {…}
    public static class TopAggregate implements Aggregate<Comparable> {…}
    public static class MaxAggregate implements Aggregate<Comparable> {…}
    public static class MinAggregate implements Aggregate<Comparable> {…}
    public static class RealSumAggregate implements Aggregate<Number> {…}
    public static class IntegerSumAggregate implements Aggregate<Number> {…}
    public static class AvgAggregate implements Aggregate<Number> {…}
    public static class PrdAggregate implements Aggregate<Number> {…}
    public static class VarAggregate implements Aggregate<Number> {…}
    public static class DevAggregate extends VarAggregate {…}
    public static class CountAggregate implements Aggregate {…}
    public static class DistinctCountAggregate implements Aggregate {…}
    public static class RepeatCountAggregate implements Aggregate {…}
    public static class ApproxDistinctCountAggregate implements Aggregate {…}
    public static class FirstAggregate implements Aggregate {…}
    public static class LastAggregate implements Aggregate {…}
    public static class CompoundAggregate implements Aggregate {…}
}
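The merge() method in the listing above folds the aggregate states of src into dst. One typical reason to need this is to aggregate separate portions of the data and then combine the partial results. That use is an assumption here, not something stated by the documentation. The sketch below does not call the real Aggregator, and AvgState and MergeSketch.mergeInto are hypothetical. An average is used because it shows why the mergeable state must be richer than the final value.

```java
import java.util.Map;

// Hypothetical mergeable aggregate state: an average kept as (sum, count),
// so that two partial states can be combined exactly.
final class AvgState {
    double sum;
    long count;

    AvgState add(double x) { sum += x; count++; return this; }

    void mergeFrom(AvgState other) { sum += other.sum; count += other.count; }

    double result() { return count == 0 ? Double.NaN : sum / count; }
}

final class MergeSketch {
    // Folds every group of src into dst, modifying dst in place: groups present in
    // both maps are combined; groups only in src are copied over (by reference).
    static void mergeInto(Map<Object, AvgState> dst, Map<Object, AvgState> src) {
        for (Map.Entry<Object, AvgState> e : src.entrySet()) {
            AvgState existing = dst.get(e.getKey());
            if (existing == null) {
                dst.put(e.getKey(), e.getValue());
            } else {
                existing.mergeFrom(e.getValue());
            }
        }
    }
}
```

Consider merging a partial result holding IBM prices 10 and 20 with one holding IBM 30 and MSFT 5. The merged averages are 20 for IBM and 5 for MSFT. Averaging the two partial IBM averages (15 and 30) would instead give 22.5, which is wrong.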
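enum FilterResult
    Enumerated constants used to control filtering of query results: public enum FilterResult { Use, Skip, Stop }
Aggregate<T>
    Implemented by all standard aggregates; can also be used to define custom aggregates
GroupBy<T>
    Specifies the aggregation operation: how input items are filtered and grouped, which value is aggregated, and which aggregate is used
aggregate(Iterable<T> iterable, GroupBy<T> groupBy)
    Performs the aggregation.
    Parameters:
        iterable: the input data (for example, a Cursor)
        groupBy: the grouping and aggregation specification
    Returns: a map with the results of aggregation: <group-by, aggregate-value> pairs
aggregate(Iterable<T> iterable, GroupBy<T> groupBy, boolean orderByKey)
    Performs the aggregation.
    Parameters:
        iterable: the input data (for example, a Cursor)
        groupBy: the grouping and aggregation specification
        orderByKey: if true, the results are returned in an ordered map, in ascending order of the group-by values; otherwise an unordered map is used
    Returns: a map with the results of aggregation: <group-by, aggregate-value> pairs
merge(Map<Object,Aggregate> dst, Map<Object,Aggregate> src)
    Merges two aggregation results. The aggregate states in src are combined with the aggregate states in dst, and dst holds the merged result.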
Embedded Classes
TopAggregate
    Returns the top N values
MaxAggregate
    The maximum aggregate
MinAggregate
    The minimum aggregate
RealSumAggregate
    The sum aggregate for real values
IntegerSumAggregate
    The sum aggregate for integer values
AvgAggregate
    The average aggregate
PrdAggregate
    The product aggregate
VarAggregate
    The variance aggregate
DevAggregate
    The standard deviation aggregate
CountAggregate
    The "count all" aggregate
DistinctCountAggregate
    The "distinct count" aggregate (note that this aggregate can have a large memory footprint)
RepeatCountAggregate
    Counts the number of items repeated N or more times
ApproxDistinctCountAggregate
    The approximate "distinct count" aggregate (note that the result is an estimate, not an exact count)
FirstAggregate
    The first element of the group
LastAggregate
    The last element of the group
CompoundAggregate
    A combination of several aggregates, which allows more than one aggregate to be calculated in a single traversal
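Several of the embedded classes above differ mainly in the state they keep. The sketch below illustrates this without using the library's code. A hypothetical Compound class feeds each value to several component aggregates in one pass, in the spirit of CompoundAggregate. TopN approximates TopAggregate with a size-bounded min-heap that holds the N largest values. DistinctCount approximates DistinctCountAggregate with a HashSet, whose memory grows with the number of distinct values. RepeatCount approximates RepeatCountAggregate with per-value counters. Agg, TopN, DistinctCount, RepeatCount and Compound are all assumed names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical minimal aggregate contract used only by this sketch.
interface Agg<T> {
    void add(T value);
    Object result();
}

// Keeps the N largest values in a min-heap of size at most N (TopAggregate-like).
final class TopN<T extends Comparable<T>> implements Agg<T> {
    private final int n;
    private final PriorityQueue<T> heap = new PriorityQueue<>(); // smallest kept value on top
    TopN(int n) { this.n = n; }
    public void add(T value) {
        heap.add(value);
        if (heap.size() > n) heap.poll(); // drop the smallest, keeping the N largest
    }
    public Object result() {
        List<T> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder()); // largest first
        return top;
    }
}

// Exact distinct count; memory grows with the number of distinct values.
final class DistinctCount<T> implements Agg<T> {
    private final Set<T> seen = new HashSet<>();
    public void add(T value) { seen.add(value); }
    public Object result() { return seen.size(); }
}

// Number of distinct values that occur at least minRepeats times.
final class RepeatCount<T> implements Agg<T> {
    private final int minRepeats;
    private final Map<T, Integer> counts = new HashMap<>();
    RepeatCount(int minRepeats) { this.minRepeats = minRepeats; }
    public void add(T value) { counts.merge(value, 1, Integer::sum); }
    public Object result() {
        int n = 0;
        for (int c : counts.values()) if (c >= minRepeats) n++;
        return n;
    }
}

// Feeds every value to all components, so one traversal yields several results.
final class Compound<T> implements Agg<T> {
    private final List<Agg<T>> parts;
    @SafeVarargs
    Compound(Agg<T>... parts) { this.parts = new ArrayList<>(Arrays.asList(parts)); }
    public void add(T value) { for (Agg<T> p : parts) p.add(value); }
    public Object result() {
        List<Object> results = new ArrayList<>();
        for (Agg<T> p : parts) results.add(p.result());
        return results;
    }
}
```

Feeding the values 5, 3, 5, 9, 1, 3, 5 once into a Compound of TopN(2), DistinctCount and RepeatCount(2) yields three results. The top two values are 9 and 5. There are four distinct values. Two values (5 and 3) occur at least twice.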