@PublicEvolving public interface DataDistribution extends IOReadableWritable, Serializable
| Modifier and Type | Method and Description |
|---|---|
| Object[] | getBucketBoundary(int bucketNum, int totalNumBuckets) Returns the i'th bucket's upper bound, given that the distribution is to be split into totalNumBuckets buckets. |
| TypeInformation[] | getKeyTypes() Gets the type of the key by which the dataSet is partitioned. |
| int | getNumberOfFields() The number of fields in the (composite) key. |
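
Methods inherited from interface IOReadableWritable: read, write

Object[] getBucketBoundary(int bucketNum, int totalNumBuckets)
Returns the i'th bucket's upper bound, given that the distribution is to be split into totalNumBuckets buckets.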
Assuming n buckets, let B_i be the result from calling getBucketBoundary(i, n),
then the distribution will partition the data domain in the following fashion:
(-inf, B_1] (B_1, B_2] ... (B_(n-2), B_(n-1)] (B_(n-1), inf)
Note: The last bucket's upper bound is actually discarded by many algorithms.
The last bucket is assumed to hold all values v such that
v > getBucketBoundary(n-1, n), where n is the number of buckets.
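Parameters:
bucketNum - The number of the bucket for which to get the upper bound.
totalNumBuckets - The number of buckets to split the data into.

int getNumberOfFields()
The number of fields in the (composite) key.
See also: getBucketBoundary(int, int).

TypeInformation[] getKeyTypes()
Gets the type of the key by which the dataSet is partitioned.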
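
The bucket layout described for getBucketBoundary(int, int) can be illustrated with a small, Flink-free sketch. Everything in it is an assumption made for illustration: the class name UniformIntBucketsSketch, the single int key spread uniformly over [min, max], the helper bucketFor, and the 1-based bucket numbering taken from the B_1 ... B_n-1 notation above. It does not implement DataDistribution and makes no claim about how Flink itself assigns records to buckets.

```java
/**
 * Hypothetical sketch of the bucket-boundary contract for one int key.
 * Not part of Flink; names and numbering are assumptions for illustration.
 */
final class UniformIntBucketsSketch {
    private final int min;
    private final int max;

    UniformIntBucketsSketch(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    // Single-field key, so every boundary array has exactly one entry.
    int getNumberOfFields() {
        return 1;
    }

    // B_i for i = bucketNum: the inclusive upper bound of the i-th bucket.
    // Assumes 1-based bucketNum, following the B_1 ... B_n-1 notation.
    Object[] getBucketBoundary(int bucketNum, int totalNumBuckets) {
        if (totalNumBuckets < 1 || bucketNum < 1 || bucketNum > totalNumBuckets) {
            throw new IllegalArgumentException("bucketNum must be in [1, totalNumBuckets]");
        }
        long span = (long) max - (long) min; // long arithmetic avoids int overflow
        long bound = min + span * bucketNum / totalNumBuckets;
        return new Object[] { (int) bound };
    }

    // Returns the 1-based bucket for v: the first i with v <= B_i.
    // Values above B_(n-1) fall into the last bucket, so B_n is never consulted.
    int bucketFor(int v, int totalNumBuckets) {
        for (int i = 1; i < totalNumBuckets; i++) {
            int upper = (Integer) getBucketBoundary(i, totalNumBuckets)[0];
            if (v <= upper) {
                return i;
            }
        }
        return totalNumBuckets;
    }
}
```

With min = 0, max = 100 and four buckets, the bounds B_1, B_2, B_3 come out as 25, 50 and 75, so the value 26 lands in bucket 2 and any value above 75 lands in bucket 4. The loop stops at n-1, which matches the note that the last bucket's upper bound is discarded.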
Copyright © 2014–2018 The Apache Software Foundation. All rights reserved.