public class Sample extends Object
PTransforms for taking samples of the elements in a
PCollection, or samples of the values associated with each
key in a PCollection of KVs.

| Modifier and Type | Class and Description |
|---|---|
| static class | Sample.FixedSizedSampleFn<T> CombineFn that computes a fixed-size sample of a collection of values. |
| static class | Sample.SampleAny<T> A PTransform that takes a PCollection<T> and a limit, and produces a new PCollection<T> containing up to limit elements of the input PCollection. |

| Constructor and Description |
|---|
| Sample() |

| Modifier and Type | Method and Description |
|---|---|
| static <T> PTransform<PCollection<T>,PCollection<T>> | any(long limit) Sample#any(long) takes a PCollection<T> and a limit, and produces a new PCollection<T> containing up to limit elements of the input PCollection. |
| static <T> PTransform<PCollection<T>,PCollection<Iterable<T>>> | fixedSizeGlobally(int sampleSize) Returns a PTransform that takes a PCollection<T>, selects sampleSize elements, uniformly at random, and returns a PCollection<Iterable<T>> containing the selected elements. |
| static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Iterable<V>>>> | fixedSizePerKey(int sampleSize) Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, Iterable<V>>> that contains an output element mapping each distinct key in the input PCollection to a sample of sampleSize values associated with that key in the input PCollection, taken uniformly at random. |

public static <T> PTransform<PCollection<T>,PCollection<T>> any(long limit)
Sample#any(long) takes a PCollection<T> and a limit, and
produces a new PCollection<T> containing up to limit
elements of the input PCollection.
If limit is greater than or equal to the size of the input
PCollection, then all the input's elements will be selected.
All of the elements of the output PCollection should fit into
the main memory of a single worker machine. This operation does not
run in parallel.
Example of use:
PCollection<String> input = ...;
PCollection<String> output = input.apply(Sample.<String>any(100));
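
The "up to limit" contract described above can be illustrated without the SDK. The following is a minimal plain-Java sketch, not the SDK's implementation: the class SampleAnySketch and the helper takeAny are hypothetical names, the sketch works on an in-memory Iterable rather than a PCollection, and rejecting a negative limit is an assumption of the sketch. It keeps the first elements it encounters, whereas the transform only promises some limit elements of the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SampleAnySketch {
    // Hypothetical helper: returns up to `limit` elements of `input`.
    // If `limit` is at least the input size, every element is returned.
    static <T> List<T> takeAny(Iterable<T> input, long limit) {
        if (limit < 0) {
            // Assumption of this sketch; the source does not state how a
            // negative limit is handled.
            throw new IllegalArgumentException("limit must be >= 0");
        }
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (out.size() >= limit) {
                break; // Enough elements collected; stop early.
            }
            out.add(element);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(takeAny(input, 2));   // two of the three elements
        System.out.println(takeAny(input, 100)); // all three elements
    }
}
```

Because the result is gathered into a single list, the sketch also mirrors the note that the output should fit into the memory of one machine.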
T - the type of the elements of the input and output PCollections
limit - the number of elements to take from the input

public static <T> PTransform<PCollection<T>,PCollection<Iterable<T>>> fixedSizeGlobally(int sampleSize)
Returns a PTransform that takes a PCollection<T>,
selects sampleSize elements, uniformly at random, and returns a
PCollection<Iterable<T>> containing the selected elements.
If the input PCollection has fewer than
sampleSize elements, then the output Iterable<T>
will be all the input's elements.
Example of use:
PCollection<String> pc = ...;
PCollection<Iterable<String>> sampleOfSize10 =
    pc.apply(Sample.fixedSizeGlobally(10));
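
To make "selects sampleSize elements, uniformly at random" concrete, here is a minimal plain-Java sketch using reservoir sampling. This is one common way to obtain a uniform fixed-size sample in a single pass; the source does not say how the SDK computes its sample, so treat this as an illustration of the contract only. The class FixedSizeSampleSketch and the helper sample are hypothetical names, and the input is an in-memory Iterable rather than a PCollection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class FixedSizeSampleSketch {
    // Hypothetical helper: uniform random sample of at most `sampleSize`
    // elements, computed with reservoir sampling in one pass.
    static <T> List<T> sample(Iterable<T> input, int sampleSize, Random rng) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        List<T> reservoir = new ArrayList<>();
        int seen = 0; // Number of input elements visited so far.
        for (T element : input) {
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill phase: the first sampleSize elements are always kept.
                reservoir.add(element);
            } else {
                // Replace phase: keep the new element with probability
                // sampleSize / seen, evicting a uniformly chosen slot.
                int j = rng.nextInt(seen);
                if (j < sampleSize) {
                    reservoir.set(j, element);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sample(input, 3, new Random()));  // 3 random elements
        System.out.println(sample(input, 20, new Random())); // all 10 elements
    }
}
```

When the input has fewer than sampleSize elements the replace phase never runs, so the result is the whole input, matching the behavior described above.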
T - the type of the elements
sampleSize - the number of elements to select; must be >= 0

public static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Iterable<V>>>> fixedSizePerKey(int sampleSize)
Returns a PTransform that takes an input
PCollection<KV<K, V>> and returns a
PCollection<KV<K, Iterable<V>>> that contains an output
element mapping each distinct key in the input
PCollection to a sample of sampleSize values
associated with that key in the input PCollection, taken
uniformly at random. If a key in the input PCollection
has fewer than sampleSize values associated with it, then
the output Iterable<V> associated with that key will be
all the values associated with that key in the input
PCollection.
Example of use:
PCollection<KV<String, Integer>> pc = ...;
PCollection<KV<String, Iterable<Integer>>> sampleOfSize10PerKey =
    pc.apply(Sample.<String, Integer>fixedSizePerKey(10));
K - the type of the keys
V - the type of the values
sampleSize - the number of values to select for each distinct key; must be >= 0
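
The per-key contract of fixedSizePerKey can be sketched in plain Java by keeping one reservoir per distinct key. As before, this is an illustrative sketch under stated assumptions and not the SDK's implementation: PerKeySampleSketch and samplePerKey are hypothetical names, Map.Entry stands in for KV, and the input is an in-memory list rather than a PCollection.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class PerKeySampleSketch {
    // Hypothetical helper: maps each distinct key to a uniform random sample
    // of at most `sampleSize` of its values (one reservoir per key).
    static <K, V> Map<K, List<V>> samplePerKey(
            List<Map.Entry<K, V>> input, int sampleSize, Random rng) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        Map<K, List<V>> reservoirs = new HashMap<>();
        Map<K, Integer> seenPerKey = new HashMap<>();
        for (Map.Entry<K, V> kv : input) {
            List<V> reservoir =
                reservoirs.computeIfAbsent(kv.getKey(), k -> new ArrayList<>());
            // Count of values seen so far for this key, including this one.
            int seen = seenPerKey.merge(kv.getKey(), 1, Integer::sum);
            if (reservoir.size() < sampleSize) {
                reservoir.add(kv.getValue());
            } else {
                int j = rng.nextInt(seen);
                if (j < sampleSize) {
                    reservoir.set(j, kv.getValue());
                }
            }
        }
        return reservoirs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = Arrays.asList(
            new SimpleEntry<>("a", 1), new SimpleEntry<>("a", 2),
            new SimpleEntry<>("a", 3), new SimpleEntry<>("b", 4));
        // Key "a" gets 2 of its 3 values; key "b" keeps its single value.
        System.out.println(samplePerKey(input, 2, new Random()));
    }
}
```

Every key that appears in the input appears in the result, and a key with fewer than sampleSize values keeps all of them.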