public class SyntheticSourceOptions extends SyntheticOptions
SyntheticSourceOptions uses Jackson annotations which PipelineOptionsFactory can use to parse and construct an instance.

| Modifier and Type | Class and Description |
|---|---|
| static class | SyntheticSourceOptions.ProgressShape: Shape of the progress reporting curve as a function of the current offset in the SyntheticBoundedSource. |
| static class | SyntheticSourceOptions.Record: Record generated by genRecord(long). |

Nested classes inherited from SyntheticOptions: SyntheticOptions.DelayType, SyntheticOptions.Sampler

| Modifier and Type | Field and Description |
|---|---|
| SyntheticOptions.Sampler | bundleSizeDistribution: Distribution for generating initial split bundles. |
| java.lang.Integer | forceNumInitialBundles: If specified, this source will split into exactly this many bundles regardless of the hints provided by the service. |
| long | numRecords: Total number of generated records. |
| SyntheticSourceOptions.ProgressShape | progressShape |
| long | splitPointFrequencyRecords: Only records whose index is a multiple of this will be split points. |
| java.lang.Integer | watermarkDriftMillis: May be either positive or negative. |
| java.lang.Integer | watermarkSearchInAdvanceCount: Defines how many elements the watermark function should check in advance to "predict" what the record distribution will look like. |

Fields inherited from SyntheticOptions: bytesPerRecord, cpuUtilizationInMixedDelay, delayType, hotKeyFraction, keySizeBytes, largeKeyFraction, largeKeySizeBytes, numHotKeys, seed, valueSizeBytes

| Constructor and Description |
|---|
| SyntheticSourceOptions() |

| Modifier and Type | Method and Description |
|---|---|
| SyntheticSourceOptions.Record | genRecord(long position) |
| org.joda.time.Duration | nextInitializeDelay(long seed): Generates a random delay value for the synthetic source initialization using the distribution defined by initializeDelayDistribution. |
| org.joda.time.Duration | nextProcessingTimeDelay(long seed): Generates a random delay value between event and processing time using the distribution defined by processingTimeDelayDistribution. |
| void | validate() |

Methods inherited from SyntheticOptions: fromIntegerDistribution, fromJsonString, fromRealDistribution, genKvPair, hashFunction, nextDelay, setSeed, toString

public long numRecords
public long splitPointFrequencyRecords
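
The field summary describes splitPointFrequencyRecords as restricting split points to records whose index is a multiple of this value. A minimal sketch of that rule follows. The class SplitPointSketch and the helper isSplitPoint are hypothetical names used only for illustration and are not part of SyntheticSourceOptions; the actual source may apply the rule differently.

```java
class SplitPointSketch {
    // Hypothetical helper (an assumption, not a Beam API): a record index is a
    // split point only when it is a multiple of splitPointFrequencyRecords.
    static boolean isSplitPoint(long recordIndex, long splitPointFrequencyRecords) {
        return recordIndex % splitPointFrequencyRecords == 0;
    }

    public static void main(String[] args) {
        long splitPointFrequencyRecords = 5;
        for (long index = 0; index <= 10; index++) {
            System.out.println(index + " -> " + isSplitPoint(index, splitPointFrequencyRecords));
        }
    }
}
```

Under this reading, a larger value means fewer positions at which a bundle can be split.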
public SyntheticOptions.Sampler bundleSizeDistribution
When splitting into "desiredBundleSizeBytes", we'll compute the desired number of bundles N, then sample this many numbers from this distribution, normalize their sum to 1, and use that as the boundaries of generated bundles.
The Zipf distribution is expected to be particularly useful here.
E.g., empirically, with 100 bundles, the Zipf distribution with a parameter of 3.5 will generate bundles where the largest is about 3x-10x larger than the median; with a parameter of 3.0 this ratio will be about 5x-50x; with 2.5, 5x-100x (i.e. 1 bundle can be as large as all others combined).
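
The sample, normalize, and cut procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the source's implementation: BundleBoundarySketch and boundaries are hypothetical names, and the sampler in main is a simple Pareto-like stand-in rather than the Zipf distribution or a SyntheticOptions.Sampler.

```java
import java.util.Arrays;
import java.util.Random;

class BundleBoundarySketch {
    // Hypothetical helper: normalizes the samples so they sum to 1, then turns
    // the running total into record offsets in the range [0, numRecords].
    static long[] boundaries(double[] samples, long numRecords) {
        double total = Arrays.stream(samples).sum();
        long[] bounds = new long[samples.length + 1]; // bounds[0] stays 0
        double cumulative = 0.0;
        for (int i = 0; i < samples.length; i++) {
            cumulative += samples[i] / total;
            bounds[i + 1] = Math.round(cumulative * numRecords);
        }
        // Pin the last boundary to numRecords to absorb floating-point drift.
        bounds[samples.length] = numRecords;
        return bounds;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int numBundles = 5; // stands in for the desired number of bundles N
        double[] samples = new double[numBundles];
        for (int i = 0; i < numBundles; i++) {
            // Stand-in heavy-tailed sample (assumption); always positive.
            samples[i] = 1.0 / Math.sqrt(1.0 - random.nextDouble());
        }
        // Bundle i covers records [bounds[i], bounds[i + 1]).
        System.out.println(Arrays.toString(boundaries(samples, 1000)));
    }
}
```

For example, samples of 1, 1, and 2 over 100 records normalize to 0.25, 0.25, and 0.5, giving boundaries 0, 25, 50, and 100. A heavier-tailed sampler skews these fractions, which is how one bundle can end up much larger than the median.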
public java.lang.Integer forceNumInitialBundles
public SyntheticSourceOptions.ProgressShape progressShape
public java.lang.Integer watermarkSearchInAdvanceCount
public java.lang.Integer watermarkDriftMillis
By default there is no drift at all.
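public org.joda.time.Duration nextInitializeDelay(long seed)
Generates a random delay value for the synthetic source initialization using the distribution defined by initializeDelayDistribution.
public org.joda.time.Duration nextProcessingTimeDelay(long seed)
Generates a random delay value between event and processing time using the distribution defined by processingTimeDelayDistribution.
public void validate()
Overrides: validate in class SyntheticOptions
public SyntheticSourceOptions.Record genRecord(long position)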