@TriggerSerially
@Stateful(scopes={LOCAL,CLUSTER}, description="After a listing of resources is performed, the latest timestamp of any of the resources is stored in the component's state. The scope used depends on the implementation.")
public abstract class AbstractListProcessor<T extends ListableEntity> extends AbstractProcessor
An abstract Processor that is intended to simplify the coding required to perform listing operations on remote resources. Those remote resources may be files, "objects", "messages", or any other sort of entity that may need to be listed in such a way that we identify the entity only once. Each of these objects, messages, etc. is referred to as an "entity" for the scope of this Processor.
This class is responsible for triggering the listing and for filtering the returned results so that only new (unlisted) entities, or entities that have been modified, are emitted from the Processor.
In order to make use of this abstract class, the entities listed must meet the following criteria:
This class persists state across restarts so that even if NiFi is restarted, duplicates will not be pulled from the target system, given the above criteria. This is performed using the StateManager, which allows the system to be restarted and to resume processing where it left off. The state that is stored is the latest timestamp that has been pulled (as determined by the timestamps of the entities that are returned). See the section above for information about how this information is used in order to determine new entities.
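The timestamp-based filtering described above can be sketched in plain Java. This is a simplified illustration, not the processor's actual logic: the `Entity` class is a hypothetical stand-in for a listed entity, and the sketch assumes that only entities with a timestamp strictly later than the stored one count as new.

```java
import java.util.ArrayList;
import java.util.List;

class TimestampFilterSketch {

    // Hypothetical stand-in for a listed entity; not the NiFi ListableEntity interface.
    static final class Entity {
        final String identifier;
        final long timestamp;

        Entity(String identifier, long timestamp) {
            this.identifier = identifier;
            this.timestamp = timestamp;
        }
    }

    // Returns the entities whose timestamp is strictly later than the stored one.
    // When no timestamp has been stored yet (null), every entity is treated as new.
    static List<Entity> selectNew(List<Entity> listed, Long lastTimestamp) {
        List<Entity> fresh = new ArrayList<>();
        for (Entity e : listed) {
            if (lastTimestamp == null || e.timestamp > lastTimestamp) {
                fresh.add(e);
            }
        }
        return fresh;
    }

    // Computes the value that would be stored as state after the listing:
    // the latest timestamp seen so far, including the previously stored one.
    static Long latestTimestamp(List<Entity> listed, Long lastTimestamp) {
        Long latest = lastTimestamp;
        for (Entity e : listed) {
            if (latest == null || e.timestamp > latest) {
                latest = e.timestamp;
            }
        }
        return latest;
    }
}
```

Storing only the latest timestamp keeps the persisted state small, at the cost of relying on the remote system to report timestamps consistently.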
NOTE: This processor migrates legacy state mechanisms, including locally stored, file-based state and the optional use of the Distributed Cache Service property, to the new StateManager functionality. Upon successful migration, the associated data from one or both of the legacy mechanisms is purged.
For each new entity that is listed, the Processor will send a FlowFile to the 'success' relationship. The FlowFile will have no content but will have some set of attributes (defined by the concrete implementation) that can be used to fetch those remote resources or interact with them in whatever way makes sense for the configured dataflow.
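As an illustration of the kind of attribute map a concrete implementation might return, here is a minimal sketch. The method signature and the attribute name `example.last.modified` are hypothetical; the real attributes are defined by each concrete implementation through createAttributes.

```java
import java.util.HashMap;
import java.util.Map;

class AttributeSketch {

    // Builds attributes that would let a downstream processor locate the entity.
    // The parameters stand in for values a real implementation would read from
    // its own entity type; they are assumptions made for this sketch.
    static Map<String, String> createAttributes(String name, String path, long timestamp) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("filename", name);
        attributes.put("path", path);
        // Hypothetical attribute name, chosen only for illustration.
        attributes.put("example.last.modified", String.valueOf(timestamp));
        return attributes;
    }
}
```

Because the FlowFile has no content, these attributes are the only information passed downstream, so they should carry everything needed to fetch the resource.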
Subclasses are responsible for the following:
The performListing(ProcessContext, Long) method creates a listing of all entities on the remote system that have timestamps later than the provided timestamp. If the entities returned have a timestamp before the provided one, those entities will be filtered out. It is therefore not necessary for the implementation to filter by timestamp; the timestamp is provided so that the implementation can filter those resources on the server side rather than pulling back all of the information, if it makes sense to do so in the concrete implementation.
The createAttributes(ListableEntity, ProcessContext) method creates the Map of attributes that is applied to the FlowFile representing an entity.
The getPath(ProcessContext) method is responsible for returning the path that is currently being polled for entities. If this concept does not apply to the concrete implementation, it is recommended that the concrete implementation return "." or "/" for all invocations of this method.
The isListingResetNecessary(PropertyDescriptor) method is responsible for determining when the listing needs to be reset, by returning a boolean indicating whether or not a change in the value of the provided property should trigger the timestamp and identifier information to be cleared.
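The last two responsibilities can be sketched without any NiFi dependencies. This sketch uses plain strings in place of PropertyDescriptor and ProcessContext, and the property names "Directory" and "Hostname" are hypothetical; a real subclass would compare against its own PropertyDescriptor constants.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ResetDecisionSketch {

    // Hypothetical names of properties that change *what* is being listed.
    // Changing any of them makes the stored timestamp meaningless.
    private static final Set<String> RESET_PROPERTIES =
            new HashSet<>(Arrays.asList("Directory", "Hostname"));

    // Mirrors the role of isListingResetNecessary: true when a change to the
    // named property should clear the stored listing state.
    static boolean isListingResetNecessary(String propertyName) {
        return RESET_PROPERTIES.contains(propertyName);
    }

    // Mirrors the role of getPath: falls back to "." when no path is configured,
    // following the recommendation in the class description.
    static String getPath(String configuredDirectory) {
        if (configuredDirectory == null || configuredDirectory.isEmpty()) {
            return ".";
        }
        return configuredDirectory;
    }
}
```

Properties that only affect how the listing is performed, rather than which entities it covers, would typically not need to trigger a reset under this approach.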
| Modifier and Type | Class and Description |
|---|---|
| private static class | AbstractListProcessor.StringSerDe |
| Modifier and Type | Field and Description |
|---|---|
| static PropertyDescriptor | DISTRIBUTED_CACHE_SERVICE |
| private boolean | justElectedPrimaryNode |
| private Long | lastListingTime |
| private Long | lastProcessedTime |
| private Long | lastRunTime |
| static long | LISTING_LAG_NANOS |
| (package private) static String | LISTING_TIMESTAMP_KEY |
| (package private) static String | PROCESSED_TIMESTAMP_KEY |
| static Relationship | REL_SUCCESS |
| private boolean | resetState |
| Constructor and Description |
|---|
| AbstractListProcessor() |
| Modifier and Type | Method and Description |
|---|---|
| protected abstract Map<String,String> | createAttributes(T entity, ProcessContext context)<br>Creates a Map of attributes that should be applied to the FlowFile to represent this entity. |
| private EntityListing | deserialize(String serializedState) |
| protected String | getKey(String directory) |
| protected abstract String | getPath(ProcessContext context)<br>Returns the path to perform a listing on. |
| File | getPersistenceFile() |
| Set<Relationship> | getRelationships() |
| protected abstract Scope | getStateScope(ProcessContext context)<br>Returns a Scope that specifies where the state should be managed for this Processor. |
| protected List<PropertyDescriptor> | getSupportedPropertyDescriptors() |
| protected abstract boolean | isListingResetNecessary(PropertyDescriptor property)<br>Determines whether or not the listing must be reset if the value of the given property is changed. |
| private void | migrateState(String path, DistributedMapCacheClient client, StateManager stateManager, Scope scope)<br>This processor used to use the DistributedMapCacheClient in order to store cluster-wide state, before the introduction of the StateManager. |
| void | onPrimaryNodeChange(PrimaryNodeState newState) |
| void | onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) |
| void | onTrigger(ProcessContext context, ProcessSession session) |
| protected abstract List<T> | performListing(ProcessContext context, Long minTimestamp)<br>Performs a listing of the remote entities that can be pulled. |
| private void | persist(long listingTimestamp, long processedTimestamp, StateManager stateManager, Scope scope) |
| private void | resetTimeStates() |
| void | updateState(ProcessContext context) |
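Methods inherited from class AbstractProcessor: onTrigger
Methods inherited from class AbstractSessionFactoryProcessor: getControllerServiceLookup, getIdentifier, getLogger, getNodeTypeProvider, init, initialize, isConfigurationRestored, isScheduled, toString, updateConfiguredRestoredTrue, updateScheduledFalse, updateScheduledTrue
Methods inherited from class AbstractConfigurableComponent: customValidate, equals, getPropertyDescriptor, getPropertyDescriptors, getSupportedDynamicPropertyDescriptor, hashCode, validate
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface ConfigurableComponent: getPropertyDescriptor, getPropertyDescriptors, validate
public static final PropertyDescriptor DISTRIBUTED_CACHE_SERVICE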
public static final Relationship REL_SUCCESS
private volatile Long lastListingTime
private volatile Long lastProcessedTime
private volatile Long lastRunTime
private volatile boolean justElectedPrimaryNode
private volatile boolean resetState
public static final long LISTING_LAG_NANOS
static final String LISTING_TIMESTAMP_KEY
static final String PROCESSED_TIMESTAMP_KEY
public File getPersistenceFile()
protected List<PropertyDescriptor> getSupportedPropertyDescriptors()
Overrides: getSupportedPropertyDescriptors in class AbstractConfigurableComponent
public void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue)
Specified by: onPropertyModified in interface ConfigurableComponent
Overrides: onPropertyModified in class AbstractConfigurableComponent
public Set<Relationship> getRelationships()
Specified by: getRelationships in interface Processor
Overrides: getRelationships in class AbstractSessionFactoryProcessor
@OnPrimaryNodeStateChange public void onPrimaryNodeChange(PrimaryNodeState newState)
@OnScheduled public final void updateState(ProcessContext context) throws IOException
Throws: IOException
private void migrateState(String path, DistributedMapCacheClient client, StateManager stateManager, Scope scope) throws IOException
Parameters:
path - the path to migrate state for
client - the DistributedMapCacheClient that is capable of obtaining the current state
stateManager - the StateManager to use in order to store the new state
scope - the scope to use
Throws: IOException - if unable to retrieve or store the state
private void persist(long listingTimestamp, long processedTimestamp, StateManager stateManager, Scope scope) throws IOException
Throws: IOException
private EntityListing deserialize(String serializedState) throws org.codehaus.jackson.JsonParseException, org.codehaus.jackson.map.JsonMappingException, IOException
Throws: org.codehaus.jackson.JsonParseException, org.codehaus.jackson.map.JsonMappingException, IOException
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException
Specified by: onTrigger in class AbstractProcessor
Throws: ProcessException
private void resetTimeStates()
protected abstract Map<String,String> createAttributes(T entity, ProcessContext context)
Parameters:
entity - the entity represented by the FlowFile
context - the ProcessContext for obtaining configuration information
protected abstract String getPath(ProcessContext context)
Parameters:
context - the ProcessContext to use in order to obtain configuration
Returns: null if not applicable.
protected abstract List<T> performListing(ProcessContext context, Long minTimestamp) throws IOException
Parameters:
context - the ProcessContext to use in order to pull the appropriate entities
minTimestamp - the minimum timestamp of entities that should be returned.
Throws: IOException
protected abstract boolean isListingResetNecessary(PropertyDescriptor property)
Parameters:
property - the property that has changed
Returns: true if a change in value of the given property necessitates that the listing be reset, false otherwise.
protected abstract Scope getStateScope(ProcessContext context)
Parameters:
context - the ProcessContext to use in order to make a determination
Copyright © 2017 Apache NiFi Project. All rights reserved.