Package org.apache.uima.tools.components
Class FileSystemCollectionReader
java.lang.Object
org.apache.uima.resource.Resource_ImplBase
org.apache.uima.resource.ConfigurableResource_ImplBase
org.apache.uima.collection.CollectionReader_ImplBase
org.apache.uima.tools.components.FileSystemCollectionReader
- All Implemented Interfaces:
org.apache.uima.collection.base_cpm.BaseCollectionReader, org.apache.uima.collection.CollectionReader, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource
public class FileSystemCollectionReader
extends org.apache.uima.collection.CollectionReader_ImplBase
A simple collection reader that reads documents from a directory in the filesystem. It can be
configured with the following parameters:
- InputDirectory - path to directory containing files
- Encoding (optional) - character encoding of the input files
- Language (optional) - language of the input documents
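The effect of the InputDirectory and Encoding parameters can be pictured with plain JDK calls. The following is only a minimal sketch, not the reader's actual implementation: the class DirectoryReadSketch and its methods are hypothetical names, and it assumes that only regular files directly inside the directory count as documents.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for illustration; not part of the UIMA API. */
final class DirectoryReadSketch {

    /** Lists the regular files directly inside inputDirectory, sorted by path (assumed: no recursion). */
    static List<Path> listDocuments(Path inputDirectory) throws IOException {
        try (Stream<Path> entries = Files.list(inputDirectory)) {
            return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    /** Decodes one document using the configured character encoding. */
    static String readDocument(Path file, Charset encoding) throws IOException {
        return new String(Files.readAllBytes(file), encoding);
    }
}
```

A caller would pass the InputDirectory value to listDocuments and decode each returned file with the charset named by Encoding.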
-
Field Summary
Fields
Modifier and Type    Field           Description
static final String  PARAM_ENCODING  Name of configuration parameter that contains the character encoding used by the input files.
static final String  PARAM_INPUTDIR  Name of configuration parameter that must be set to the path of a directory containing input files.
static final String  PARAM_LANGUAGE  Name of optional configuration parameter that contains the language of the documents in the input directory.
static final String  PARAM_LENIENT   Name of the configuration parameter that must be set to indicate whether execution proceeds when an unknown type is encountered.
static final String  PARAM_XCAS      Optional configuration parameter that specifies XCAS input files.
Fields inherited from interface org.apache.uima.resource.Resource
PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_MANAGER, PARAM_CONFIG_PARAM_SETTINGS, PARAM_EXTERNAL_OVERRIDE_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
Constructor Summary
Constructors
FileSystemCollectionReader()
Method Summary
Modifier and Type                                              Method                                 Description
void                                                           close()
static org.apache.uima.collection.CollectionReaderDescription  getDescription()                       Parses and returns the descriptor for this collection reader.
static URL                                                     getDescriptorURL()
void                                                           getNext(org.apache.uima.cas.CAS aCAS)
int                                                            getNumberOfDocuments()                 Gets the total number of documents that will be returned by this collection reader.
org.apache.uima.util.Progress[]                                getProgress()
boolean                                                        hasNext()
void                                                           initialize()
Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, loadUserClass, loadUserClassOrThrow, setContextHolder, setContextHolderX, setLogger, setMetaData, withContextHolder
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.uima.resource.ConfigurableResource
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
Methods inherited from interface org.apache.uima.resource.Resource
getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
-
Field Details
-
PARAM_INPUTDIR
Name of configuration parameter that must be set to the path of a directory containing input files.
-
PARAM_ENCODING
Name of configuration parameter that contains the character encoding used by the input files. If not specified, the default system encoding will be used.
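The fallback to the default system encoding can be sketched with the JDK's Charset class. This is an illustration under assumptions, not the reader's code: the helper EncodingFallbackSketch is hypothetical, and treating an empty string the same as a missing value is an assumption of this sketch.

```java
import java.nio.charset.Charset;

/** Hypothetical helper for illustration; not part of the UIMA API. */
final class EncodingFallbackSketch {

    /** Returns the named charset, or the platform default when no encoding is configured. */
    static Charset resolve(String configuredEncoding) {
        if (configuredEncoding == null || configuredEncoding.isEmpty()) {
            // Assumption: an absent or empty value means "use the system default".
            return Charset.defaultCharset();
        }
        // Throws an unchecked exception if the name is illegal or unsupported.
        return Charset.forName(configuredEncoding);
    }
}
```

Because the platform default differs between machines, setting the encoding explicitly is the more reproducible choice when the input files are known to be, for example, UTF-8.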
-
PARAM_LANGUAGE
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS.
-
PARAM_XCAS
Optional configuration parameter that specifies XCAS input files.
-
PARAM_LENIENT
Name of the configuration parameter that must be set to indicate whether execution proceeds when an unknown type is encountered.
-
-
Constructor Details
-
FileSystemCollectionReader
public FileSystemCollectionReader()
-
-
Method Details
-
initialize
public void initialize() throws org.apache.uima.resource.ResourceInitializationException
- Overrides:
initialize in class org.apache.uima.collection.CollectionReader_ImplBase
- Throws:
org.apache.uima.resource.ResourceInitializationException
-
hasNext
public boolean hasNext()
-
getNext
public void getNext(org.apache.uima.cas.CAS aCAS) throws IOException, org.apache.uima.collection.CollectionException
- Throws:
IOException
org.apache.uima.collection.CollectionException
-
close
public void close() throws IOException
- Throws:
IOException
-
getProgress
public org.apache.uima.util.Progress[] getProgress()
-
getNumberOfDocuments
public int getNumberOfDocuments()
Gets the total number of documents that will be returned by this collection reader. This is not part of the general collection reader interface.
- Returns:
the number of documents in the collection
-
getDescription
public static org.apache.uima.collection.CollectionReaderDescription getDescription() throws org.apache.uima.util.InvalidXMLException
Parses and returns the descriptor for this collection reader. The descriptor is stored in the uima.jar file and located using the ClassLoader.
- Returns:
an object containing all of the information parsed from the descriptor.
- Throws:
org.apache.uima.util.InvalidXMLException - if the descriptor is invalid or missing
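Locating a descriptor through the ClassLoader follows the usual JDK resource-lookup pattern, which returns null when nothing is found. The sketch below only illustrates that pattern. The class DescriptorLookupSketch, its methods, and the resource name in the usage note are hypothetical, and it signals a missing resource with IllegalStateException rather than the InvalidXMLException documented above.

```java
import java.net.URL;

/** Hypothetical stand-in for illustration; not the UIMA implementation. */
final class DescriptorLookupSketch {

    /**
     * Resolves a resource relative to the anchor class through its class loader.
     * Returns null when the resource cannot be found.
     */
    static URL locate(Class<?> anchor, String resourceName) {
        return anchor.getResource(resourceName);
    }

    /** Same lookup, but reports a missing resource as an error instead of returning null. */
    static URL locateOrThrow(Class<?> anchor, String resourceName) {
        URL url = locate(anchor, resourceName);
        if (url == null) {
            throw new IllegalStateException("Resource not found: " + resourceName);
        }
        return url;
    }
}
```

For example, DescriptorLookupSketch.locate(SomeClass.class, "SomeDescriptor.xml") would search next to SomeClass on the classpath, including inside the jar that contains it.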
-
getDescriptorURL
public static URL getDescriptorURL()
-