Class FileSystemCollectionReader

java.lang.Object
org.apache.uima.resource.Resource_ImplBase
org.apache.uima.resource.ConfigurableResource_ImplBase
org.apache.uima.collection.CollectionReader_ImplBase
org.apache.uima.tools.components.FileSystemCollectionReader
All Implemented Interfaces:
org.apache.uima.collection.base_cpm.BaseCollectionReader, org.apache.uima.collection.CollectionReader, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource

public class FileSystemCollectionReader extends org.apache.uima.collection.CollectionReader_ImplBase
A simple collection reader that reads documents from a directory in the filesystem. It can be configured with the following parameters:
  • InputDirectory - path to directory containing files
  • Encoding (optional) - character encoding of the input files
  • Language (optional) - language of the input documents
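As a rough illustration of how the required and optional parameters above relate, the following sketch checks a plain settings map. The class `ReaderSettingsSketch` and its method are hypothetical and not part of Apache UIMA; only the three parameter names are taken from the list above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache UIMA.
class ReaderSettingsSketch {
    // Parameter names as listed in the class description.
    static final String INPUT_DIRECTORY = "InputDirectory";
    static final String ENCODING = "Encoding";   // optional
    static final String LANGUAGE = "Language";   // optional

    /** Returns true when the one required parameter is present and non-empty. */
    static boolean isValid(Map<String, String> settings) {
        String dir = settings.get(INPUT_DIRECTORY);
        return dir != null && !dir.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put(INPUT_DIRECTORY, "/path/to/docs"); // placeholder path
        settings.put(ENCODING, "UTF-8");
        // Language is left unset because it is optional.
        System.out.println("valid: " + isValid(settings));
    }
}
```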
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    PARAM_ENCODING
    Name of configuration parameter that contains the character encoding used by the input files.
    static final String
    PARAM_INPUTDIR
    Name of configuration parameter that must be set to the path of a directory containing input files.
    static final String
    PARAM_LANGUAGE
    Name of optional configuration parameter that contains the language of the documents in the input directory.
    static final String
    PARAM_LENIENT
    Name of the configuration parameter that must be set to indicate whether execution proceeds when an unknown type is encountered.
    static final String
    PARAM_XCAS
    Optional configuration parameter that specifies XCAS input files.

    Fields inherited from interface org.apache.uima.resource.Resource

    PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_MANAGER, PARAM_CONFIG_PARAM_SETTINGS, PARAM_EXTERNAL_OVERRIDE_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
  • Constructor Summary

    Constructors
    Constructor
    Description
    FileSystemCollectionReader()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
     
    static org.apache.uima.collection.CollectionReaderDescription
    getDescription()
    Parses and returns the descriptor for this collection reader.
    static URL
    getDescriptorURL()
     
    void
    getNext(org.apache.uima.cas.CAS aCAS)
     
    int
    getNumberOfDocuments()
    Gets the total number of documents that will be returned by this collection reader.
    org.apache.uima.util.Progress[]
    getProgress()
     
    boolean
    hasNext()
     
    void
    initialize()
     

    Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase

    destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit

    Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase

    getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue

    Methods inherited from class org.apache.uima.resource.Resource_ImplBase

    getCasManager, getLogger, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, loadUserClass, loadUserClassOrThrow, setContextHolder, setContextHolderX, setLogger, setMetaData, withContextHolder

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.uima.resource.ConfigurableResource

    getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue

    Methods inherited from interface org.apache.uima.resource.Resource

    getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
  • Field Details

    • PARAM_INPUTDIR

      public static final String PARAM_INPUTDIR
      Name of configuration parameter that must be set to the path of a directory containing input files.
    • PARAM_ENCODING

      public static final String PARAM_ENCODING
      Name of configuration parameter that contains the character encoding used by the input files. If not specified, the default system encoding will be used.
    • PARAM_LANGUAGE

      public static final String PARAM_LANGUAGE
      Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS.
    • PARAM_XCAS

      public static final String PARAM_XCAS
      Optional configuration parameter that specifies XCAS input files.
    • PARAM_LENIENT

      public static final String PARAM_LENIENT
      Name of the configuration parameter that must be set to indicate whether execution proceeds when an unknown type is encountered.
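The fallback described for PARAM_ENCODING (use the configured encoding, otherwise the system default) can be sketched with the standard `java.nio.charset.Charset` API. The class `EncodingFallbackSketch` and its `resolve` method are hypothetical names for illustration; this is not the reader's actual code, and treating an empty string like an unset value is an assumption.

```java
import java.nio.charset.Charset;

// Illustrative helper only; not part of Apache UIMA.
class EncodingFallbackSketch {
    /** Returns the named charset, or the platform default when no name is configured. */
    static Charset resolve(String configuredEncoding) {
        // Assumption: an empty value is treated the same as an unset parameter.
        if (configuredEncoding == null || configuredEncoding.isEmpty()) {
            return Charset.defaultCharset();
        }
        // Throws an unchecked exception if the name is illegal or unsupported.
        return Charset.forName(configuredEncoding);
    }

    public static void main(String[] args) {
        System.out.println("configured: " + resolve("UTF-8"));
        System.out.println("fallback:   " + resolve(null));
    }
}
```

Setting the encoding explicitly keeps results reproducible across machines, since the system default can differ between platforms.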
  • Constructor Details

    • FileSystemCollectionReader

      public FileSystemCollectionReader()
  • Method Details

    • initialize

      public void initialize() throws org.apache.uima.resource.ResourceInitializationException
      Overrides:
      initialize in class org.apache.uima.collection.CollectionReader_ImplBase
      Throws:
      org.apache.uima.resource.ResourceInitializationException
      See Also:
      • CollectionReader_ImplBase.initialize()
    • hasNext

      public boolean hasNext()
      See Also:
      • BaseCollectionReader.hasNext()
    • getNext

      public void getNext(org.apache.uima.cas.CAS aCAS) throws IOException, org.apache.uima.collection.CollectionException
      Throws:
      IOException
      org.apache.uima.collection.CollectionException
      See Also:
      • CollectionReader.getNext(org.apache.uima.cas.CAS)
    • close

      public void close() throws IOException
      Throws:
      IOException
      See Also:
      • BaseCollectionReader.close()
    • getProgress

      public org.apache.uima.util.Progress[] getProgress()
      See Also:
      • BaseCollectionReader.getProgress()
    • getNumberOfDocuments

      public int getNumberOfDocuments()
      Gets the total number of documents that will be returned by this collection reader. This is not part of the general collection reader interface.
      Returns:
      the number of documents in the collection
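The `hasNext()` / `getNext(...)` / `getNumberOfDocuments()` contract above follows the usual iterate-until-exhausted pattern. The sketch below imitates it with plain `java.nio.file` calls so that it runs without UIMA on the classpath. `DirectoryReaderSketch` is a hypothetical stand-in, not the real implementation: it returns document text as a `String` instead of filling a CAS, and it assumes a non-recursive listing in sorted order, neither of which is stated in this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Stream;

// Illustrative stand-in; not the Apache UIMA implementation.
class DirectoryReaderSketch {
    private final List<Path> files = new ArrayList<>();
    private final Charset charset;
    private int index = 0;

    DirectoryReaderSketch(Path inputDir, Charset charset) throws IOException {
        this.charset = charset;
        // Assumption: only regular files directly inside the directory are read.
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile).sorted().forEach(files::add);
        }
    }

    /** True while at least one unread file remains. */
    boolean hasNext() {
        return index < files.size();
    }

    /** Returns the text of the next file and advances the cursor. */
    String getNext() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("no more documents");
        }
        return new String(Files.readAllBytes(files.get(index++)), charset);
    }

    /** Total number of files found when the reader was created. */
    int getNumberOfDocuments() {
        return files.size();
    }

    /** Writes two small files to a temporary directory and reads them back. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("reader-sketch");
            Files.write(dir.resolve("a.txt"), "first".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("b.txt"), "second".getBytes(StandardCharsets.UTF_8));
            DirectoryReaderSketch reader = new DirectoryReaderSketch(dir, StandardCharsets.UTF_8);
            List<String> texts = new ArrayList<>();
            while (reader.hasNext()) {
                texts.add(reader.getNext());
            }
            return texts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```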
    • getDescription

      public static org.apache.uima.collection.CollectionReaderDescription getDescription() throws org.apache.uima.util.InvalidXMLException
      Parses and returns the descriptor for this collection reader. The descriptor is stored in the uima.jar file and located using the ClassLoader.
      Returns:
      an object containing all of the information parsed from the descriptor.
      Throws:
      org.apache.uima.util.InvalidXMLException - if the descriptor is invalid or missing
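Locating a resource "using the ClassLoader", as described for `getDescription()`, generally means asking the class loader for a URL by resource name. The sketch below shows that lookup with the standard `ClassLoader.getResource` call. `DescriptorLookupSketch` and the resource names used here are hypothetical; the actual descriptor name inside uima.jar is not given in this page.

```java
import java.net.URL;

// Illustrative helper only; not part of Apache UIMA.
class DescriptorLookupSketch {
    /** Returns the URL of a classpath resource, or null when nothing matches. */
    static URL locate(String resourceName) {
        return DescriptorLookupSketch.class.getClassLoader().getResource(resourceName);
    }

    /** Like locate, but fails loudly when the resource is missing. */
    static URL require(String resourceName) {
        URL url = locate(resourceName);
        if (url == null) {
            throw new IllegalStateException("resource not found: " + resourceName);
        }
        return url;
    }

    public static void main(String[] args) {
        // A class file that is always resolvable serves as a stand-in resource.
        System.out.println(require("java/lang/Object.class"));
        // Hypothetical name that is expected to be absent.
        System.out.println(locate("no/such/descriptor.xml"));
    }
}
```

A `null` result from the lookup is the case to guard against before parsing, which is consistent with `getDescription()` reporting a missing descriptor through `InvalidXMLException`.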
    • getDescriptorURL

      public static URL getDescriptorURL()