Class CSVPipesIterator

  • All Implemented Interfaces:
    Iterable<org.apache.tika.pipes.FetchEmitTuple>, Callable<Integer>, org.apache.tika.config.Initializable

    public class CSVPipesIterator
    extends org.apache.tika.pipes.pipesiterator.PipesIterator
    implements org.apache.tika.config.Initializable
    Iterates through a UTF-8 CSV file. This adds all columns (except for the 'idColumn', 'fetchKeyColumn' and 'emitKeyColumn', if specified) to the metadata object.

    • If an 'idColumn' is specified, this will use that column's value as the id.
    • If no 'idColumn' is specified, but a 'fetchKeyColumn' is specified, the string in the 'fetchKeyColumn' will be used as the 'id'.
    • The 'idColumn' value is not added to the metadata.
    • If a 'fetchKeyColumn' is specified, this will use that column's value as the fetchKey.
    • If no 'fetchKeyColumn' is specified, this will send only the metadata from the other columns.
    • The 'fetchKeyColumn' value is not added to the metadata.

    • If an 'emitKeyColumn' is specified, this will use that column's value as the emit key.
    • If an 'emitKeyColumn' is not specified, this will use the value from the 'fetchKeyColumn'.
    • The 'emitKeyColumn' value is not added to the metadata.
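
The column rules above can be sketched in plain Java. This is a minimal illustration of the documented fallbacks only, not Tika's implementation: the class `CsvRowMappingSketch`, the holder `Resolved`, and the method `resolve` are hypothetical names, and the sketch assumes the header and row have already been parsed into lists of strings. The description above does not say which id is used when neither 'idColumn' nor 'fetchKeyColumn' is configured, so the sketch leaves it as null in that case.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the documented column rules; not part of Tika.
final class CsvRowMappingSketch {

    /** Hypothetical holder for the values derived from one CSV row. */
    static final class Resolved {
        final String id;
        final String fetchKey;
        final String emitKey;
        final Map<String, String> metadata;

        Resolved(String id, String fetchKey, String emitKey, Map<String, String> metadata) {
            this.id = id;
            this.fetchKey = fetchKey;
            this.emitKey = emitKey;
            this.metadata = metadata;
        }
    }

    /**
     * Applies the documented rules to one row. Any of the three column
     * names may be null, meaning "not specified".
     */
    static Resolved resolve(List<String> header, List<String> row,
                            String idColumn, String fetchKeyColumn, String emitKeyColumn) {
        // fetchKey comes from the fetchKeyColumn, if one is configured.
        String fetchKey = valueOf(header, row, fetchKeyColumn);
        // emitKey falls back to the fetchKeyColumn value.
        String emitKey = emitKeyColumn != null ? valueOf(header, row, emitKeyColumn) : fetchKey;
        // id falls back to the fetchKeyColumn value
        // (assumption: null when neither column is configured).
        String id = idColumn != null ? valueOf(header, row, idColumn) : fetchKey;

        // Every remaining column becomes metadata; the three key columns are skipped.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 0; i < header.size() && i < row.size(); i++) {
            String name = header.get(i);
            if (name.equals(idColumn) || name.equals(fetchKeyColumn) || name.equals(emitKeyColumn)) {
                continue;
            }
            metadata.put(name, row.get(i));
        }
        return new Resolved(id, fetchKey, emitKey, metadata);
    }

    // Looks up the cell under the named column; null if the column is unset or absent.
    private static String valueOf(List<String> header, List<String> row, String column) {
        if (column == null) {
            return null;
        }
        int i = header.indexOf(column);
        return (i >= 0 && i < row.size()) ? row.get(i) : null;
    }
}
```

For example, with only a fetch key column configured, calling `resolve` with `idColumn` and `emitKeyColumn` set to null yields an id and an emit key that both equal the fetch key, and every other column ends up in the metadata map.
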
    • Field Summary

      • Fields inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator

        COMPLETED_SEMAPHORE, DEFAULT_MAX_WAIT_MS, DEFAULT_QUEUE_SIZE
    • Method Summary

      Modifier and Type    Method
      void                 checkInitialization(org.apache.tika.config.InitializableProblemHandler problemHandler)
      protected void       enqueue()
      void                 setCsvPath(String csvPath)
      void                 setCsvPath(Path csvPath)
      void                 setEmitKeyColumn(String emitKeyColumn)
      void                 setFetchKeyColumn(String fetchKeyColumn)
      void                 setIdColumn(String idColumn)
      • Methods inherited from class org.apache.tika.pipes.pipesiterator.PipesIterator

        build, call, getEmitterName, getFetcherName, getHandlerConfig, getOnParseException, initialize, iterator, setEmitterName, setFetcherName, setHandlerType, setMaxEmbeddedResources, setMaxWaitMs, setOnParseException, setOnParseException, setParseMode, setParseMode, setQueueSize, setThrowOnWriteLimitReached, setWriteLimit, tryToAdd
      • Methods inherited from class org.apache.tika.config.ConfigBase

        buildComposite, buildComposite, buildSingle, buildSingle, configure, handleSettings
      • Methods inherited from interface org.apache.tika.config.Initializable

        initialize
    • Constructor Detail

      • CSVPipesIterator

        public CSVPipesIterator()
    • Method Detail

      • setCsvPath

        @Field
        public void setCsvPath(String csvPath)
      • setFetchKeyColumn

        @Field
        public void setFetchKeyColumn(String fetchKeyColumn)
      • setEmitKeyColumn

        @Field
        public void setEmitKeyColumn(String emitKeyColumn)
      • setIdColumn

        @Field
        public void setIdColumn(String idColumn)
      • setCsvPath

        @Field
        public void setCsvPath(Path csvPath)
      • checkInitialization

        public void checkInitialization(org.apache.tika.config.InitializableProblemHandler problemHandler)
                                 throws org.apache.tika.exception.TikaConfigException
        Specified by:
        checkInitialization in interface org.apache.tika.config.Initializable
        Overrides:
        checkInitialization in class org.apache.tika.pipes.pipesiterator.PipesIterator
        Throws:
        org.apache.tika.exception.TikaConfigException