Class SolrEmitter

  • All Implemented Interfaces:
    org.apache.tika.config.Initializable, org.apache.tika.pipes.emitter.Emitter

    public class SolrEmitter
    extends org.apache.tika.pipes.emitter.AbstractEmitter
    implements org.apache.tika.config.Initializable
    • Field Detail

      • DEFAULT_EMBEDDED_FILE_FIELD_NAME

        public static String DEFAULT_EMBEDDED_FILE_FIELD_NAME
    • Constructor Detail

      • SolrEmitter

        public SolrEmitter()
                    throws org.apache.tika.exception.TikaConfigException
        Throws:
        org.apache.tika.exception.TikaConfigException
    • Method Detail

      • emit

        public void emit(String emitKey,
                         List<org.apache.tika.metadata.Metadata> metadataList,
                         org.apache.tika.parser.ParseContext parseContext)
                  throws IOException,
                         org.apache.tika.pipes.emitter.TikaEmitterException
        Specified by:
        emit in interface org.apache.tika.pipes.emitter.Emitter
        Throws:
        IOException
        org.apache.tika.pipes.emitter.TikaEmitterException
      • emit

        public void emit(List<? extends org.apache.tika.pipes.emitter.EmitData> batch)
                  throws IOException,
                         org.apache.tika.pipes.emitter.TikaEmitterException
        Specified by:
        emit in interface org.apache.tika.pipes.emitter.Emitter
        Overrides:
        emit in class org.apache.tika.pipes.emitter.AbstractEmitter
        Throws:
        IOException
        org.apache.tika.pipes.emitter.TikaEmitterException
      • setAttachmentStrategy

        @Field
        public void setAttachmentStrategy(String attachmentStrategy)
        Options: SKIP, CONCATENATE_CONTENT, PARENT_CHILD. Default is "PARENT_CHILD". If set to "SKIP", only the main file is indexed and all information in the attachments is ignored. If set to "CONCATENATE_CONTENT", the content extracted from the attachments is concatenated into the main document, which is then indexed with the concatenated content and the main document's metadata (metadata from the attachments is discarded). If set to "PARENT_CHILD", the attachments are indexed as children of the parent document via Solr's parent-child relationship.
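        The three strategies can be illustrated with a small, self-contained sketch. It is a hypothetical model, not the SolrEmitter source: plain maps stand in for Tika's Metadata objects and for Solr input documents, and the "content" key and the childField argument are assumptions chosen for illustration.

        ```java
        import java.util.ArrayList;
        import java.util.LinkedHashMap;
        import java.util.List;
        import java.util.Map;

        /**
         * Hypothetical sketch of the SKIP / CONCATENATE_CONTENT / PARENT_CHILD
         * attachment strategies. Maps stand in for Metadata and Solr documents;
         * this is NOT the SolrEmitter implementation.
         */
        final class AttachmentStrategySketch {

            enum Strategy { SKIP, CONCATENATE_CONTENT, PARENT_CHILD }

            // Assumed key for extracted text; the real Tika key may differ.
            static final String CONTENT = "content";

            /**
             * @param metadataList non-empty; element 0 is the main file, the rest are attachments
             * @param childField   hypothetical field for child documents (PARENT_CHILD only)
             */
            static Map<String, Object> toSolrDoc(List<Map<String, String>> metadataList,
                                                 Strategy strategy, String childField) {
                // Start from a copy of the main document's fields.
                Map<String, Object> doc = new LinkedHashMap<>(metadataList.get(0));
                List<Map<String, String>> attachments = metadataList.subList(1, metadataList.size());
                switch (strategy) {
                    case SKIP:
                        // Only the main file is indexed; attachments are ignored.
                        break;
                    case CONCATENATE_CONTENT: {
                        // Append attachment text to the main text; attachment metadata is dropped.
                        StringBuilder sb = new StringBuilder(metadataList.get(0).getOrDefault(CONTENT, ""));
                        for (Map<String, String> a : attachments) {
                            String c = a.get(CONTENT);
                            if (c != null) {
                                if (sb.length() > 0) {
                                    sb.append('\n');
                                }
                                sb.append(c);
                            }
                        }
                        doc.put(CONTENT, sb.toString());
                        break;
                    }
                    case PARENT_CHILD: {
                        // Each attachment keeps its own fields and becomes a child of the main document.
                        List<Map<String, String>> children = new ArrayList<>();
                        for (Map<String, String> a : attachments) {
                            children.add(new LinkedHashMap<>(a));
                        }
                        if (!children.isEmpty()) {
                            doc.put(childField, children);
                        }
                        break;
                    }
                }
                return doc;
            }
        }
        ```

        For a container document with one attachment, SKIP keeps only the container's fields, CONCATENATE_CONTENT appends the attachment's text to the container's content while keeping only the container's other fields, and PARENT_CHILD stores the attachment's fields as a child entry.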
      • setUpdateStrategy

        @Field
        public void setUpdateStrategy(String updateStrategy)
      • setConnectionTimeout

        @Field
        public void setConnectionTimeout(int connectionTimeout)
      • setSocketTimeout

        @Field
        public void setSocketTimeout(int socketTimeout)
      • getCommitWithin

        public int getCommitWithin()
      • setCommitWithin

        @Field
        public void setCommitWithin(int commitWithin)
      • setIdField

        @Field
        public void setIdField(String idField)
        Specifies the field in the first Metadata object (the container document's metadata) that should be used as the id field for the document.
        Parameters:
        idField - the name of the id field
      • setSolrCollection

        @Field
        public void setSolrCollection(String solrCollection)
      • setSolrUrls

        @Field
        public void setSolrUrls(List<String> solrUrls)
      • setSolrZkHosts

        @Field
        public void setSolrZkHosts(List<String> solrZkHosts)
      • setSolrZkChroot

        @Field
        public void setSolrZkChroot(String solrZkChroot)
      • setUserName

        @Field
        public void setUserName(String userName)
      • setPassword

        @Field
        public void setPassword(String password)
      • setAuthScheme

        @Field
        public void setAuthScheme(String authScheme)
      • setProxyHost

        @Field
        public void setProxyHost(String proxyHost)
      • setProxyPort

        @Field
        public void setProxyPort(int proxyPort)
      • setEmbeddedFileFieldName

        @Field
        public void setEmbeddedFileFieldName(String embeddedFileFieldName)
        If the attachment strategy is SolrEmitter.AttachmentStrategy.PARENT_CHILD, this is the name of the field used to store the child documents. Note that all embedded documents, however deeply nested in the container document, are artificially flattened into direct children of the root document.
        Parameters:
        embeddedFileFieldName - the name of the field that holds the child documents
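        The flattening described above can be pictured as a depth-first walk that collects every descendant into a single list on the root, stored under the configured field name. This is a hypothetical illustration, not the emitter's code: the Node type and its "id" key are assumptions, and because the emitter already receives a flat List<Metadata>, the tree here only serves to show the resulting shape.

        ```java
        import java.util.ArrayList;
        import java.util.LinkedHashMap;
        import java.util.List;
        import java.util.Map;

        /**
         * Hypothetical sketch: embedded documents at any depth become direct
         * children of the root, stored under embeddedFileFieldName.
         * Not the SolrEmitter source.
         */
        final class FlattenSketch {

            /** Minimal stand-in for a parsed document with nested embedded documents. */
            static final class Node {
                final Map<String, Object> fields = new LinkedHashMap<>();
                final List<Node> embedded = new ArrayList<>();

                Node(String id) {
                    fields.put("id", id);
                }
            }

            static Map<String, Object> toParentChild(Node root, String embeddedFileFieldName) {
                Map<String, Object> parent = new LinkedHashMap<>(root.fields);
                List<Map<String, Object>> children = new ArrayList<>();
                for (Node child : root.embedded) {
                    collect(child, children);
                }
                if (!children.isEmpty()) {
                    parent.put(embeddedFileFieldName, children);
                }
                return parent;
            }

            // Depth-first walk: every descendant is appended to the same flat list,
            // so grandchildren end up as siblings of their own parents.
            private static void collect(Node node, List<Map<String, Object>> out) {
                out.add(new LinkedHashMap<>(node.fields));
                for (Node child : node.embedded) {
                    collect(child, out);
                }
            }
        }
        ```

        A container holding attachment "a", which itself embeds "b", yields a root document whose child list contains both "a" and "b" at the same level.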
      • initialize

        public void initialize(Map<String,org.apache.tika.config.Param> params)
                        throws org.apache.tika.exception.TikaConfigException
        Specified by:
        initialize in interface org.apache.tika.config.Initializable
        Throws:
        org.apache.tika.exception.TikaConfigException
      • checkInitialization

        public void checkInitialization(org.apache.tika.config.InitializableProblemHandler problemHandler)
                                 throws org.apache.tika.exception.TikaConfigException
        Specified by:
        checkInitialization in interface org.apache.tika.config.Initializable
        Throws:
        org.apache.tika.exception.TikaConfigException