Class SolrEmitter
- java.lang.Object
-
- org.apache.tika.pipes.emitter.AbstractEmitter
-
- org.apache.tika.pipes.emitter.solr.SolrEmitter
-
- All Implemented Interfaces:
org.apache.tika.config.Initializable, org.apache.tika.pipes.emitter.Emitter
public class SolrEmitter extends org.apache.tika.pipes.emitter.AbstractEmitter implements org.apache.tika.config.Initializable
-
-
Nested Class Summary
Nested Classes
static class SolrEmitter.AttachmentStrategy
static class SolrEmitter.UpdateStrategy
-
Field Summary
Fields
static String DEFAULT_EMBEDDED_FILE_FIELD_NAME
-
Constructor Summary
Constructors
SolrEmitter()
-
Method Summary
Methods
void checkInitialization(org.apache.tika.config.InitializableProblemHandler problemHandler)
void emit(String emitKey, List<org.apache.tika.metadata.Metadata> metadataList, org.apache.tika.parser.ParseContext parseContext)
void emit(List<? extends org.apache.tika.pipes.emitter.EmitData> batch)
int getCommitWithin()
void initialize(Map<String,org.apache.tika.config.Param> params)
void setAttachmentStrategy(String attachmentStrategy)
    Options: SKIP, CONCATENATE_CONTENT, PARENT_CHILD.
void setAuthScheme(String authScheme)
void setCommitWithin(int commitWithin)
void setConnectionTimeout(int connectionTimeout)
void setEmbeddedFileFieldName(String embeddedFileFieldName)
    If using SolrEmitter.AttachmentStrategy.PARENT_CHILD, this is the field name used to store the child documents.
void setIdField(String idField)
    Specify the field in the first Metadata that should be used as the id field for the document.
void setPassword(String password)
void setProxyHost(String proxyHost)
void setProxyPort(int proxyPort)
void setSocketTimeout(int socketTimeout)
void setSolrCollection(String solrCollection)
void setSolrUrls(List<String> solrUrls)
void setSolrZkChroot(String solrZkChroot)
void setSolrZkHosts(List<String> solrZkHosts)
void setUpdateStrategy(String updateStrategy)
void setUserName(String userName)
-
-
-
Field Detail
-
DEFAULT_EMBEDDED_FILE_FIELD_NAME
public static String DEFAULT_EMBEDDED_FILE_FIELD_NAME
-
-
Method Detail
-
emit
public void emit(String emitKey, List<org.apache.tika.metadata.Metadata> metadataList, org.apache.tika.parser.ParseContext parseContext) throws IOException, org.apache.tika.pipes.emitter.TikaEmitterException
- Specified by:
emit in interface org.apache.tika.pipes.emitter.Emitter
- Throws:
IOException
org.apache.tika.pipes.emitter.TikaEmitterException
-
emit
public void emit(List<? extends org.apache.tika.pipes.emitter.EmitData> batch) throws IOException, org.apache.tika.pipes.emitter.TikaEmitterException
- Specified by:
emit in interface org.apache.tika.pipes.emitter.Emitter
- Overrides:
emit in class org.apache.tika.pipes.emitter.AbstractEmitter
- Throws:
IOException
org.apache.tika.pipes.emitter.TikaEmitterException
-
setAttachmentStrategy
@Field public void setAttachmentStrategy(String attachmentStrategy)
Options: SKIP, CONCATENATE_CONTENT, PARENT_CHILD. Default is "PARENT_CHILD". If set to "SKIP", this will index only the main file and ignore all info in the attachments. If set to "CONCATENATE_CONTENT", this will concatenate the content extracted from the attachments into the main document and then index the main document with the concatenated content _and_ the main document's metadata (metadata from attachments will be thrown away). If set to "PARENT_CHILD", this will index the attachments as children of the parent document via Solr's parent-child relationship.
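The three options can be illustrated with a minimal sketch that uses plain Java maps in place of Tika's Metadata and Solr's document types. The class AttachmentStrategySketch, its Strategy enum, and the field names "content" and "children" are hypothetical and chosen only for this illustration. The sketch models the behavior described above and does not reproduce SolrEmitter's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Tika.
final class AttachmentStrategySketch {

    enum Strategy { SKIP, CONCATENATE_CONTENT, PARENT_CHILD }

    // Assumed field names, chosen for this sketch.
    static final String CONTENT = "content";
    static final String CHILDREN = "children";

    /**
     * Builds one document from a metadata list whose first entry is the
     * main file and whose remaining entries are its attachments.
     */
    static Map<String, Object> build(List<Map<String, String>> metadataList, Strategy strategy) {
        Map<String, String> main = metadataList.get(0);
        List<Map<String, String>> attachments = metadataList.subList(1, metadataList.size());
        // The main document's metadata is kept under every strategy.
        Map<String, Object> doc = new LinkedHashMap<>(main);
        switch (strategy) {
            case SKIP:
                // Attachments are ignored entirely.
                break;
            case CONCATENATE_CONTENT: {
                // Attachment content is appended; attachment metadata is dropped.
                StringBuilder sb = new StringBuilder(main.getOrDefault(CONTENT, ""));
                for (Map<String, String> attachment : attachments) {
                    String content = attachment.get(CONTENT);
                    if (content != null) {
                        sb.append('\n').append(content);
                    }
                }
                doc.put(CONTENT, sb.toString());
                break;
            }
            case PARENT_CHILD:
                // Attachments are stored as child documents of the parent.
                doc.put(CHILDREN, new ArrayList<>(attachments));
                break;
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, String>> metadataList = List.of(
                Map.of("title", "report.docx", CONTENT, "main text"),
                Map.of("title", "chart.png", CONTENT, "chart text"));
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + " -> " + build(metadataList, strategy));
        }
    }
}
```

In an actual configuration the strategy is passed to setAttachmentStrategy as one of the strings listed above.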
-
setUpdateStrategy
@Field public void setUpdateStrategy(String updateStrategy)
-
setConnectionTimeout
@Field public void setConnectionTimeout(int connectionTimeout)
-
setSocketTimeout
@Field public void setSocketTimeout(int socketTimeout)
-
getCommitWithin
public int getCommitWithin()
-
setCommitWithin
@Field public void setCommitWithin(int commitWithin)
-
setIdField
@Field public void setIdField(String idField)
Specify the field in the first Metadata that should be used as the id field for the document.
- Parameters:
idField - name of the field to use as the document id
-
setSolrCollection
@Field public void setSolrCollection(String solrCollection)
-
setSolrZkChroot
@Field public void setSolrZkChroot(String solrZkChroot)
-
setUserName
@Field public void setUserName(String userName)
-
setPassword
@Field public void setPassword(String password)
-
setAuthScheme
@Field public void setAuthScheme(String authScheme)
-
setProxyHost
@Field public void setProxyHost(String proxyHost)
-
setProxyPort
@Field public void setProxyPort(int proxyPort)
-
setEmbeddedFileFieldName
@Field public void setEmbeddedFileFieldName(String embeddedFileFieldName)
If using SolrEmitter.AttachmentStrategy.PARENT_CHILD, this is the field name used to store the child documents. Note that all embedded documents, no matter how deeply nested in the container document, are artificially flattened into direct children of the root document.
- Parameters:
embeddedFileFieldName - name of the field that stores the child documents
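The flattening described above can be sketched as follows. FlattenSketch and its Node type are hypothetical helpers that model a container file and its embedded files as a tree. They are not Tika classes, and the depth-first ordering is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of flattening; not part of Tika.
final class FlattenSketch {

    // Stand-in for a file that may contain embedded files.
    static final class Node {
        final String name;
        final List<Node> embedded = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            embedded.add(child);
            return this;
        }
    }

    /** Returns every descendant of root, at any depth, as one flat list. */
    static List<String> directChildren(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    // Depth-first walk: each embedded file is recorded, then its own embedded files.
    private static void collect(Node node, List<String> out) {
        for (Node child : node.embedded) {
            out.add(child.name);
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // notes.txt sits inside archive.zip, which is attached to mail.eml.
        Node root = new Node("mail.eml")
                .add(new Node("archive.zip").add(new Node("notes.txt")))
                .add(new Node("photo.jpg"));
        System.out.println(directChildren(root)); // [archive.zip, notes.txt, photo.jpg]
    }
}
```

Although notes.txt is two levels deep in the container, it ends up alongside archive.zip and photo.jpg as a direct child of the root.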
-
initialize
public void initialize(Map<String,org.apache.tika.config.Param> params) throws org.apache.tika.exception.TikaConfigException
- Specified by:
initialize in interface org.apache.tika.config.Initializable
- Throws:
org.apache.tika.exception.TikaConfigException
-
checkInitialization
public void checkInitialization(org.apache.tika.config.InitializableProblemHandler problemHandler) throws org.apache.tika.exception.TikaConfigException
- Specified by:
checkInitialization in interface org.apache.tika.config.Initializable
- Throws:
org.apache.tika.exception.TikaConfigException
-
-