org.apache.solr.hadoop.dedup
Interface UpdateConflictResolver

All Known Implementing Classes:
NoChangeUpdateConflictResolver, RejectingUpdateConflictResolver, RetainMostRecentUpdateConflictResolver, SortingUpdateConflictResolver

public interface UpdateConflictResolver

Interface that enables deduplication and ordering of a series of document updates for the same unique document key. For example, a MapReduce batch job might index multiple files in the same job, where some of the files contain old and new versions of the very same document, all using the same unique document key.

Typically, implementations of this interface forbid collisions by throwing an exception, or ignore all but the most recent document version, or, in the general case, sort colliding updates in ascending order from the least recent to the most recent (partial) update. The caller of this interface (i.e., the Hadoop Reducer) will then apply the updates to Solr in the order returned by the orderUpdates() method.

Configuration: If an UpdateConflictResolver implementation also implements Configurable, then the Hadoop Reducer will call Configurable.setConf(org.apache.hadoop.conf.Configuration) on instance construction and pass the standard Hadoop configuration information.
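The first two strategies described above (forbid collisions, or keep only the most recent version) can be sketched roughly as follows. This is a minimal illustration and not the code of the shipped implementing classes: so that it compiles with the JDK alone, the hypothetical class Update stands in for SolrInputDocument, a plain String stands in for the Text key, the Reducer.Context argument is omitted, and the timestamp field used to decide recency is an assumption.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical stand-in for SolrInputDocument; "timestamp" is an assumed field.
final class Update {
    final String id;
    final long timestamp;

    Update(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }
}

// Hypothetical helper class; the method names are illustrative only.
final class ConflictStrategies {

    // Forbids collisions: a single update passes through, a second one throws.
    static Iterator<Update> reject(String uniqueKey, Iterator<Update> collidingUpdates) {
        if (!collidingUpdates.hasNext()) {
            return Collections.emptyIterator();
        }
        Update first = collidingUpdates.next();
        if (collidingUpdates.hasNext()) {
            throw new IllegalStateException("Update conflict for key: " + uniqueKey);
        }
        return Collections.singletonList(first).iterator();
    }

    // Ignores all but the update with the largest timestamp.
    static Iterator<Update> retainMostRecent(Iterator<Update> collidingUpdates) {
        Update newest = null;
        while (collidingUpdates.hasNext()) {
            Update candidate = collidingUpdates.next();
            if (newest == null || candidate.timestamp > newest.timestamp) {
                newest = candidate;
            }
        }
        if (newest == null) {
            return Collections.emptyIterator();
        }
        return Collections.singletonList(newest).iterator();
    }
}
```

Both methods make a single pass over the input and never hold more than one document, which matters when many updates collide on one key.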


Method Summary
 Iterator<SolrInputDocument> orderUpdates(org.apache.hadoop.io.Text uniqueKey, Iterator<SolrInputDocument> collidingUpdates, org.apache.hadoop.mapreduce.Reducer.Context context)
          Given all colliding document updates for the same unique document key, this method returns zero or more documents in an application-specific order.
 

Method Detail

orderUpdates

Iterator<SolrInputDocument> orderUpdates(org.apache.hadoop.io.Text uniqueKey,
                                         Iterator<SolrInputDocument> collidingUpdates,
                                         org.apache.hadoop.mapreduce.Reducer.Context context)
Given all colliding document updates for the same unique document key, this method returns zero or more documents in an application-specific order. The caller will then apply the updates for this key to Solr in the order returned by the orderUpdates() method.

Parameters:
uniqueKey - the document key common to all collidingUpdates mentioned below
collidingUpdates - all updates in the MapReduce job that have a key equal to uniqueKey mentioned above. The input order is unspecified.
context - the Context passed in by the calling Reducer implementation
Returns:
the order in which the updates shall be applied to Solr
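
The contract above (input in unspecified order, output in the order to apply) can be illustrated for the general sorting case with the sketch below. It is an assumption-laden illustration, not the implementation of SortingUpdateConflictResolver: the hypothetical class VersionedDoc stands in for SolrInputDocument, a String stands in for the Text key, the Reducer.Context parameter is left out so the code needs only the JDK, and the version field used as the sort key is assumed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for SolrInputDocument; "version" is an assumed field.
final class VersionedDoc {
    final String id;
    final long version;

    VersionedDoc(String id, long version) {
        this.id = id;
        this.version = version;
    }
}

// Hypothetical class mirroring the shape of orderUpdates() with simplified types.
final class SortingSketch {

    // Returns all colliding updates, ordered from least recent to most recent.
    static Iterator<VersionedDoc> orderUpdates(String uniqueKey,
                                               Iterator<VersionedDoc> collidingUpdates) {
        // The input order is unspecified, so buffer everything before sorting.
        List<VersionedDoc> sorted = new ArrayList<>();
        while (collidingUpdates.hasNext()) {
            sorted.add(collidingUpdates.next());
        }
        // List.sort is stable: updates with equal versions keep their input order.
        sorted.sort(Comparator.comparingLong((VersionedDoc doc) -> doc.version));
        return sorted.iterator();
    }
}
```

Unlike a strategy that keeps a single document, sorting has to buffer every colliding update for the key in memory before it can return the first one.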


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.