org.apache.solr.hadoop.dedup
Class RetainMostRecentUpdateConflictResolver

java.lang.Object
  extended by org.apache.solr.hadoop.dedup.RetainMostRecentUpdateConflictResolver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, UpdateConflictResolver

public class RetainMostRecentUpdateConflictResolver
extends Object
implements UpdateConflictResolver, org.apache.hadoop.conf.Configurable

UpdateConflictResolver implementation that ignores all but the most recent document version. Recency is determined by a configurable numeric Solr field, which defaults to the file_last_modified timestamp.
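
The selection rule described above can be sketched without any Hadoop or Solr dependencies. This is a minimal illustration, not the class's actual code: documents are modelled as plain maps, the names RetainMostRecentSketch, mostRecent and doc are hypothetical, and the treatment of a document that lacks the ordering field is an assumption. Only the field name file_last_modified is taken from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RetainMostRecentSketch {
    // Default ordering field named in the class description.
    static final String DEFAULT_ORDER_BY_FIELD = "file_last_modified";

    /** Returns the update with the largest numeric value in orderByField, or null if there are none. */
    static Map<String, Object> mostRecent(List<Map<String, Object>> updates, String orderByField) {
        Map<String, Object> best = null;
        long bestValue = Long.MIN_VALUE;
        for (Map<String, Object> doc : updates) {
            Object raw = doc.get(orderByField);
            // Assumption: a document without a numeric value sorts as the oldest possible version.
            long value = (raw instanceof Number) ? ((Number) raw).longValue() : Long.MIN_VALUE;
            if (best == null || value > bestValue) {
                best = doc;
                bestValue = value;
            }
        }
        return best;
    }

    /** Hypothetical helper: builds one version of a document with a given timestamp. */
    static Map<String, Object> doc(String body, long lastModified) {
        Map<String, Object> d = new HashMap<>();
        d.put("id", "doc-1"); // all colliding updates share the same unique key
        d.put("body", body);
        d.put(DEFAULT_ORDER_BY_FIELD, lastModified);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> updates = new ArrayList<>();
        updates.add(doc("v1", 100L));
        updates.add(doc("v2", 300L));
        updates.add(doc("v3", 200L));
        // The version with the largest timestamp wins.
        System.out.println(mostRecent(updates, DEFAULT_ORDER_BY_FIELD).get("body")); // prints v2
    }
}
```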


Field Summary
static String COUNTER_GROUP
           
static String DUPLICATES_COUNTER_NAME
           
static String ORDER_BY_FIELD_NAME_DEFAULT
           
static String ORDER_BY_FIELD_NAME_KEY
           
static String OUTDATED_COUNTER_NAME
           
 
Constructor Summary
RetainMostRecentUpdateConflictResolver()
           
 
Method Summary
 org.apache.hadoop.conf.Configuration getConf()
           
protected  Iterator<SolrInputDocument> getMaximum(Iterator<SolrInputDocument> updates, String fieldName, Comparator child, org.apache.hadoop.mapreduce.Reducer.Context context)
          Returns the most recent document among the colliding updates.
protected  String getOrderByFieldName()
           
 Iterator<SolrInputDocument> orderUpdates(org.apache.hadoop.io.Text key, Iterator<SolrInputDocument> updates, org.apache.hadoop.mapreduce.Reducer.Context ctx)
          Given a list of all colliding document updates for the same unique document key, this method returns zero or more documents in an application-specific order.
 void setConf(org.apache.hadoop.conf.Configuration conf)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ORDER_BY_FIELD_NAME_KEY

public static final String ORDER_BY_FIELD_NAME_KEY

ORDER_BY_FIELD_NAME_DEFAULT

public static final String ORDER_BY_FIELD_NAME_DEFAULT
See Also:
Constant Field Values

COUNTER_GROUP

public static final String COUNTER_GROUP

DUPLICATES_COUNTER_NAME

public static final String DUPLICATES_COUNTER_NAME
See Also:
Constant Field Values

OUTDATED_COUNTER_NAME

public static final String OUTDATED_COUNTER_NAME
See Also:
Constant Field Values
Constructor Detail

RetainMostRecentUpdateConflictResolver

public RetainMostRecentUpdateConflictResolver()
Method Detail

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

getOrderByFieldName

protected String getOrderByFieldName()

orderUpdates

public Iterator<SolrInputDocument> orderUpdates(org.apache.hadoop.io.Text key,
                                                Iterator<SolrInputDocument> updates,
                                                org.apache.hadoop.mapreduce.Reducer.Context ctx)
Description copied from interface: UpdateConflictResolver
Given a list of all colliding document updates for the same unique document key, this method returns zero or more documents in an application-specific order. The caller will then apply the updates for this key to Solr in the order returned by the orderUpdates() method.

Specified by:
orderUpdates in interface UpdateConflictResolver
Parameters:
key - the document key common to all of the updates listed below
updates - all updates in the MapReduce job whose key equals the key described above. The input order is unspecified.
ctx - the Context passed in from the Reducer implementation.
Returns:
the order in which the updates shall be applied to Solr
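
The contract above (the resolver returns zero or more updates per key, and the caller applies them in the returned order) can be illustrated with a simplified sketch. The Resolver interface, the use of Long values in place of SolrInputDocument, and the apply helper are all hypothetical stand-ins chosen so the example runs without Hadoop or Solr on the classpath; the real method also receives a Reducer.Context.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderUpdatesSketch {
    /** Hypothetical, simplified stand-in for the resolver contract (no Hadoop or Solr types). */
    interface Resolver {
        Iterator<Long> orderUpdates(String key, Iterator<Long> updates);
    }

    /** Keeps only the largest value, mirroring a "retain most recent" policy. */
    static final Resolver RETAIN_MOST_RECENT = (key, updates) -> {
        Long best = null;
        while (updates.hasNext()) {
            Long v = updates.next();
            if (best == null || v > best) {
                best = v;
            }
        }
        // Zero documents in, zero documents out; otherwise exactly one survivor.
        return best == null
                ? Collections.<Long>emptyIterator()
                : Collections.singletonList(best).iterator();
    };

    /** Caller side: applies the returned updates for each key, in the order given by the resolver. */
    static Map<String, List<Long>> apply(Map<String, List<Long>> collisions, Resolver resolver) {
        Map<String, List<Long>> applied = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : collisions.entrySet()) {
            List<Long> out = new ArrayList<>();
            Iterator<Long> ordered = resolver.orderUpdates(e.getKey(), e.getValue().iterator());
            while (ordered.hasNext()) {
                out.add(ordered.next());
            }
            applied.put(e.getKey(), out);
        }
        return applied;
    }
}
```

A different policy, such as applying every update from oldest to newest, would implement the same interface and return all of its inputs in sorted order.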

getMaximum

protected Iterator<SolrInputDocument> getMaximum(Iterator<SolrInputDocument> updates,
                                                 String fieldName,
                                                 Comparator child,
                                                 org.apache.hadoop.mapreduce.Reducer.Context context)
Returns the most recent document among the colliding updates.
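
A single pass with a Comparator is one way to find such a maximum. The sketch below is an assumption about the general technique, not the method's actual body: the Result class and its two tallies are hypothetical stand-ins for job counters, and the rule for when an update counts as a duplicate (compares equal to the current winner) or as outdated (compares lower than the winner) is inferred only from the names DUPLICATES_COUNTER_NAME and OUTDATED_COUNTER_NAME. Input elements are assumed to be non-null.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;

final class MaximumSketch {
    /** Winner of one pass plus hypothetical stand-ins for the two counters. */
    static final class Result<T> {
        T max;
        long duplicates;
        long outdated;
    }

    /** Scans the updates once and keeps the element that the comparator ranks highest. */
    static <T> Result<T> getMaximum(Iterator<T> updates, Comparator<? super T> order) {
        Result<T> r = new Result<>();
        while (updates.hasNext()) {
            T next = updates.next();
            if (r.max == null) {
                r.max = next; // first element is the initial winner
                continue;
            }
            int c = order.compare(next, r.max);
            if (c == 0) {
                r.duplicates++; // same version seen again; keep the earlier one
            } else if (c > 0) {
                r.max = next;   // newer version found; the previous winner is outdated
                r.outdated++;
            } else {
                r.outdated++;   // candidate is older than the current winner
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result<Long> r = getMaximum(
                Arrays.asList(100L, 250L, 250L, 90L).iterator(),
                Comparator.<Long>naturalOrder());
        System.out.println(r.max + " " + r.duplicates + " " + r.outdated); // prints 250 1 2
    }
}
```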



Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.