org.apache.solr.hadoop.dedup
Class RetainMostRecentUpdateConflictResolver
java.lang.Object
org.apache.solr.hadoop.dedup.RetainMostRecentUpdateConflictResolver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, UpdateConflictResolver
public class RetainMostRecentUpdateConflictResolver
- extends Object
- implements UpdateConflictResolver, org.apache.hadoop.conf.Configurable
UpdateConflictResolver implementation that ignores all but the most recent
document version, based on a configurable numeric Solr field, which defaults
to the file_last_modified timestamp.
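The "configurable field with a default" behaviour described above can be sketched in plain Java. This is a minimal illustration, not the class's actual code: a `Map<String, String>` stands in for the Hadoop `Configuration`, and the key string `example.orderByFieldName` is a hypothetical placeholder (the real value is listed under Constant Field Values). Only the default field name, `file_last_modified`, comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

final class OrderByFieldLookup {
    // Hypothetical placeholder key; the real constant value is not shown on this page.
    static final String ORDER_BY_FIELD_NAME_KEY = "example.orderByFieldName";
    // Default named in the class description.
    static final String ORDER_BY_FIELD_NAME_DEFAULT = "file_last_modified";

    // Returns the configured field name, or the default when the key is absent.
    static String orderByFieldName(Map<String, String> conf) {
        return conf.getOrDefault(ORDER_BY_FIELD_NAME_KEY, ORDER_BY_FIELD_NAME_DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(orderByFieldName(conf)); // no override: default is used

        conf.put(ORDER_BY_FIELD_NAME_KEY, "version_number"); // hypothetical numeric field
        System.out.println(orderByFieldName(conf)); // override wins
    }
}
```

Whatever field is configured is expected to hold a numeric value, since the resolver orders document versions by it.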
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ORDER_BY_FIELD_NAME_KEY
public static final String ORDER_BY_FIELD_NAME_KEY
ORDER_BY_FIELD_NAME_DEFAULT
public static final String ORDER_BY_FIELD_NAME_DEFAULT
- See Also:
- Constant Field Values
COUNTER_GROUP
public static final String COUNTER_GROUP
DUPLICATES_COUNTER_NAME
public static final String DUPLICATES_COUNTER_NAME
- See Also:
- Constant Field Values
OUTDATED_COUNTER_NAME
public static final String OUTDATED_COUNTER_NAME
- See Also:
- Constant Field Values
RetainMostRecentUpdateConflictResolver
public RetainMostRecentUpdateConflictResolver()
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
getOrderByFieldName
protected String getOrderByFieldName()
orderUpdates
public Iterator<SolrInputDocument> orderUpdates(org.apache.hadoop.io.Text key,
Iterator<SolrInputDocument> updates,
org.apache.hadoop.mapreduce.Reducer.Context ctx)
- Description copied from interface:
UpdateConflictResolver
- Given a list of all colliding document updates for the same unique document
key, this method returns zero or more documents in an application-specific
order.
The caller will then apply the updates for this key to Solr in the order
returned by the orderUpdates() method.
- Specified by:
orderUpdates in interface UpdateConflictResolver
- Parameters:
key - the document key common to all colliding updates given below
updates - all updates in the MapReduce job that share the document key given above. The input order is unspecified.
ctx - the Context passed in by the Reducer implementation
- Returns:
- the order in which the updates shall be applied to Solr
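The contract described for orderUpdates can be illustrated from the caller's side. The sketch below is an assumption-laden stand-in, not the Solr API: the `Resolver` interface is a hypothetical simplification of UpdateConflictResolver in which documents are plain strings and the `Reducer.Context` parameter is omitted. It shows only that the caller applies exactly what the resolver returns, in the returned order, and that returning zero documents is valid.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class OrderUpdatesContractSketch {
    // Hypothetical, simplified stand-in for UpdateConflictResolver.
    interface Resolver {
        Iterator<String> orderUpdates(String key, Iterator<String> updates);
    }

    // Keeps every colliding update, sorted ascending.
    static final Resolver SORT_ALL = (key, updates) -> {
        List<String> all = new ArrayList<>();
        updates.forEachRemaining(all::add);
        Collections.sort(all);
        return all.iterator();
    };

    // Drops every colliding update: an empty result is allowed.
    static final Resolver DROP_ALL = (key, updates) -> Collections.emptyIterator();

    // Caller side: applies whatever the resolver returns, in the returned order.
    static List<String> applyUpdates(String key, List<String> colliding, Resolver resolver) {
        List<String> applied = new ArrayList<>();
        Iterator<String> ordered = resolver.orderUpdates(key, colliding.iterator());
        while (ordered.hasNext()) {
            applied.add(key + " <- " + ordered.next()); // stands in for a write to Solr
        }
        return applied;
    }

    public static void main(String[] args) {
        List<String> colliding = new ArrayList<>();
        Collections.addAll(colliding, "v2", "v1", "v3"); // input order is unspecified
        System.out.println(applyUpdates("doc-1", colliding, SORT_ALL));
        System.out.println(applyUpdates("doc-1", colliding, DROP_ALL));
    }
}
```

A resolver that retains only the most recent version sits between these two extremes: it returns at most one document per key.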
getMaximum
protected Iterator<SolrInputDocument> getMaximum(Iterator<SolrInputDocument> updates,
String fieldName,
Comparator child,
org.apache.hadoop.mapreduce.Reducer.Context context)
- Returns the most recent document among the colliding updates.
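One way such a maximum search could work is a single pass that keeps the best document seen so far. The sketch below is illustrative only and makes several assumptions: a `Map<String, Object>` stands in for SolrInputDocument, the `Reducer.Context` parameter is replaced by two plain counters, every document is assumed to carry the order-by field, and the counting rules (ties counted as duplicates, losers counted as outdated) are a guess suggested by the DUPLICATES_COUNTER_NAME and OUTDATED_COUNTER_NAME constants rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class GetMaximumSketch {
    long duplicates; // updates that tied with the current maximum (assumed semantics)
    long outdated;   // updates that lost to a more recent one (assumed semantics)

    // Compares two field values numerically; stands in for the "child" comparator.
    static final Comparator<Object> NUMERIC =
        (a, b) -> Long.compare(((Number) a).longValue(), ((Number) b).longValue());

    // Single pass over the updates; returns an iterator over at most one document.
    Iterator<Map<String, Object>> getMaximum(Iterator<Map<String, Object>> updates,
                                             String fieldName,
                                             Comparator<Object> child) {
        Map<String, Object> max = null;
        while (updates.hasNext()) {
            Map<String, Object> next = updates.next();
            if (max == null) {
                max = next;
                continue;
            }
            int c = child.compare(next.get(fieldName), max.get(fieldName));
            if (c == 0) {
                duplicates++;      // same version as the current maximum
            } else {
                if (c > 0) {
                    max = next;    // newer version replaces the current maximum
                }
                outdated++;        // either way, one of the two versions is discarded
            }
        }
        if (max == null) {
            return Collections.<Map<String, Object>>emptyIterator();
        }
        return Collections.singletonList(max).iterator();
    }

    // Builds a stand-in document with an id and a timestamp field.
    static Map<String, Object> doc(String id, long timestamp) {
        Map<String, Object> d = new HashMap<>();
        d.put("id", id);
        d.put("file_last_modified", timestamp);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> updates =
            Arrays.asList(doc("a", 100L), doc("b", 300L), doc("c", 300L), doc("d", 200L));
        GetMaximumSketch s = new GetMaximumSketch();
        Iterator<Map<String, Object>> winner =
            s.getMaximum(updates.iterator(), "file_last_modified", NUMERIC);
        // prints "b duplicates=1 outdated=2"
        System.out.println(winner.next().get("id")
            + " duplicates=" + s.duplicates + " outdated=" + s.outdated);
    }
}
```

In this sketch the first document encountered wins a tie, so the result for equal timestamps depends on the unspecified input order.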
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.