org.apache.solr.hadoop.morphline
Class MorphlineMapper
java.lang.Object
org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,org.apache.hadoop.io.Text,SolrInputDocumentWritable>
org.apache.solr.hadoop.SolrMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
org.apache.solr.hadoop.morphline.MorphlineMapper
public class MorphlineMapper
extends SolrMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
This class takes the input files, extracts the relevant content, transforms it, and hands SolrInputDocuments to a set of reducers.
More specifically, it consumes a list of <offset, hdfsFilePath> input pairs. For each such pair, it extracts a set of zero or more SolrInputDocuments and sends them to a downstream Reducer. The key for the reducer is the unique id of the SolrInputDocument, as declared by the uniqueKey field in the Solr schema.xml.
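The data flow described above can be modeled in plain Java without Hadoop or Solr on the classpath. This is a minimal sketch, not the MorphlineMapper implementation: a Map<String, Object> stands in for SolrInputDocument, the field name "id" is a hypothetical uniqueKey, the demoExtractor method is a hypothetical stand-in for the morphline that parses each file, and a sorted TreeMap imitates the shuffle that groups map output by key before it reaches a reducer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java model of the MorphlineMapper data flow; no Hadoop or Solr classes.
// Hypothetical stand-ins: Map<String, Object> for SolrInputDocument, "id" for the
// schema.xml uniqueKey field, and demoExtractor for the morphline that parses a file.
final class MapperFlowSketch {

    static final String UNIQUE_KEY_FIELD = "id"; // hypothetical uniqueKey field name

    // Hypothetical extractor: one document per path, none for paths ending in ".skip".
    static List<Map<String, Object>> demoExtractor(String path) {
        List<Map<String, Object>> docs = new ArrayList<>();
        if (!path.endsWith(".skip")) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put(UNIQUE_KEY_FIELD, path);
            doc.put("text", "content of " + path);
            docs.add(doc);
        }
        return docs;
    }

    // One map() call: the offset is ignored; every extracted document is emitted
    // under its unique id. Grouping by id mimics the shuffle, so all documents
    // sharing an id end up in the same reduce call.
    static void mapOne(long offset, String hdfsFilePath,
                       Function<String, List<Map<String, Object>>> extractor,
                       Map<String, List<Map<String, Object>>> shuffle) {
        for (Map<String, Object> doc : extractor.apply(hdfsFilePath)) {
            String id = String.valueOf(doc.get(UNIQUE_KEY_FIELD));
            shuffle.computeIfAbsent(id, k -> new ArrayList<>()).add(doc);
        }
    }

    // Runs every <offset, path> pair through mapOne; the TreeMap keeps keys sorted,
    // as Hadoop sorts map output by key before reducing.
    static Map<String, List<Map<String, Object>>> run(
            List<Map.Entry<Long, String>> inputPairs,
            Function<String, List<Map<String, Object>>> extractor) {
        Map<String, List<Map<String, Object>>> shuffle = new TreeMap<>();
        for (Map.Entry<Long, String> pair : inputPairs) {
            mapOne(pair.getKey(), pair.getValue(), extractor, shuffle);
        }
        return shuffle;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> pairs = List.of(
                Map.entry(0L, "hdfs://nn/data/a.txt"),
                Map.entry(21L, "hdfs://nn/data/b.skip"),
                Map.entry(43L, "hdfs://nn/data/c.txt"));
        System.out.println(run(pairs, MapperFlowSketch::demoExtractor).keySet());
        // prints [hdfs://nn/data/a.txt, hdfs://nn/data/c.txt]
    }
}
```

Note that a pair may yield no documents at all (the ".skip" path here), which matches the "zero or more" wording of the class description.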
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper:
org.apache.hadoop.mapreduce.Mapper.Context

Method Summary
| Modifier and Type | Method and Description |
|---|---|
| protected void | cleanup(org.apache.hadoop.mapreduce.Mapper.Context context) |
| protected org.apache.hadoop.mapreduce.Mapper.Context | getContext() |
| protected org.apache.solr.schema.IndexSchema | getSchema() |
| void | map(org.apache.hadoop.io.LongWritable key, org.apache.hadoop.io.Text value, org.apache.hadoop.mapreduce.Mapper.Context context): Extract content from the path specified in the value. |
| protected void | setup(org.apache.hadoop.mapreduce.Mapper.Context context) |

Methods inherited from class org.apache.hadoop.mapreduce.Mapper:
run

Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
MorphlineMapper
public MorphlineMapper()
getSchema
protected org.apache.solr.schema.IndexSchema getSchema()
getContext
protected org.apache.hadoop.mapreduce.Mapper.Context getContext()
setup
protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
throws IOException,
InterruptedException
- Overrides:
setup in class SolrMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Throws:
IOException
InterruptedException
map
public void map(org.apache.hadoop.io.LongWritable key,
org.apache.hadoop.io.Text value,
org.apache.hadoop.mapreduce.Mapper.Context context)
throws IOException,
InterruptedException
- Extract content from the path specified in the value. The key (the offset of the input record) is not used.
- Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,SolrInputDocumentWritable>
- Throws:
IOException
InterruptedException
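To make the <offset, hdfsFilePath> pairs concrete, the sketch below assumes the job input is a text file listing one path per line, read by a line-oriented input format whose key is the byte offset at which each line starts (as Hadoop's TextInputFormat does). The class and method names are hypothetical, and the trimming and blank-line skipping in pathFor are assumptions of the sketch rather than documented MorphlineMapper behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Plain-Java model of the <offset, hdfsFilePath> records that map() receives.
// Assumption: a listing file with one path per line, read by a line-oriented
// input format that uses each line's starting byte offset as the key.
final class MapInputSketch {

    // Split a '\n'-separated listing into (byte offset, line) pairs.
    // String.split drops trailing empty strings, so a final newline adds no record.
    static List<Map.Entry<Long, String>> toPairs(String listing) {
        List<Map.Entry<Long, String>> pairs = new ArrayList<>();
        long offset = 0;
        for (String line : listing.split("\n")) {
            pairs.add(Map.entry(offset, line));
            // Offsets count bytes, not chars; +1 accounts for the '\n' separator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return pairs;
    }

    // Hypothetical map step: the key is ignored and the value is read as a path.
    // Trimming and skipping blank values are assumptions of this sketch.
    static Optional<String> pathFor(long ignoredKey, String value) {
        String path = value.trim();
        return path.isEmpty() ? Optional.empty() : Optional.of(path);
    }

    public static void main(String[] args) {
        String listing = "hdfs://nn/in/a.xml\n\nhdfs://nn/in/b.xml\n";
        for (Map.Entry<Long, String> pair : toPairs(listing)) {
            pathFor(pair.getKey(), pair.getValue())
                    .ifPresent(path -> System.out.println(pair.getKey() + " -> " + path));
        }
        // prints "0 -> hdfs://nn/in/a.xml" then "20 -> hdfs://nn/in/b.xml"
    }
}
```

The offsets only identify where each record starts in the listing file; since map() does not use the key, the path in the value is the only input that matters.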
cleanup
protected void cleanup(org.apache.hadoop.mapreduce.Mapper.Context context)
throws IOException,
InterruptedException
- Overrides:
cleanup in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,SolrInputDocumentWritable>
- Throws:
IOException
InterruptedException
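Because MorphlineMapper inherits run from org.apache.hadoop.mapreduce.Mapper, setup, map and cleanup are invoked in the order the default run method defines: setup once, map once per input record, then cleanup once. The plain-Java model below mirrors that loop under stated assumptions: an Iterator stands in for Mapper.Context, the try/finally around the loop follows Hadoop 2.x's Mapper.run (older releases may call cleanup without a finally block), and all class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java model of the setup -> map* -> cleanup order driven by Hadoop's
// default Mapper.run. An Iterator stands in for Mapper.Context; no Hadoop classes.
final class MapperLifecycleSketch {

    abstract static class SketchMapper<K, V> {
        protected void setup() {}
        protected abstract void map(K key, V value);
        protected void cleanup() {}

        // Mirrors the loop in Mapper.run: setup once, map per record, cleanup once.
        final void run(Iterator<Map.Entry<K, V>> records) {
            setup();
            try {
                while (records.hasNext()) {
                    Map.Entry<K, V> record = records.next();
                    map(record.getKey(), record.getValue());
                }
            } finally {
                cleanup(); // runs even if map() throws
            }
        }
    }

    // Records each lifecycle call so the ordering can be inspected.
    static final class RecordingMapper extends SketchMapper<Long, String> {
        final List<String> calls = new ArrayList<>();

        @Override protected void setup() { calls.add("setup"); }
        @Override protected void map(Long key, String value) { calls.add("map:" + value); }
        @Override protected void cleanup() { calls.add("cleanup"); }
    }

    public static void main(String[] args) {
        RecordingMapper mapper = new RecordingMapper();
        mapper.run(List.of(Map.entry(0L, "a"), Map.entry(2L, "b")).iterator());
        System.out.println(mapper.calls); // prints [setup, map:a, map:b, cleanup]
    }
}
```

The finally block is the design point worth noting: where the Hadoop release wraps the loop this way, a cleanup override can release per-task resources even when a map call fails.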
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.