org.apache.solr.hadoop.morphline
Class MorphlineMapper

java.lang.Object
  extended by org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,org.apache.hadoop.io.Text,SolrInputDocumentWritable>
      extended by org.apache.solr.hadoop.SolrMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
          extended by org.apache.solr.hadoop.morphline.MorphlineMapper

public class MorphlineMapper
extends SolrMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>

This class takes the input files, extracts the relevant content, transforms it, and hands SolrInputDocuments to a set of reducers. More specifically, it consumes a list of <offset, hdfsFilePath> input pairs. For each such pair, it extracts a set of zero or more SolrInputDocuments and sends them to a downstream Reducer. The key for the reducer is the unique id of the SolrInputDocument, as defined by the unique key field in Solr's schema.xml.
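
The data flow described above can be modelled in plain Java. This is only an illustrative sketch: it does not use the Hadoop or Solr APIs, and the names MapperFlowSketch, Doc, extract and mapAll, the example paths, and the "path#0" id scheme are all assumptions made for the example, not part of MorphlineMapper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java model of the mapper's data flow; no Hadoop or Solr classes are involved. */
public class MapperFlowSketch {

    /** Hypothetical stand-in for a SolrInputDocument: a unique id plus one field. */
    public static final class Doc {
        public final String id;
        public final String sourcePath;

        public Doc(String id, String sourcePath) {
            this.id = id;
            this.sourcePath = sourcePath;
        }
    }

    /** Hypothetical extractor: yields no documents for "empty" files, one otherwise. */
    public static List<Doc> extract(String path) {
        if (path.endsWith("empty.txt")) {
            return Collections.emptyList();
        }
        return Collections.singletonList(new Doc(path + "#0", path));
    }

    /**
     * For each <offset, path> pair, extract zero or more documents and emit each
     * one keyed by its unique id. The offset is read from the input but never used.
     */
    public static Map<String, List<Doc>> mapAll(
            Map<Long, String> inputPairs,
            Function<String, List<Doc>> extractor) {
        Map<String, List<Doc>> emitted = new LinkedHashMap<>();
        for (Map.Entry<Long, String> pair : inputPairs.entrySet()) {
            String path = pair.getValue();
            for (Doc doc : extractor.apply(path)) {
                // Group by unique id, as a downstream reducer would see the documents.
                emitted.computeIfAbsent(doc.id, k -> new ArrayList<>()).add(doc);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        Map<Long, String> input = new LinkedHashMap<>();
        input.put(0L, "hdfs:///example/a.txt");       // hypothetical path
        input.put(22L, "hdfs:///example/empty.txt");  // hypothetical path
        Map<String, List<Doc>> out = mapAll(input, MapperFlowSketch::extract);
        for (Map.Entry<String, List<Doc>> e : out.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " document(s)");
        }
    }
}
```

In the sketch, an input file may contribute no documents at all, which matches the "zero or more" wording above.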


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
 
Constructor Summary
MorphlineMapper()
           
 
Method Summary
protected  void cleanup(org.apache.hadoop.mapreduce.Mapper.Context context)
           
protected  org.apache.hadoop.mapreduce.Mapper.Context getContext()
           
protected  org.apache.solr.schema.IndexSchema getSchema()
           
 void map(org.apache.hadoop.io.LongWritable key, org.apache.hadoop.io.Text value, org.apache.hadoop.mapreduce.Mapper.Context context)
          Extract content from the path specified in the value.
protected  void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
           
 
Methods inherited from class org.apache.solr.hadoop.SolrMapper
getSolrHomeDir
 
Methods inherited from class org.apache.hadoop.mapreduce.Mapper
run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MorphlineMapper

public MorphlineMapper()

Method Detail

getSchema

protected org.apache.solr.schema.IndexSchema getSchema()

getContext

protected org.apache.hadoop.mapreduce.Mapper.Context getContext()

setup

protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
              throws IOException,
                     InterruptedException
Overrides:
setup in class SolrMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException
InterruptedException

map

public void map(org.apache.hadoop.io.LongWritable key,
                org.apache.hadoop.io.Text value,
                org.apache.hadoop.mapreduce.Mapper.Context context)
         throws IOException,
                InterruptedException
Extract content from the path specified in the value. The key is not used.

Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,SolrInputDocumentWritable>
Throws:
IOException
InterruptedException

cleanup

protected void cleanup(org.apache.hadoop.mapreduce.Mapper.Context context)
                throws IOException,
                       InterruptedException
Overrides:
cleanup in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,SolrInputDocumentWritable>
Throws:
IOException
InterruptedException
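
The setup, map and cleanup methods above are driven by the inherited run method. In Hadoop's Mapper, run calls setup once, then map once per input pair, then cleanup once. The following plain-Java sketch models only that call order on the non-failing path; it does not use the Hadoop API, and the class name LifecycleSketch and the example paths are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the setup/map/cleanup call order; no Hadoop classes are involved. */
public class LifecycleSketch {

    /** Records each lifecycle call, in order, for the given input paths. */
    public static List<String> run(List<String> paths) {
        List<String> calls = new ArrayList<>();
        calls.add("setup");                  // once, before any input is processed
        for (String path : paths) {
            calls.add("map(" + path + ")");  // once per input pair
        }
        calls.add("cleanup");                // once, after the last input
        return calls;
    }

    public static void main(String[] args) {
        // Hypothetical input paths.
        System.out.println(run(Arrays.asList("hdfs:///example/a.txt", "hdfs:///example/b.txt")));
    }
}
```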


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.