| Package | Description |
|---|---|
| de.jungblut.nlp | |
| de.jungblut.nlp.mr | |

| Modifier and Type | Class and Description |
|---|---|
| class | BigramTokenizer: Advanced tokenizer that lowercases, adds start and end tags, deduplicates tokens and builds bigrams. |
| class | StandardTokenizer: Just a basic tokenizer by certain attributes with normalization. |
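
The BigramTokenizer description lists four steps: lowercasing, tagging, deduplication and bigram building. The following is a minimal sketch of such a pipeline, not the library's implementation. The class name `BigramSketch`, the tag values `<START>` and `<END>`, whitespace splitting, and the order of the steps are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; this is not the de.jungblut.nlp.BigramTokenizer source.
final class BigramSketch {

  // Assumed tag values; the real tokenizer may use different markers.
  static final String START = "<START>";
  static final String END = "<END>";

  static List<String> tokenize(String text) {
    // Lowercase the input; splitting on whitespace is an assumption.
    String trimmed = text.toLowerCase(Locale.ROOT).trim();

    // Add the start and end tags and deduplicate in one pass.
    // LinkedHashSet keeps the first occurrence of each token in order.
    Set<String> unique = new LinkedHashSet<>();
    unique.add(START);
    if (!trimmed.isEmpty()) {
      for (String token : trimmed.split("\\s+")) {
        unique.add(token);
      }
    }
    unique.add(END);

    // Build bigrams from each pair of adjacent remaining tokens.
    List<String> tokens = new ArrayList<>(unique);
    List<String> bigrams = new ArrayList<>();
    for (int i = 0; i + 1 < tokens.size(); i++) {
      bigrams.add(tokens.get(i) + " " + tokens.get(i + 1));
    }
    return bigrams;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("The cat the hat"));
  }
}
```

In this sketch, deduplication runs before the bigrams are built, so a repeated word such as "the" contributes only its first occurrence. The real class may deduplicate at a different stage.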

| Modifier and Type | Method and Description |
|---|---|
| static Tokenizer | WordCorpusFrequencyJob.getTokenizer(org.apache.hadoop.conf.Configuration conf): Gets a tokenizer, based on the configured class in "tokenizer.class". |

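
Looking up a tokenizer by a configured class name can be sketched with plain reflection. The example below is an illustration only. It uses `java.util.Properties` as a stand-in for the Hadoop `Configuration` so that it runs without Hadoop on the classpath. The nested `Tokenizer` interface, its `tokenize` method, the `WhitespaceTokenizer` class, and the fallback used when the key is unset are hypothetical; only the key name "tokenizer.class" comes from the method description.

```java
import java.util.Properties;

// Hypothetical sketch; this is not the WordCorpusFrequencyJob source.
final class TokenizerLookupSketch {

  // Stand-in for the real Tokenizer interface, whose methods are not shown here.
  interface Tokenizer {
    String[] tokenize(String text);
  }

  // Trivial implementation used as an assumed default.
  static final class WhitespaceTokenizer implements Tokenizer {
    public WhitespaceTokenizer() {}

    @Override
    public String[] tokenize(String text) {
      return text.trim().split("\\s+");
    }
  }

  // Key name taken from the method description.
  static final String TOKENIZER_CLASS_KEY = "tokenizer.class";

  static Tokenizer getTokenizer(Properties conf) {
    // Falling back to a default when the key is unset is an assumption.
    String className =
        conf.getProperty(TOKENIZER_CLASS_KEY, WhitespaceTokenizer.class.getName());
    try {
      // asSubclass rejects classes that do not implement Tokenizer.
      Class<? extends Tokenizer> clazz =
          Class.forName(className).asSubclass(Tokenizer.class);
      // Requires an accessible no-argument constructor.
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create tokenizer " + className, e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(TOKENIZER_CLASS_KEY, WhitespaceTokenizer.class.getName());
    System.out.println(getTokenizer(conf).tokenize("a b c").length);
  }
}
```

Resolving the class by name keeps the job independent of any particular tokenizer, so a different implementation can be selected through configuration alone.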
Copyright © 2016. All rights reserved.