public class CoNLLDatasetReader extends Object implements DatasetReader
| Modifier and Type | Class and Description |
|---|---|
| static class | CoNLLDatasetReader.Charset |
| Modifier and Type | Field and Description |
|---|---|
| protected Annotation.Source | annotationSource |
| protected String | name |
| protected int | tagIndex |
| protected String | type |
| protected boolean | useFirstSentenceAsTitle |
| Constructor and Description |
|---|
| CoNLLDatasetReader() |
| Modifier and Type | Method and Description |
|---|---|
| protected BIO2Tag | createTag(String label, String prevType)<br>Returns the NER Label based on current and previous tags |
| protected Token | createTokenFromLine(String line, int cursor, String prevType)<br>Creates Token from the given line of CoNLL2003 data |
| Dataset | read(Resource path)<br>Read a Dataset from CoNLL file |
| Dataset | read(Resource path, CoNLLDatasetReader.Charset charset)<br>Read a Dataset from CoNLL file |
| static Dataset | readDataset(Resource path, String name, CoNLLDatasetReader.Charset charset)<br>Read a Dataset from CoNLL file (static version with default reader) |
| protected Dataset | readLines(Iterator<String> lines)<br>Create a Document from the given data |
| CoNLLDatasetReader | withAnnotationSource(Annotation.Source annotationSource) |
| CoNLLDatasetReader | withFirstSentenceAsTitle(boolean useFirstSentence)<br>Use a copy of every first sentence as Document title. |
| CoNLLDatasetReader | withGenericType(String type) |
| CoNLLDatasetReader | withName(String name)<br>Use a specific name for the Dataset. |
| CoNLLDatasetReader | withTagIndex(int tagIndex)<br>Use a specific column as NER tag. |
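The summary for `createTag` says the label is derived from the current and the previous tag. One common way to do that is the IOB1-to-BIO2 conversion, sketched below. This is an illustration only, not the library's implementation: the class `Bio2Sketch`, the methods `toBio2`, `typeOf`, and `convert`, and the use of plain strings in place of `BIO2Tag` are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Bio2Sketch {
    /** Hypothetical helper: entity type of a label, or null for "O". */
    static String typeOf(String label) {
        if (label.equals("O")) {
            return null;
        }
        int dash = label.indexOf('-');
        return dash > 0 ? label.substring(dash + 1) : label;
    }

    /** Hypothetical stand-in for createTag: returns a BIO2 label as a plain string. */
    static String toBio2(String label, String prevType) {
        String type = typeOf(label);
        if (type == null) {
            return "O";
        }
        // An entity starts on an explicit B- prefix or when the type changes.
        boolean startsEntity = label.startsWith("B-") || !type.equals(prevType);
        return (startsEntity ? "B-" : "I-") + type;
    }

    /** Converts a whole label sequence, carrying the previous type along. */
    static List<String> convert(List<String> labels) {
        List<String> out = new ArrayList<>();
        String prevType = null;
        for (String label : labels) {
            out.add(toBio2(label, prevType));
            prevType = typeOf(label);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> iob1 = Arrays.asList("I-PER", "I-PER", "O", "I-LOC", "B-LOC");
        System.out.println(convert(iob1)); // [B-PER, I-PER, O, B-LOC, B-LOC]
    }
}
```

The previous type is reset to `null` after an `O` label, so the next entity token always receives a `B-` prefix.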
protected boolean useFirstSentenceAsTitle
protected Annotation.Source annotationSource
protected int tagIndex
protected String type
protected String name
public CoNLLDatasetReader withName(String name)
public CoNLLDatasetReader withTagIndex(int tagIndex)
Parameters:
tagIndex - Use this index, starting from 0. Default: last column.
public CoNLLDatasetReader withFirstSentenceAsTitle(boolean useFirstSentence)
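The `tagIndex` convention documented for `withTagIndex` (0-based column index, last column by default) can be illustrated with a small standalone sketch. Everything here is hypothetical: the class `TagColumnSketch`, the method `selectTag`, and the `LAST_COLUMN` sentinel are assumptions for this example. How the reader itself represents the default is not stated on this page.

```java
class TagColumnSketch {
    /** Assumed sentinel meaning "use the last column"; not taken from the library. */
    static final int LAST_COLUMN = -1;

    /** Picks the tag column from one whitespace-separated CoNLL line. */
    static String selectTag(String line, int tagIndex) {
        String[] columns = line.trim().split("\\s+");
        int index = (tagIndex == LAST_COLUMN) ? columns.length - 1 : tagIndex;
        if (index < 0 || index >= columns.length) {
            throw new IllegalArgumentException("No column " + index + " in line: " + line);
        }
        return columns[index];
    }

    public static void main(String[] args) {
        String line = "U.N. NNP I-NP I-ORG";
        System.out.println(selectTag(line, LAST_COLUMN)); // I-ORG
        System.out.println(selectTag(line, 1));           // NNP
    }
}
```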
public CoNLLDatasetReader withAnnotationSource(Annotation.Source annotationSource)
Parameters:
annotationSource - Assign this source to all Annotations read from the file.
public CoNLLDatasetReader withGenericType(String type)
Parameters:
type - Use this type instead of the given ones in the dataset,
public Dataset read(Resource path) throws IOException
Specified by:
read in interface DatasetReader
Throws:
IOException
public Dataset read(Resource path, CoNLLDatasetReader.Charset charset) throws IOException
Throws:
IOException
public static Dataset readDataset(Resource path, String name, CoNLLDatasetReader.Charset charset) throws IOException
Throws:
IOException
protected Dataset readLines(Iterator<String> lines)
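As a rough illustration of what consuming an `Iterator<String>` of CoNLL lines can look like, the sketch below groups token lines into sentences and tracks a running character cursor. It relies on the CoNLL-2003 conventions that a blank line ends a sentence and that `-DOCSTART-` lines mark document boundaries. The class `ReadLinesSketch`, the holder `Tok`, the method `readSentences`, and the single-space offset rule are assumptions for this example, not the behavior of `readLines`. For brevity, a document boundary is treated like a sentence break.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReadLinesSketch {
    /** Hypothetical token holder: surface text plus character offsets. */
    static final class Tok {
        final String text;
        final int begin;
        final int end;

        Tok(String text, int begin) {
            this.text = text;
            this.begin = begin;
            this.end = begin + text.length();
        }
    }

    /** Groups token lines into sentences; blank or -DOCSTART- lines close a sentence. */
    static List<List<Tok>> readSentences(Iterator<String> lines) {
        List<List<Tok>> sentences = new ArrayList<>();
        List<Tok> current = new ArrayList<>();
        int cursor = 0; // character index in the reconstructed text
        while (lines.hasNext()) {
            String line = lines.next().trim();
            if (line.isEmpty() || line.startsWith("-DOCSTART-")) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String word = line.split("\\s+")[0]; // first column holds the token text
            Tok tok = new Tok(word, cursor);
            current.add(tok);
            cursor = tok.end + 1; // assumption: one space between tokens
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "-DOCSTART- -X- O O", "",
            "EU NNP I-NP I-ORG", "rejects VBZ I-VP O", "",
            "Peter NNP I-NP I-PER");
        for (List<Tok> sentence : readSentences(lines.iterator())) {
            for (Tok t : sentence) {
                System.out.print(t.text + "@" + t.begin + " ");
            }
            System.out.println();
        }
    }
}
```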
protected Token createTokenFromLine(String line, int cursor, String prevType)
Parameters:
line - CoNLL2003 data from which to create the Token
cursor - character index in the whole document

Copyright © 2020. All rights reserved.