public class Wiki727Reader extends RawTextDatasetReader

| Modifier and Type | Field and Description |
|---|---|
| protected static org.slf4j.Logger | log |
| protected Pattern | SECTION_PATTERN |
| protected int | sectionLevel |
| protected boolean | skipPrefaceAnnotation |
| protected boolean | skipPrefaceText |

Inherited fields: generateUIDs, isTokenized, useFirstSentenceAsTitle, limit, randomizeDocuments

| Constructor and Description |
|---|
| Wiki727Reader() |

| Modifier and Type | Method and Description |
|---|---|
| Document | readDocumentFromFile(Resource file): Read a single Document from file. |
| Wiki727Reader | withSectionLevel(int level): Create annotations down to a given level: 0 - include all sections; 1 - the whole document as one section; 2 - include sections; 3 - include subsections; etc. |
| Wiki727Reader | withSkipPreface(boolean skip) |

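The level semantics of withSectionLevel can be illustrated with a small standalone sketch. It does not use Wiki727Reader itself. It assumes that section headings appear as marker lines of the form `========,<depth>,<title>.`, which is only a guess at what SECTION_PATTERN matches, and that a heading is kept when its depth does not exceed the requested level. The class name SectionLevelSketch, the HEADING regex, and the sectionTitles helper are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionLevelSketch {

    // Assumed heading marker format: "========,<depth>,<title>."
    // This is an illustrative guess, not the library's actual SECTION_PATTERN.
    static final Pattern HEADING = Pattern.compile("^========,(\\d+),(.*?)\\.?$");

    /**
     * Returns the titles of the headings that would open a section
     * for the given level. Level 0 keeps every heading; otherwise a
     * heading is kept only if its depth is at most the level.
     */
    static List<String> sectionTitles(List<String> lines, int level) {
        List<String> titles = new ArrayList<>();
        for (String line : lines) {
            Matcher m = HEADING.matcher(line);
            if (!m.matches()) {
                continue; // ordinary text line, not a heading marker
            }
            int depth = Integer.parseInt(m.group(1));
            if (level == 0 || depth <= level) {
                titles.add(m.group(2));
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<String> doc = List.of(
            "========,1,preface.",
            "Intro text.",
            "========,2,History.",
            "Some history.",
            "========,3,Early years.",
            "Details.",
            "========,2,Geography.",
            "More text.");
        // Print the kept headings for levels 0 through 3.
        for (int level = 0; level <= 3; level++) {
            System.out.println(level + " -> " + sectionTitles(doc, level));
        }
    }
}
```

Under these assumptions, level 1 keeps only the depth-1 heading, so the whole document forms a single section, while levels 2 and 3 add sections and subsections respectively.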
Inherited methods: withFirstSentenceAsTitle, withGeneratedUIDs, withTokenizedInput, read, readDatasetFromDirectory, readDatasetFromDirectory, stream, streamDocumentsFromDirectory, tryReadDocumentsFromFile, withLimitNumberOfDocuments, withRandomizedDocuments

protected static final org.slf4j.Logger log
protected int sectionLevel
protected boolean skipPrefaceText
protected boolean skipPrefaceAnnotation
protected Pattern SECTION_PATTERN
public Wiki727Reader withSectionLevel(int level)
public Wiki727Reader withSkipPreface(boolean skip)
public Document readDocumentFromFile(Resource file)
readDocumentFromFile in class RawTextDatasetReader

Copyright © 2019. All rights reserved.