public class DiskCorpus<H extends Handler> extends Corpus<H>
Type Parameters: H - the type of handler to which this corpus sends events
DiskCorpus reads data from a specified training and
test directory using a specified parser.
The disk corpus parses the data on the fly from disk rather than reading it into memory.
The directories holding training and test data are visited recursively. Any GZIP files are uncompressed, and any Zip archives are visited recursively.
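The recursive traversal described above can be sketched with the standard library alone. This is a minimal illustration, not DiskCorpus's actual implementation: the class name RecursiveVisitSketch, the StreamHandler callback, and the choice to detect compression by the ".gz" and ".zip" file-name suffixes are all assumptions made for the example. Each document is streamed to the callback rather than loaded up front.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class RecursiveVisitSketch {

    /** Hypothetical callback; it must not close the stream it is given. */
    interface StreamHandler {
        void handle(String name, InputStream in) throws IOException;
    }

    /** Visits a file, or every file below a directory, in sorted order. */
    static void visit(File file, StreamHandler handler) throws IOException {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) return; // unreadable directory
            Arrays.sort(children);        // deterministic traversal order
            for (File child : children) visit(child, handler);
            return;
        }
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            visitStream(file.getName(), in, handler);
        }
    }

    /** Unwraps GZIP and Zip layers by name suffix (an assumption), then hands off the stream. */
    static void visitStream(String name, InputStream in, StreamHandler handler)
            throws IOException {
        if (name.endsWith(".gz")) {
            String inner = name.substring(0, name.length() - ".gz".length());
            visitStream(inner, new GZIPInputStream(in), handler);
        } else if (name.endsWith(".zip")) {
            ZipInputStream zip = new ZipInputStream(in);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Entries may themselves be compressed, so recurse on each one.
                if (!entry.isDirectory()) visitStream(entry.getName(), zip, handler);
            }
        } else {
            handler.handle(name, in);
        }
    }

    /** Demo helper: collects "name=content" strings for everything visited. */
    static List<String> collect(File root) throws IOException {
        List<String> out = new ArrayList<>();
        visit(root, (name, in) ->
            out.add(name + "=" + new String(in.readAllBytes(), StandardCharsets.UTF_8)));
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RecursiveVisitSketch <directory>");
            return;
        }
        for (String item : collect(new File(args[0]))) {
            System.out.println(item);
        }
    }
}
```

The collect helper reads each document fully only to keep the demo short; a real handler would typically consume the stream incrementally, which is the point of parsing from disk.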
| Modifier and Type | Field and Description |
|---|---|
| static String | DEFAULT_TEST_DIR_NAME: The name of the default testing directory, "test". |
| static String | DEFAULT_TRAIN_DIR_NAME: The name of the default training directory, "train". |
| Constructor and Description |
|---|
| DiskCorpus(Parser<H> parser, File dir): Construct a corpus from the specified parser and data directory. |
| DiskCorpus(Parser<H> parser, File trainDir, File testDir): Construct a corpus from the specified parser and training and test directories. |
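The relationship between the two constructors can be sketched as follows. This is an illustration under assumptions, not the library's source: the class CorpusDirsSketch and its names helper are hypothetical, and the only behavior taken from this page is that the one-directory form reads from "train" and "test" below the data directory and that a null directory yields no events.

```java
import java.io.File;

public class CorpusDirsSketch {
    // Same values as the documented constants.
    static final String DEFAULT_TRAIN_DIR_NAME = "train";
    static final String DEFAULT_TEST_DIR_NAME = "test";

    final File trainDir; // may be null
    final File testDir;  // may be null

    /** One-directory form: delegates using the default subdirectory names. */
    CorpusDirsSketch(File dir) {
        this(new File(dir, DEFAULT_TRAIN_DIR_NAME), new File(dir, DEFAULT_TEST_DIR_NAME));
    }

    /** Two-directory form: either argument may be null. */
    CorpusDirsSketch(File trainDir, File testDir) {
        this.trainDir = trainDir;
        this.testDir = testDir;
    }

    /** Hypothetical helper: a null or unreadable directory contributes nothing. */
    static String[] names(File dir) {
        if (dir == null) return new String[0];
        String[] names = dir.list();
        return names == null ? new String[0] : names;
    }

    public static void main(String[] args) {
        CorpusDirsSketch corpus = new CorpusDirsSketch(new File("data"));
        System.out.println(corpus.trainDir.getPath());
        System.out.println(corpus.testDir.getPath());
    }
}
```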
| Modifier and Type | Method and Description |
|---|---|
| String | getCharEncoding(): Returns the current character encoding, or null if none has been specified. |
| String | getSystemId(): Return the system identifier for this corpus or null if none has been specified. |
| Parser<H> | parser(): Returns the data parser for this corpus. |
| void | setCharEncoding(String encoding): Sets the character encoding for this corpus. |
| void | setSystemId(String systemId): Sets the system identifier for the corpus. |
| void | visitTest(H handler): Visit the testing data, sending extracted events to the specified handler. |
| void | visitTrain(H handler): Visit the training data, sending extracted events to the specified handler. |
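Methods inherited from class Corpus: visitCorpus, visitCorpus

public static final String DEFAULT_TRAIN_DIR_NAME
The name of the default training directory, "train".

public static final String DEFAULT_TEST_DIR_NAME
The name of the default testing directory, "test".

public DiskCorpus(Parser<H> parser, File dir)
Construct a corpus from the specified parser and data directory. The training data is read from "train" (see DEFAULT_TRAIN_DIR_NAME). The testing data is read from "test" (see DEFAULT_TEST_DIR_NAME). See DiskCorpus(Parser,File,File) for more information.
Parameters:
parser - Parser for the data.
dir - Directory in which to find the data.

public DiskCorpus(Parser<H> parser, File trainDir, File testDir)
Construct a corpus from the specified parser and training and test directories. If either directory is null, the corresponding visit method will not produce any events.
Parameters:
parser - Parser for the data.
trainDir - Directory of training data.
testDir - Directory of testing data.

public void setCharEncoding(String encoding)
Sets the character encoding for this corpus.
Parameters:
encoding - Character encoding.

public String getCharEncoding()
Returns the current character encoding, or null if none has been specified.

public void setSystemId(String systemId)
Sets the system identifier for the corpus.
Parameters:
systemId - System identifier.

public String getSystemId()
Return the system identifier for this corpus or null if none has been specified. See setSystemId(String) for more information.

public Parser<H> parser()
Returns the data parser for this corpus.

public void visitTrain(H handler) throws IOException
Visit the training data, sending extracted events to the specified handler.
Overrides: visitTrain in class Corpus<H extends Handler>
Parameters:
handler - Handler to receive training events.
Throws:
IOException - If there is an underlying I/O error.

public void visitTest(H handler) throws IOException
Visit the testing data, sending extracted events to the specified handler.
Overrides: visitTest in class Corpus<H extends Handler>
Parameters:
handler - Handler to receive testing events.
Throws:
IOException - If there is an underlying I/O error.

Copyright © 2016 Alias-i, Inc. All rights reserved.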