public class LeipzigDoccatSampleStream extends FilterObjectStream<String,DocumentSample>
The input text is tokenized with the SimpleTokenizer. Input text that is later
classified by the language model must also be tokenized with the SimpleTokenizer,
so that training and testing use exactly the same tokenization.
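The following sketch illustrates why the same tokenizer has to be used at training and at classification time. It uses only the Java standard library. `charClassTokens` is a hypothetical, simplified stand-in for a tokenizer that splits on character-class changes; it is not the SimpleTokenizer implementation. `whitespaceTokens` is an equally hypothetical alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizationMismatchSketch {

    // Hypothetical stand-in: starts a new token whenever the character class
    // (letter, digit, other) changes, and drops whitespace.
    // This is NOT OpenNLP's SimpleTokenizer, only an illustration.
    public static List<String> charClassTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousClass = -1;
        for (char c : text.toCharArray()) {
            int cls = Character.isWhitespace(c) ? 0
                    : Character.isLetter(c) ? 1
                    : Character.isDigit(c) ? 2 : 3;
            // A class change closes the token collected so far.
            if (cls != previousClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != 0) {
                current.append(c);
            }
            previousClass = cls;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Hypothetical alternative: split on runs of whitespace only.
    public static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        // The two tokenizers disagree on punctuation, so a model trained on
        // one tokenization would see unfamiliar tokens from the other.
        System.out.println(charClassTokens(text).size() + " tokens");
        System.out.println(whitespaceTokens(text).size() + " tokens");
    }
}
```

With the first function, punctuation becomes separate tokens; with the second, it stays attached to the neighbouring word. A model trained on one form and queried with the other would be given tokens it never saw during training.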
| Constructor and Description |
|---|
| LeipzigDoccatSampleStream(String language, int sentencesPerDocument, InputStreamFactory in): Creates a new LeipzigDoccatSampleStream with the specified parameters. |
| LeipzigDoccatSampleStream(String language, int sentencesPerDocument, Tokenizer tokenizer, InputStreamFactory in): Creates a new LeipzigDoccatSampleStream with the specified parameters. |
| Modifier and Type | Method and Description |
|---|---|
| DocumentSample | read(): Returns the next object. |
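As a rough illustration of what grouping sentences into documents means for read(), the sketch below collects up to sentencesPerDocument sentences per call and returns their tokens as one document. It uses only the Java standard library. The class `SentenceGroupingSketch`, the method `readDocument`, the one-sentence-per-line input, the whitespace tokenization, and the null return at the end of the input are all assumptions made for this example, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SentenceGroupingSketch {

    // Hypothetical helper: consumes up to sentencesPerDocument sentences and
    // returns their tokens as one document, or null when no input remains.
    public static List<String> readDocument(Iterator<String> sentences,
                                            int sentencesPerDocument) {
        List<String> tokens = new ArrayList<>();
        int count = 0;
        while (count < sentencesPerDocument && sentences.hasNext()) {
            String sentence = sentences.next().trim();
            if (!sentence.isEmpty()) {
                // Assumption: whitespace splitting stands in for a real tokenizer.
                tokens.addAll(Arrays.asList(sentence.split("\\s+")));
            }
            count++;
        }
        return tokens.isEmpty() ? null : tokens;
    }

    public static void main(String[] args) {
        Iterator<String> sentences = Arrays.asList(
                "The first sentence .",
                "A second one .",
                "And a third .").iterator();

        // With two sentences per document, three sentences yield two
        // documents; the last one holds the single remaining sentence.
        List<String> document;
        while ((document = readDocument(sentences, 2)) != null) {
            System.out.println(document.size() + " tokens: " + document);
        }
    }
}
```

A larger sentencesPerDocument value gives fewer but longer documents from the same input, and the last document may contain fewer sentences than requested.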
Methods inherited from class FilterObjectStream: close, reset

public LeipzigDoccatSampleStream(String language, int sentencesPerDocument, Tokenizer tokenizer, InputStreamFactory in) throws IOException
Parameters:
language - the language of the Leipzig sentences.txt input file
sentencesPerDocument - the number of sentences which should be grouped into one DocumentSample
tokenizer - the Tokenizer used to tokenize the input sentences
in - the InputStreamFactory providing the contents of the sentences.txt input file
Throws:
IOException - if an IOException occurs

public LeipzigDoccatSampleStream(String language, int sentencesPerDocument, InputStreamFactory in) throws IOException
Parameters:
language - the language of the Leipzig sentences.txt input file
sentencesPerDocument - the number of sentences which should be grouped into one DocumentSample
in - the InputStreamFactory providing the contents of the sentences.txt input file
Throws:
IOException - if an IOException occurs

public DocumentSample read() throws IOException
Specified by: read in interface ObjectStream
Throws:
IOException - if there is an error during reading

Copyright © 2017 The Apache Software Foundation. All rights reserved.