Package opennlp.tools.formats.ad
Class ADNameSampleStream
- java.lang.Object
-
- opennlp.tools.formats.ad.ADNameSampleStream
-
- All Implemented Interfaces:
AutoCloseable, ObjectStream<NameSample>
@Internal public class ADNameSampleStream extends Object implements ObjectStream<NameSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus that produces output for Portuguese NER training. The data contains ten named entity types: Person, Organization, Group, Place, Event, ArtProd, Abstract, Thing, Time and Numeric.
Data can be found on this web site.
Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 February 2006.
Detailed info about the NER tagset.
Note: Do not use this class, internal use only!
-
-
Constructor Summary
Constructors
Constructor - Description
ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens) - Deprecated.
ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens) - Initializes a new ADNameSampleStream from an ObjectStream, which could be a PlainTextByLineStream object.
-
Method Summary
Modifier and Type, Method - Description
void close() - Closes the ObjectStream and releases all allocated resources.
NameSample read() - Returns the next ObjectStream object.
void reset() - Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly.
-
-
-
Constructor Detail
-
ADNameSampleStream
public ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens)
Initializes a new ADNameSampleStream from an ObjectStream, which could be a PlainTextByLineStream object.
- Parameters:
lineStream - An ObjectStream as input.
splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro".
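The effect of the splitHyphenatedTokens parameter can be illustrated with a small standalone sketch. It is not the OpenNLP implementation: the class HyphenSplitSketch and the method splitHyphenated are hypothetical names introduced here, and the handling of edge cases (leading, trailing, or repeated hyphens) is an assumption that may differ from the real parser.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only; NOT the code used by ADNameSampleStream.
class HyphenSplitSketch {

    // Splits a token at each hyphen and keeps every hyphen as its own token.
    static List<String> splitHyphenated(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (c == '-') {
                // Flush the text collected so far, then emit the hyphen itself.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                parts.add("-");
            } else {
                current.append(c);
            }
        }
        // Flush whatever follows the last hyphen (or the whole token if none).
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitHyphenated("carros-monstro")); // prints [carros, -, monstro]
    }
}
```

With the flag set to false, the token would presumably be kept whole as "carros-monstro".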
-
ADNameSampleStream
@Deprecated public ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens) throws IOException
Deprecated.
Initializes a new ADNameSampleStream from an InputStreamFactory.
- Parameters:
in - The corpus InputStreamFactory.
charsetName - The charset to use for reading the corpus.
splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro".
- Throws:
IOException
-
-
Method Detail
-
read
public NameSample read() throws IOException
Description copied from interface: ObjectStream
Returns the next ObjectStream object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
- Specified by:
read in interface ObjectStream<NameSample>
- Returns:
The next object or null to signal that the stream is exhausted.
- Throws:
IOException - Thrown if there is an error during reading.
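The read-until-null contract described for read() can be sketched without the OpenNLP classes. In the example below, Source is a hypothetical stand-in that mirrors only the read() signature of ObjectStream, and fromList and drain are illustrative helpers, not part of the library.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReadLoopSketch {

    // Hypothetical stand-in for the read() part of the ObjectStream contract.
    interface Source<T> {
        T read() throws IOException;
    }

    // Wraps a list so that read() yields each element once and then null.
    static <T> Source<T> fromList(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Calls read() until it returns null and collects everything seen.
    static <T> List<T> drain(Source<T> source) throws IOException {
        List<T> seen = new ArrayList<>();
        T item;
        while ((item = source.read()) != null) {
            seen.add(item);
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        Source<String> source = fromList(Arrays.asList("first sample", "second sample"));
        System.out.println(drain(source)); // prints [first sample, second sample]
        System.out.println(source.read()); // prints null
    }
}
```

The same loop shape should apply to any stream that follows this contract, with NameSample as the element type in the case of this class.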
-
reset
public void reset() throws IOException, UnsupportedOperationException
Description copied from interface: ObjectStream
Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required.
The implementation of this method is optional.
- Specified by:
reset in interface ObjectStream<NameSample>
- Throws:
IOException - Thrown if there is an error during resetting the stream.
UnsupportedOperationException - Thrown if reset() is not supported. By default, this is the case.
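Because reset() is optional, code that needs several passes may want to handle UnsupportedOperationException. The following is a minimal sketch under stated assumptions: Resettable and ListStream are hypothetical stand-ins that mirror only the read()/reset() signatures, and pass and secondPass are illustrative helpers rather than library methods.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ResetSketch {

    // Hypothetical stand-in for the read()/reset() part of the contract.
    interface Resettable<T> {
        T read() throws IOException;
        void reset() throws IOException;
    }

    // In-memory stream whose reset() rewinds to the first element.
    static final class ListStream<T> implements Resettable<T> {
        private final List<T> items;
        private int next = 0;

        ListStream(List<T> items) {
            this.items = new ArrayList<>(items);
        }

        @Override
        public T read() {
            return next < items.size() ? items.get(next++) : null;
        }

        @Override
        public void reset() {
            next = 0;
        }
    }

    // Makes one pass over the stream, reading until null.
    static <T> List<T> pass(Resettable<T> stream) throws IOException {
        List<T> seen = new ArrayList<>();
        T item;
        while ((item = stream.read()) != null) {
            seen.add(item);
        }
        return seen;
    }

    // Tries to rewind and read again; empty if reset() is not supported.
    static <T> Optional<List<T>> secondPass(Resettable<T> stream) throws IOException {
        try {
            stream.reset();
        } catch (UnsupportedOperationException e) {
            return Optional.empty();
        }
        return Optional.of(pass(stream));
    }

    public static void main(String[] args) throws IOException {
        Resettable<String> stream = new ListStream<>(Arrays.asList("a", "b"));
        List<String> first = pass(stream);
        Optional<List<String>> second = secondPass(stream);
        System.out.println(first); // prints [a, b]
        System.out.println(second.map(first::equals).orElse(false)); // prints true
    }
}
```

Returning an empty Optional is only one way to signal an unsupported reset; a caller could equally reopen the underlying source.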
-
close
public void close() throws IOException
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close() has been called, calling ObjectStream.read() or ObjectStream.reset() is not allowed.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface ObjectStream<NameSample>
- Throws:
IOException - Thrown if there is an error during closing the stream.
-
-