java.lang.Object
  opennlp.tools.formats.ad.ADChunkSampleStream
public class ADChunkSampleStream
extends java.lang.Object
implements ObjectStream<ChunkSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus. Its output is used to train the Portuguese chunker.
The heuristic used to extract chunks is based on the paper 'A Machine Learning
Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero
Santos and Ruy Milidiú).
Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html
Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica". 12 February 2006.
http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf
Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names
Note: Do not use this class, internal use only!
| Constructor Summary | |
|---|---|
| ADChunkSampleStream(java.io.InputStream in, java.lang.String charsetName) | Creates a new ChunkSample stream from an InputStream. |
| ADChunkSampleStream(ObjectStream<java.lang.String> lineStream) | Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object. |
| Method Summary | |
|---|---|
| void | close() Closes the ObjectStream and releases all allocated resources. |
| ChunkSample | read() Returns the next object. |
| void | reset() Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly. |
| void | setEnd(int aEnd) |
| void | setStart(int aStart) |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public ADChunkSampleStream(ObjectStream<java.lang.String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an
ObjectStream<String>, which could be a
PlainTextByLineStream object.
Parameters:
lineStream - a stream of lines as String
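
To illustrate what "a stream of lines as String" means, here is a minimal sketch of a line stream built on java.io.BufferedReader. The LineSource interface and the ReaderLineSource class are hypothetical stand-ins introduced only for this example; they are not OpenNLP's ObjectStream or PlainTextByLineStream. The sketch assumes the convention that read() returns null once the input is exhausted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a line-oriented stream that returns one line per read() call.
class LineStreamSketch {

    /** Hypothetical stand-in for a stream of lines; not part of OpenNLP. */
    interface LineSource extends AutoCloseable {
        /** Returns the next line, or null once the input is exhausted. */
        String read() throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Adapts any Reader to a LineSource by buffering it and splitting on line breaks. */
    static final class ReaderLineSource implements LineSource {
        private final BufferedReader reader;

        ReaderLineSource(Reader in) {
            this.reader = new BufferedReader(in);
        }

        @Override
        public String read() throws IOException {
            return reader.readLine(); // null signals end of input
        }

        @Override
        public void close() throws IOException {
            reader.close();
        }
    }

    /** Drains a LineSource into a list, reading until null is returned. */
    static List<String> readAll(LineSource source) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = source.read()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder text, not real corpus data.
        try (LineSource source = new ReaderLineSource(new StringReader("first line\nsecond line\n"))) {
            System.out.println(readAll(source));
        }
    }
}
```

A consumer such as a corpus parser only needs the read() call, which is why any source that can hand out one line at a time can sit behind the constructor parameter.
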
public ADChunkSampleStream(java.io.InputStream in,
java.lang.String charsetName)
Creates a new ChunkSample stream from an InputStream.
Parameters:
in - the corpus InputStream
charsetName - the charset of the Árvores Deitadas corpus
| Method Detail |
|---|
public ChunkSample read()
throws java.io.IOException
Description copied from interface: ObjectStream
Returns the next object.
Specified by:
read in interface ObjectStream<ChunkSample>
Throws:
java.io.IOException
public void setStart(int aStart)
public void setEnd(int aEnd)
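public void reset()
throws java.io.IOException,
java.lang.UnsupportedOperationException
Description copied from interface: ObjectStream
Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly.
Specified by:
reset in interface ObjectStream<ChunkSample>
Throws:
java.io.IOException
java.lang.UnsupportedOperationException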
public void close()
throws java.io.IOException
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated
resources. After close has been called, it is not allowed to call
read or reset.
Specified by:
close in interface ObjectStream<ChunkSample>
Throws:
java.io.IOException
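
The read, reset and close contract described above can be sketched with a small in-memory stream. ListStream is a hypothetical class written for this example, not OpenNLP code. It assumes that read() returns null when no objects are left and that calls after close() fail with an IllegalStateException; this page does not state how the real class reacts in either case, and its reset() may instead throw UnsupportedOperationException.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the read/reset/close contract described above.
class StreamContractSketch {

    /** Hypothetical in-memory stream; mirrors the three methods, not OpenNLP code. */
    static final class ListStream<T> {
        private final List<T> items;
        private int position = 0;
        private boolean closed = false;

        ListStream(List<T> items) {
            this.items = items;
        }

        /** Returns the next object, or null when none are left (assumed convention). */
        T read() {
            ensureOpen();
            return position < items.size() ? items.get(position++) : null;
        }

        /** Repositions at the beginning so the same sequence is repeated exactly. */
        void reset() {
            ensureOpen();
            position = 0;
        }

        /** Marks the stream closed; later read() or reset() calls are rejected. */
        void close() {
            closed = true;
        }

        // Assumption: a closed stream fails fast with IllegalStateException.
        private void ensureOpen() {
            if (closed) {
                throw new IllegalStateException("stream is closed");
            }
        }
    }

    public static void main(String[] args) {
        ListStream<String> stream = new ListStream<>(Arrays.asList("s1", "s2"));
        System.out.println(stream.read()); // first object
        System.out.println(stream.read()); // second object
        System.out.println(stream.read()); // null: stream exhausted
        stream.reset();
        System.out.println(stream.read()); // first object again
        stream.close();
    }
}
```

Failing fast after close() is one possible way to enforce the "not allowed" rule; a caller should not rely on any particular exception type from the real stream.
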