public class MUC7Reader
extends org.apache.uima.collection.CollectionReader_ImplBase
| Modifier and Type | Field and Description |
|---|---|
| static String[] | ELEMENT_TEXT_TO_BE_PROCESSED: XML elements comprised in an object list |
| static String | PARAM_INPUTDIR: Name of configuration parameter that must be set to the path of a directory containing input files. |
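As a rough illustration of what a directory-valued parameter such as PARAM_INPUTDIR implies, the sketch below validates a path and enumerates the regular files in it. It is not taken from MUC7Reader: the class `InputDirSketch`, the method `listInputFiles`, and the choice to sort the files and skip subdirectories are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of MUC7Reader. */
final class InputDirSketch {

    /**
     * Returns the regular files directly inside inputDir, sorted by path.
     * Subdirectories are ignored (an assumption of this sketch).
     */
    static List<Path> listInputFiles(Path inputDir) throws IOException {
        if (!Files.isDirectory(inputDir)) {
            throw new IllegalArgumentException("Not a directory: " + inputDir);
        }
        // Files.list returns a lazily populated stream that must be closed.
        try (Stream<Path> entries = Files.list(inputDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Failing fast on a missing or non-directory path mirrors the requirement that the parameter "must be set"; how the actual reader reports such an error is not documented here.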
| Constructor and Description |
|---|
| MUC7Reader() |
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| int[] | getBeginEndOfSequence(String sequenceString, String inputString, int startOfSequence): Given a sequence, a string in which the sequence occurs, and a starting point, this method retrieves the begin and end positions of the sequence. |
| int[] | getBeginEndOfToken(String tokenString, String inputString, int startOfToken): Given a token, a string in which the token occurs, and a starting point, this method retrieves the begin and end positions of the token. |
| void | getNext(org.apache.uima.cas.CAS cas) |
| org.apache.uima.util.Progress[] | getProgress() |
| boolean | hasNext() |
| void | initialize() |
| String | normalizeString(String stringToBeNormalized): Normalizes a string by replacing newlines with whitespace, by collapsing sequences of more than one whitespace character, and by removing the newlines at the beginning of a line; also removes markers like "A;N;D;R;LR;" etc. |
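The normalization and offset-lookup methods summarized above can be approximated with plain `java.lang.String` and `java.util.regex` calls. The following is a minimal sketch, not the MUC7Reader implementation: the class `SpanSketch` and its methods are hypothetical, the exclusive end offset and the `null` result for a missing match are assumptions, and the removal of markers such as "A;N;D;R;LR;" is not modeled.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of MUC7Reader. */
final class SpanSketch {

    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    /** Collapses every whitespace run (newlines included) into one space and trims the ends. */
    static String normalize(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ").trim();
    }

    /**
     * Finds the first occurrence of needle in haystack at or after index from.
     * Returns {begin, end} with end exclusive (assumed convention),
     * or null when there is no such occurrence.
     */
    static int[] locate(String needle, String haystack, int from) {
        int begin = haystack.indexOf(needle, from);
        if (begin < 0) {
            return null;
        }
        return new int[] {begin, begin + needle.length()};
    }

    public static void main(String[] args) {
        String text = normalize("Mr.  Smith\n met Mr. Jones");
        int[] first = locate("Mr.", text, 0);
        // Resume the search at the end of the previous match to find the next one.
        int[] second = locate("Mr.", text, first[1]);
        System.out.println(first[0] + "-" + first[1]);
        System.out.println(second[0] + "-" + second[1]);
    }
}
```

Passing the previous end offset as the next starting point is what makes a start parameter useful when the same token occurs several times in the input string.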
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
getCasManager, getLogger, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

public static final String[] ELEMENT_TEXT_TO_BE_PROCESSED
public static final String PARAM_INPUTDIR
public int[] getBeginEndOfToken(String tokenString, String inputString, int startOfToken)
tokenString - the token to be searched
inputString - the string in which we search the token
startOfToken - the position in inputString at which the search for the token should begin

public int[] getBeginEndOfSequence(String sequenceString, String inputString, int startOfSequence)
sequenceString - the sequence to be searched
inputString - the string in which we search the sequence
startOfSequence - the position in inputString at which the search for the sequence should begin

public String normalizeString(String stringToBeNormalized)
stringToBeNormalized - the string to be normalized

public void initialize()
throws org.apache.uima.resource.ResourceInitializationException
Overrides: initialize in class org.apache.uima.collection.CollectionReader_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException

public void getNext(org.apache.uima.cas.CAS cas)
throws IOException,
org.apache.uima.collection.CollectionException
Throws: IOException, org.apache.uima.collection.CollectionException

public void close()
throws IOException
Throws: IOException

public org.apache.uima.util.Progress[] getProgress()

public boolean hasNext()
throws IOException,
org.apache.uima.collection.CollectionException
Throws: IOException, org.apache.uima.collection.CollectionException

Copyright © 2018 JULIE Lab Jena, Germany. All rights reserved.