Package de.julielab.jcore.reader.muc7
Class MUC7Reader
- java.lang.Object
-
- org.apache.uima.resource.Resource_ImplBase
-
- org.apache.uima.resource.ConfigurableResource_ImplBase
-
- org.apache.uima.collection.CollectionReader_ImplBase
-
- de.julielab.jcore.reader.muc7.MUC7Reader
-
- All Implemented Interfaces:
org.apache.uima.collection.base_cpm.BaseCollectionReader, org.apache.uima.collection.CollectionReader, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource
public class MUC7Reader extends org.apache.uima.collection.CollectionReader_ImplBase
-
-
Field Summary
Fields
Modifier and Type, Field, Description
static String[] ELEMENT_TEXT_TO_BE_PROCESSED
    XML elements comprised in an object list
static String PARAM_INPUTDIR
    Name of configuration parameter that must be set to the path of a directory containing input files.
-
Constructor Summary
Constructors
Constructor, Description
MUC7Reader()
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
void close()
int[] getBeginEndOfSequence(String sequenceString, String inputString, int startOfSequence)
    Given a sequence, a string in which the sequence occurs, and a starting point, this method retrieves the begin and end positions of this sequence.
int[] getBeginEndOfToken(String tokenString, String inputString, int startOfToken)
    Given a token, a string in which the token occurs, and a starting point, this method retrieves the begin and end positions of this token.
void getNext(org.apache.uima.cas.CAS cas)
org.apache.uima.util.Progress[] getProgress()
boolean hasNext()
void initialize()
String normalizeString(String stringToBeNormalized)
    Normalizes a string by replacing newlines with whitespace, by removing sequences of more than one whitespace character, and by removing newlines at the beginning of a line; also removes strings such as "A;N;D;R;LR;".
Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
-
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
-
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
-
-
-
Field Detail
-
ELEMENT_TEXT_TO_BE_PROCESSED
public static final String[] ELEMENT_TEXT_TO_BE_PROCESSED
XML elements comprised in an object list
-
PARAM_INPUTDIR
public static final String PARAM_INPUTDIR
Name of configuration parameter that must be set to the path of a directory containing input files.
- See Also:
- Constant Field Values
-
-
Method Detail
-
getBeginEndOfToken
public int[] getBeginEndOfToken(String tokenString, String inputString, int startOfToken)
Given a token, a string in which the token occurs, and a starting point, this method retrieves the begin and end positions of this token.
- Parameters:
tokenString - the token to search for
inputString - the string in which the token is searched
startOfToken - the position in inputString at which the search for the token begins
- Returns:
- the begin and the end position of the token in an int array {begin, end}
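The lookup described above can be illustrated with a minimal sketch. This is not the MUC7Reader source: the class and method names are hypothetical, and two details that the documentation does not state are assumptions here, namely that the end offset is exclusive and that {-1, -1} is returned when the token is not found.

```java
// Hypothetical stand-in for getBeginEndOfToken; not the MUC7Reader source.
final class TokenOffsetSketch {

    // Assumptions: the end offset is exclusive, and {-1, -1} means "not found".
    static int[] beginEndOf(String token, String input, int start) {
        int begin = input.indexOf(token, start);
        if (begin < 0) {
            return new int[] {-1, -1};
        }
        return new int[] {begin, begin + token.length()};
    }

    public static void main(String[] args) {
        String text = "the cat saw the dog";
        int[] first = beginEndOf("the", text, 0);
        // Restarting the search at the end of the first hit finds the second "the".
        int[] second = beginEndOf("the", text, first[1]);
        System.out.println(first[0] + "-" + first[1]);   // prints "0-3"
        System.out.println(second[0] + "-" + second[1]); // prints "12-15"
    }
}
```

Passing the previous end offset as the next starting point is what lets repeated tokens in the same string resolve to distinct positions.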
-
getBeginEndOfSequence
public int[] getBeginEndOfSequence(String sequenceString, String inputString, int startOfSequence)
Given a sequence, a string in which the sequence occurs, and a starting point, this method retrieves the begin and end positions of this sequence.
- Parameters:
sequenceString - the sequence to search for
inputString - the string in which the sequence is searched
startOfSequence - the position in inputString at which the search for the sequence begins
- Returns:
- the begin and the end position of the sequence in an int array {begin, end}
-
normalizeString
public String normalizeString(String stringToBeNormalized)
Normalizes a string by replacing newlines with whitespace, by removing sequences of more than one whitespace character, and by removing newlines at the beginning of a line; also removes strings such as "A;N;D;R;LR;".
- Parameters:
stringToBeNormalized - the string to be normalized
- Returns:
- the normalized string
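The whitespace handling described above might look roughly like the following sketch. It is not the MUC7Reader source: the class and method names are hypothetical, it assumes that runs of whitespace collapse to a single space and that "newlines at the beginning of a line" means leading newlines of the input, and it leaves out the removal of strings such as "A;N;D;R;LR;" because the exact pattern is not documented.

```java
// Hypothetical stand-in for normalizeString; not the MUC7Reader source.
final class NormalizeSketch {

    static String normalize(String input) {
        // Assumption: "newlines at the beginning of a line" means leading newlines.
        String withoutLeadingNewlines = input.replaceAll("^[\\r\\n]+", "");
        // Replace each remaining run of newlines with one space.
        String singleLine = withoutLeadingNewlines.replaceAll("[\\r\\n]+", " ");
        // Assumption: runs of whitespace collapse to a single space.
        return singleLine.replaceAll("\\s{2,}", " ");
    }

    public static void main(String[] args) {
        String raw = "\nJohn  Smith\njoined   Acme.";
        System.out.println(normalize(raw)); // prints "John Smith joined Acme."
    }
}
```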
-
initialize
public void initialize() throws org.apache.uima.resource.ResourceInitializationException
- Overrides:
initialize in class org.apache.uima.collection.CollectionReader_ImplBase
- Throws:
org.apache.uima.resource.ResourceInitializationException
-
getNext
public void getNext(org.apache.uima.cas.CAS cas) throws IOException, org.apache.uima.collection.CollectionException
- Throws:
IOException
org.apache.uima.collection.CollectionException
-
close
public void close() throws IOException
- Throws:
IOException
-
getProgress
public org.apache.uima.util.Progress[] getProgress()
-
hasNext
public boolean hasNext() throws IOException, org.apache.uima.collection.CollectionException
- Throws:
IOException
org.apache.uima.collection.CollectionException
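The hasNext, getNext, and close methods follow the usual collection reader pattern: ask whether another document is available, fetch it, and release resources at the end. The sketch below mirrors that pattern over an input directory using only the Java standard library. It is a hypothetical stand-in, not MUC7Reader: it does not use UIMA types, it returns file contents as strings instead of filling a CAS, and the sorted file order and the completed/total counters are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in that mirrors the hasNext/getNext/close contract.
final class DirectoryReaderSketch implements AutoCloseable {
    private final List<Path> files = new ArrayList<>();
    private int index = 0;

    // Plays the role of initialize(): collect the regular files of the input directory.
    DirectoryReaderSketch(Path inputDir) throws IOException {
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile).forEach(files::add);
        }
        Collections.sort(files); // assumption: deterministic, name-sorted order
    }

    boolean hasNext() {
        return index < files.size();
    }

    // Returns the next file's text; a real reader would populate a CAS instead.
    String getNext() throws IOException {
        return new String(Files.readAllBytes(files.get(index++)), StandardCharsets.UTF_8);
    }

    int completed() { return index; }

    int total() { return files.size(); }

    @Override
    public void close() {
        // Nothing to release in this sketch; files are opened and closed per read.
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("reader-sketch");
        Files.write(dir.resolve("a.txt"), "first".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "second".getBytes(StandardCharsets.UTF_8));
        try (DirectoryReaderSketch reader = new DirectoryReaderSketch(dir)) {
            while (reader.hasNext()) {
                String text = reader.getNext();
                System.out.println(reader.completed() + "/" + reader.total() + ": " + text);
            }
        }
    }
}
```

Calling hasNext before every getNext keeps the loop safe for an empty directory, and the try-with-resources block guarantees that close runs even if reading fails.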
-
-