public class FeatureTypeConcatRegex extends AbstractFeatureType
The feature generated here indicates whether a sequence of tokens matches a particular sequence of given patterns. For example, if a pattern matches a capitalized word, then for a two-token context window the generated features indicate whether the two-token (bigram) sequence has any of the following patterns: (1) Capital, Capital (2) Capital, Non-capital (3) Non-capital, Capital. You can use any window around the current token (segment) for creating regular-expression-based features. You can also define your own patterns by writing the regular expressions in a file, whose format is specified below.
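The bigram example can be illustrated with a small, self-contained sketch. It is not this class's implementation: the class name ConcatPatternSketch, the method matchSignature, and the capital-word regular expression are assumptions made for illustration only. Each token in the window contributes one character ('1' for a match, '0' otherwise), and the concatenated string identifies the pattern combination.

```java
import java.util.regex.Pattern;

// Hypothetical sketch for illustration; not the library's implementation.
class ConcatPatternSketch {
    // Assumed pattern: a word that starts with an upper-case ASCII letter.
    static final Pattern CAPITAL = Pattern.compile("[A-Z][a-zA-Z]*");

    // Builds one character per token in [start, end]:
    // '1' if the whole token matches the pattern, '0' otherwise.
    static String matchSignature(Pattern pattern, String[] tokens, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= end; i++) {
            sb.append(pattern.matcher(tokens[i]).matches() ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"New", "York", "is", "big"};
        System.out.println(matchSignature(CAPITAL, tokens, 0, 1)); // 11: Capital, Capital
        System.out.println(matchSignature(CAPITAL, tokens, 1, 2)); // 10: Capital, Non-capital
        System.out.println(matchSignature(CAPITAL, tokens, 2, 3)); // 00: no capitalized token
    }
}
```

In this sketch a distinct signature string would correspond to a distinct feature; how the real class numbers its features is not shown here.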
A token in a token sequence has an index relative to the current token index, as described below:
x0 x1 x2 x3 x4 x5 x6 x7 .... xn
-4 -3 -2 -1 0 0 0 1 2 ...
In the above example, the current segment spans positions 4 to 6, with pos = 6 and prevPos = 3 in the startScanFeaturesAt() call of FeatureGenerator. You can refer to any token relative to the current position by using the index shown below the token sequence. Thus, you can create pattern concat features for any token sequence in the neighbourhood of the current token, using relSegmentStart and relSegmentEnd. For example, to create a pattern for the two tokens to the left of the current token, pass relSegmentStart = -2 and relSegmentEnd = -1 to the constructor of the class.
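The relative indexing can be made concrete with a short sketch that converts a relative window into absolute token positions. The mapping is read off the index diagram above and is an assumption about the intended semantics, not code taken from this class; RelativeWindowSketch and resolve are hypothetical names. Negative indices count leftwards from the first token of the segment, positive indices count rightwards from its last token, and 0 refers to the segment itself.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the documented class.
class RelativeWindowSketch {
    // prevPos is the last position before the current segment and pos is the
    // last position of the segment (prevPos = 3, pos = 6 in the example).
    // Returns {absoluteStart, absoluteEnd}; no bounds check against the sequence length.
    static int[] resolve(int relStart, int relEnd, int prevPos, int pos) {
        int segStart = prevPos + 1;
        // A start of 0 or less is anchored at the segment's first token.
        int absStart = relStart > 0 ? pos + relStart : segStart + relStart;
        // An end of 0 or more is anchored at the segment's last token.
        int absEnd = relEnd < 0 ? segStart + relEnd : pos + relEnd;
        return new int[] {absStart, absEnd};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolve(-2, -1, 3, 6))); // [2, 3]
        System.out.println(Arrays.toString(resolve(0, 0, 3, 6)));   // [4, 6]
        System.out.println(Arrays.toString(resolve(1, 2, 3, 6)));   // [7, 8]
    }
}
```

With the segment at positions 4 to 6, the window (-2, -1) resolves to tokens x2 and x3, which matches the two tokens to the left of the current segment in the diagram.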
| Modifier and Type | Field and Description |
|---|---|
| protected int | curId |
| protected DataSequence | data |
| protected int | idbase |
| protected int | index |
| protected int | left |
| protected int | maxSegmentLength |
| protected int | relSegmentEnd |
| protected int | relSegmentStart |
| protected int | right |
| protected int | window |
Inherited fields: idPrefix

| Constructor and Description |
|---|
| FeatureTypeConcatRegex(int relSegmentStart, int relSegmentEnd) |
| FeatureTypeConcatRegex(int relSegmentStart, int relSegmentEnd, int maxSegmentLength) |
| FeatureTypeConcatRegex(int relSegmentStart, int relSegmentEnd, int maxSegmentLength, String patternFile): Constructs an object of ConcatRegexFeatures to be used to generate features for the token sequence as specified. |
| FeatureTypeConcatRegex(int relSegmentStart, int relSegmentEnd, int maxSegmentLength, String[][] patternString) |
| FeatureTypeConcatRegex(int relSegmentStart, int relSegmentEnd, String patternFile) |

| Modifier and Type | Method and Description |
|---|---|
| boolean | hasNext(): Tests if next feature exists |
| Feature | next(): Gets the next feature |
| boolean | startScanFeaturesAt(DataSequence data, int startPos, int endPos): Starts scanning features of the given segment of a sequence |
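
The three methods suggest an iterator-style protocol: start a scan on a segment, then pull features while hasNext() is true. The sketch below shows only that calling pattern, using a stand-in class over plain strings. ScanLoopSketch, its String-based signatures, and the "tok=" feature text are assumptions for illustration; the real methods operate on DataSequence and return Feature objects.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; it mimics the method names, not the real types.
class ScanLoopSketch {
    private String[] tokens;
    private int cursor;
    private int end;

    // Positions the scanner on the segment [startPos, endPos], clamped to the data.
    boolean startScanFeaturesAt(String[] data, int startPos, int endPos) {
        tokens = data;
        cursor = Math.max(startPos, 0);
        end = Math.min(endPos, data.length - 1);
        return hasNext();
    }

    // True while the segment still has unread positions.
    boolean hasNext() {
        return tokens != null && cursor <= end;
    }

    // Returns a placeholder feature for the current position and advances.
    String next() {
        return "tok=" + tokens[cursor++];
    }

    // Typical calling pattern: start the scan, then drain it.
    static List<String> collect(String[] data, int startPos, int endPos) {
        ScanLoopSketch scanner = new ScanLoopSketch();
        List<String> out = new ArrayList<>();
        scanner.startScanFeaturesAt(data, startPos, endPos);
        while (scanner.hasNext()) {
            out.add(scanner.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new String[] {"a", "b", "c", "d"}, 1, 2)); // [tok=b, tok=c]
    }
}
```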
Inherited methods: getTypeID, needTraining, readTrainingResult, saveTrainingResult, setTypeID, startScanFeaturesAt, supportSegment, train

protected transient DataSequence data
protected int index
protected int idbase
protected int curId
protected int window
protected int relSegmentStart
protected int relSegmentEnd
protected int maxSegmentLength
protected int left
protected int right
public FeatureTypeConcatRegex(int relSegmentStart,
int relSegmentEnd,
int maxSegmentLength,
String patternFile)
Parameters:
fgen - a Model object
relSegmentStart - index of the relative position for the left boundary
relSegmentEnd - index of the relative position for the right boundary
maxSegmentLength - maximum size of a segment
patternFile - file which contains the pattern definition

public FeatureTypeConcatRegex(int relSegmentStart,
int relSegmentEnd,
int maxSegmentLength,
String[][] patternString)
public FeatureTypeConcatRegex(int relSegmentStart,
int relSegmentEnd,
int maxSegmentLength)
public FeatureTypeConcatRegex(int relSegmentStart,
int relSegmentEnd)
public FeatureTypeConcatRegex(int relSegmentStart,
int relSegmentEnd,
String patternFile)
public boolean startScanFeaturesAt(DataSequence data, int startPos, int endPos)
FeatureType
data - the sequence
startPos - the start position of the segment
endPos - the end position of the segment

public boolean hasNext()
FeatureType

public Feature next()
FeatureType

Copyright © 2018 JULIE Lab, Germany. All rights reserved.