public class RegExChunker extends Object implements Chunker, Compilable, Serializable
A RegExChunker finds chunks that match a regular
expression. Specifically, a matcher is created for the input and its Matcher.find() method is used to iterate over the matching text
segments and convert them to chunks.
The behavior of the find method is largely determined by the
specific instance of Pattern on which the chunker is
based. For more information, see Sun's RegEx
Tutorial.
All found chunks receive the type and score specified at construction time.
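The find-based iteration described above can be sketched with java.util.regex alone. This is a minimal illustration, not the class's actual implementation: SpanSketch and FindLoopSketch are hypothetical names standing in for the chunk and chunker types, and the sample pattern, type, and score are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a chunk: a character span plus a type and score.
class SpanSketch {
    final int start;     // index of the first matched character
    final int end;       // index one past the last matched character
    final String type;   // same type for every span found
    final double score;  // same score for every span found

    SpanSketch(int start, int end, String type, double score) {
        this.start = start;
        this.end = end;
        this.type = type;
        this.score = score;
    }
}

class FindLoopSketch {
    // Collects one span per successive Matcher.find() hit, left to right.
    static List<SpanSketch> findSpans(Pattern pattern, CharSequence text,
                                      String type, double score) {
        List<SpanSketch> spans = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            spans.add(new SpanSketch(matcher.start(), matcher.end(), type, score));
        }
        return spans;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        for (SpanSketch s : findSpans(digits, "call 555 or 1234", "NUMBER", 1.0)) {
            System.out.println(s.start + "-" + s.end + " " + s.type + " " + s.score);
        }
    }
}
```

Because find() resumes after the end of the previous match, the spans collected this way never overlap.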
Warning: Java uses the same regular expression matching
as Perl. Perl uses a greedy
strategy for quantifiers, so something like .* will
match as many characters as possible. In contrast, disjunction
uses a first-match strategy. For example, the regular expression
ab|abc will not produce the same chunker as
abc|ab; for input abcde, the former will
return ab as a chunk, whereas the latter will return
abc. This first-match behavior of disjunctions
takes precedence over any quantifiers applied to the strings.
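The effect of alternative order can be checked directly with java.util.regex. The sketch below uses a hypothetical helper, firstMatch, that returns the text of the first find() hit; it does not use RegExChunker itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderSketch {
    // Returns the text of the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Alternatives are tried left to right; the first one that succeeds wins.
        System.out.println(firstMatch("ab|abc", "abcde")); // prints "ab"
        System.out.println(firstMatch("abc|ab", "abcde")); // prints "abc"
    }
}
```

When the longest alternative should win, listing longer alternatives before their prefixes is one way to get that behavior.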
For convenience, this class implements both the util.Compilable
and java.io.Serializable interfaces. Both store the
same thing, namely the string underlying the regex pattern, the chunk type,
and the score. The reconstituted object will also be an instance of this
class.
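As a rough illustration of storing only the regex string, chunk type, and score, the sketch below serializes those three fields and recompiles the pattern on read. RegexSpecSketch is a hypothetical class written for this example; it is an assumption about how such a scheme could look, not the code this class uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

// Hypothetical holder: only the regex string, type, and score are written out.
class RegexSpecSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final String regex;
    final String chunkType;
    final double chunkScore;
    transient Pattern pattern; // not serialized; rebuilt from regex on read

    RegexSpecSketch(String regex, String chunkType, double chunkScore) {
        this.regex = regex;
        this.chunkType = chunkType;
        this.chunkScore = chunkScore;
        this.pattern = Pattern.compile(regex);
    }

    // Restores the stored fields, then recompiles the pattern from its string.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        pattern = Pattern.compile(regex);
    }

    // Writes the object to an in-memory buffer and reads it back.
    static RegexSpecSketch roundTrip(RegexSpecSketch spec) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(spec);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RegexSpecSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        RegexSpecSketch copy = roundTrip(new RegexSpecSketch("[A-Z][a-z]+", "NAME", 0.5));
        System.out.println(copy.regex + " " + copy.chunkType + " " + copy.chunkScore);
    }
}
```

One consequence of storing only the pattern string in a scheme like this is that match flags passed to Pattern.compile(String, int) would not survive the round trip unless they are also embedded in the regex itself.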
| Constructor | Description |
|---|---|
| RegExChunker(Pattern pattern, String chunkType, double chunkScore) | Construct a chunker based on the specified regular expression pattern, producing the specified chunk type and score. |
| RegExChunker(String regex, String chunkType, double chunkScore) | Construct a chunker based on the specified regular expression, producing the specified chunk type and score. |
| Modifier and Type | Method | Description |
|---|---|---|
| Chunking | chunk(char[] cs, int start, int end) | Return the chunking of the specified character slice. |
| Chunking | chunk(CharSequence cSeq) | Return the chunking of the specified character sequence. |
| void | compileTo(ObjectOutput out) | Compiles this regular-expression chunker to the specified object output. |
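public RegExChunker(String regex, String chunkType, double chunkScore)
Construct a chunker based on the specified regular expression, producing the specified chunk type and score. The regular expression is compiled using Pattern.compile(String).
Parameters:
regex - Regular expression for chunks.
chunkType - Type for all found chunks.
chunkScore - Score for all found chunks.

public RegExChunker(Pattern pattern, String chunkType, double chunkScore)
Construct a chunker based on the specified regular expression pattern, producing the specified chunk type and score.
Parameters:
pattern - Regular expression pattern for chunks.
chunkType - Type for all found chunks.
chunkScore - Score for all found chunks.

public Chunking chunk(CharSequence cSeq)
Return the chunking of the specified character sequence, as determined by Matcher.find() applied to the regular expression pattern underlying this chunker.

public void compileTo(ObjectOutput out) throws IOException
Specified by:
compileTo in interface Compilable
Parameters:
out - Object output to which this chunker is compiled.
Throws:
IOException - If there is an underlying I/O error during the write.

public Chunking chunk(char[] cs, int start, int end)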
Copyright © 2016 Alias-i, Inc. All rights reserved.