java.lang.Object
  net.sf.mmm.util.text.base.HyphenationPattern

public class HyphenationPattern
A HyphenationPattern is a pattern that acts as a rule for a hyphenation
algorithm.
The concept is based on the thesis "Word Hy-phen-a-tion by Com-put-er"
by Franklin Mark Liang. A set of patterns is extracted from an entire
dictionary of hyphenated words for a specific language. To produce
correct results with a reasonably small set of patterns, these patterns
form a chain of positive rules and exceptions.
For this purpose a pattern ranks a potential hyphenation-position with a
number from 1 to 9. If two patterns apply to the same
hyphenation-position, the higher number wins. Odd numbers indicate a
hyphenation, while even numbers indicate an exception where the word
should NOT be hyphenated. The character '.' is used at the beginning
and/or end of a pattern to indicate that it should only match at the
beginning/end of the word to hyphenate.
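The ranking scheme above can be illustrated with a minimal sketch that splits a raw pattern into its letters and its per-gap ranks. The class `PatternSketch`, its fields, and the sample pattern `pu2t` are hypothetical and chosen for illustration only; this is not the actual implementation of HyphenationPattern or HyphenationPatternPosition.

```java
import java.util.Arrays;

/** Illustrative sketch only; NOT the library's HyphenationPattern. */
final class PatternSketch {

  /** The pattern without digits, e.g. "put" for "pu2t". */
  final String wordPart;

  /** ranks[i] is the rank of the gap before wordPart.charAt(i); 0 means "no statement". */
  final int[] ranks;

  PatternSketch(String pattern) {
    StringBuilder letters = new StringBuilder();
    int[] buffer = new int[pattern.length() + 1];
    for (char c : pattern.toCharArray()) {
      if (c >= '0' && c <= '9') {
        // A digit ranks the gap at the current letter offset.
        buffer[letters.length()] = c - '0';
      } else {
        letters.append(c);
      }
    }
    this.wordPart = letters.toString();
    this.ranks = Arrays.copyOf(buffer, letters.length() + 1);
  }

  /** Odd ranks indicate a hyphenation, even ranks an exception. */
  static boolean isHyphenation(int rank) {
    return (rank % 2) == 1;
  }

  public static void main(String[] args) {
    PatternSketch p = new PatternSketch("pu2t");
    // prints: put [0, 0, 2, 0]
    System.out.println(p.wordPart + " " + Arrays.toString(p.ranks));
  }
}
```

Storing one rank per gap (rather than per letter) keeps the "higher number wins" rule a simple element-wise maximum when several patterns cover the same gap.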
Logically, for each start-index of the (normalized) word to hyphenate
(enclosed with '.'), all patterns are checked for a match (please note
that the order of the patterns is important!).
A pattern matches if the pattern stripped of its digits is a substring
of the word at this start-index. If the pattern matches, its
hyphenation-positions are applied.
Here is an example to illustrate the algorithm:
The string "Computer" will be transformed to ".computer.".
Combining the ranks of all matching patterns yields
co4m5pu2t3er, so the hyphenated input String is
finally "Com-put-er". The challenge is to implement this
algorithm in an efficient way.
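The example can be reproduced with a deliberately naive sketch of the matching step. The class `NaiveHyphenator` and the four sample patterns are assumptions made for illustration (the patterns were picked so that they merge to co4m5pu2t3er); the sketch loops over patterns rather than start-indices and does not reflect how the library implements the algorithm efficiently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Naive illustration of the matching step; NOT the library's implementation. */
final class NaiveHyphenator {

  static String hyphenate(String word, List<String> patterns) {
    // Normalize and enclose with the terminator '.'.
    // Assumes lower-casing does not change the length of the word.
    String dotted = "." + word.toLowerCase(Locale.ROOT) + ".";
    // ranks[i] is the rank of the gap before dotted.charAt(i).
    int[] ranks = new int[dotted.length() + 1];
    for (String pattern : patterns) {
      String wordPart = pattern.replaceAll("[0-9]", "");
      for (int start = dotted.indexOf(wordPart); start >= 0;
          start = dotted.indexOf(wordPart, start + 1)) {
        int offset = 0; // number of pattern letters consumed so far
        for (char c : pattern.toCharArray()) {
          if (c >= '0' && c <= '9') {
            int gap = start + offset;
            ranks[gap] = Math.max(ranks[gap], c - '0'); // higher number wins
          } else {
            offset++;
          }
        }
      }
    }
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < word.length(); i++) {
      // The gap before word.charAt(i) is gap i + 1 of the dotted word.
      if (i > 0 && ranks[i + 1] % 2 == 1) {
        result.append('-'); // odd rank: hyphenate here
      }
      result.append(word.charAt(i));
    }
    return result.toString();
  }

  public static void main(String[] args) {
    // Hypothetical sample patterns, not taken from a real pattern file.
    List<String> patterns = Arrays.asList("4m1p", "pu2t", "5pute", "put3er");
    System.out.println(hyphenate("Computer", patterns)); // prints: Com-put-er
  }
}
```

With these sample patterns, `4m1p` first proposes a break between "m" and "p" with rank 1, and `5pute` overrides it at the same gap with rank 5; `pu2t` forbids a break before "t" and `put3er` allows one before "er".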
| Field Summary | |
|---|---|
| private HyphenationPatternPosition[] | hyphenationPositions |
| static char | TERMINATOR: The word-terminator representing start and end of a word. |
| private String | wordPart: The pattern without digits. |
| private int | wordPartHash |
| Constructor Summary | |
|---|---|
| HyphenationPattern(String pattern, StringHasher hasher) | The constructor. |
| Method Summary | |
|---|---|
| protected HyphenationPatternPosition[] | getHyphenationPositions(): This method gets the hyphenation-positions of the pattern. |
| String | getPattern(): This method gets the original pattern (word-part with hyphenation-points). |
| String | getWordPart(): This method gets the word-part, that is the pattern without digits. |
| int | getWordPartHash(): This method gets the pre-calculated hash of word-part. |
| String | toString(): This method gets the original pattern. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
private final String wordPart
The pattern without digits.

private final int wordPartHash
See also: getWordPartHash()

private final HyphenationPatternPosition[] hyphenationPositions
See also: getHyphenationPositions()

public static final char TERMINATOR
The word-terminator representing start and end of a word.

| Constructor Detail |
|---|
public HyphenationPattern(String pattern, StringHasher hasher)
Parameters:
pattern - is the raw pattern.
hasher - is the hash-algorithm to use for the word-part-hash.

| Method Detail |
|---|
protected HyphenationPatternPosition[] getHyphenationPositions()
This method gets the hyphenation-positions of the pattern.
Returns: the HyphenationPatternPositions.

public String getWordPart()
This method gets the word-part, that is the pattern without digits. If the
word-part is a substring of the word to hyphenate (enclosed with '.'), the
hyphenation-points are applied to the HyphenationState.
See also: HyphenationState.apply(HyphenationPattern)

public int getWordPartHash()
This method gets the pre-calculated hash of word-part.
Returns: the hash-code of word-part. A specific hash algorithm is used that
allows efficient calculation of shifting substrings.
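
One common way to get a hash that is cheap to update for shifting substrings is a polynomial rolling hash. The sketch below is an assumption about the general technique only: the class `RollingHashSketch`, its methods, and the base 31 are hypothetical, and the library's StringHasher may use a different algorithm and API.

```java
/** Sketch of a polynomial rolling hash; the library's StringHasher may differ. */
final class RollingHashSketch {

  /** Hypothetical base; int overflow acts as arithmetic modulo 2^32. */
  static final int BASE = 31;

  /** Hash of text[start, start + length), computed from scratch. */
  static int hash(CharSequence text, int start, int length) {
    int h = 0;
    for (int i = start; i < start + length; i++) {
      h = h * BASE + text.charAt(i);
    }
    return h;
  }

  /** Hashes of all substrings of the given length; each is derived from its predecessor in O(1). */
  static int[] windowHashes(CharSequence text, int length) {
    int count = text.length() - length + 1;
    if (length <= 0 || count <= 0) {
      return new int[0];
    }
    int highest = 1; // BASE^(length - 1)
    for (int i = 1; i < length; i++) {
      highest *= BASE;
    }
    int[] hashes = new int[count];
    hashes[0] = hash(text, 0, length);
    for (int i = 1; i < count; i++) {
      // Drop the contribution of the character leaving the window,
      // shift by one position, then add the character entering it.
      int removed = text.charAt(i - 1) * highest;
      hashes[i] = (hashes[i - 1] - removed) * BASE + text.charAt(i + length - 1);
    }
    return hashes;
  }

  public static void main(String[] args) {
    int[] hashes = windowHashes(".computer.", 3);
    // The window starting at index 4 covers "put".
    System.out.println(hashes[4] == hash("put", 0, 3)); // prints: true
  }
}
```

With such a scheme, the hash of every window of the word can be compared against pre-calculated word-part hashes before any character-by-character comparison is attempted.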

public String getPattern()
This method gets the original pattern (word-part with hyphenation-points).

public String toString()
This method gets the original pattern.
Overrides: toString in class Object