net.sf.mmm.util.text.api
Enum DiacriticalMark

java.lang.Object
  extended by java.lang.Enum<DiacriticalMark>
      extended by net.sf.mmm.util.text.api.DiacriticalMark
All Implemented Interfaces:
Serializable, Comparable<DiacriticalMark>, Datatype<Character>

public enum DiacriticalMark
extends Enum<DiacriticalMark>
implements Datatype<Character>

This enum contains the most important diacritical marks.
If you are not familiar with Unicode and languages that use non-ASCII characters: each DiacriticalMark represents a specific shape (e.g. '~' or '^') that is added at a specific position (on top, at the bottom, etc.) to a letter. For instance, if you add two dots on top of the letter 'a' you get 'ä'.
To complicate matters, Unicode defines combining characters that represent the mark itself, in addition to the precomposed characters (a specific letter already combined with one or more marks).
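The two representations can be illustrated with the JDK class java.text.Normalizer, which is independent of this enum. The following is a minimal sketch; the class name CombiningDemo and its method names are made up for illustration and are not part of this API.

```java
import java.text.Normalizer;

public class CombiningDemo {

  /** Converts base letter + combining mark into the precomposed form (NFC), where Unicode defines one. */
  public static String toPrecomposed(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFC);
  }

  /** Splits precomposed characters into base letter + combining mark(s) (NFD). */
  public static String toCombining(String text) {
    return Normalizer.normalize(text, Normalizer.Form.NFD);
  }

  public static void main(String[] args) {
    // 'a' followed by COMBINING DIAERESIS (U+0308): two chars, rendered as one glyph
    String combining = "a\u0308";
    // LATIN SMALL LETTER A WITH DIAERESIS (U+00E4): one precomposed char
    String precomposed = "\u00E4";

    System.out.println(toPrecomposed(combining).equals(precomposed)); // prints "true"
    System.out.println(toCombining(precomposed).length());            // prints "2"
  }
}
```

Both strings look identical when rendered, which is why comparing text without normalizing it first can give surprising results.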

Since:
2.0.0
Author:
Joerg Hohwiller (hohwille at users.sourceforge.net)

Enum Constant Summary
ACUTE
          A mark that can be placed on top of some Latin, Cyrillic or Greek characters.
BREVE
          A mark that can be placed on top of some Latin, ... characters.
CARON
          A mark that can be placed on top of some Latin, ... characters.
CEDILLA
          A mark that can be placed at the bottom of some Latin characters.
CIRCUMFLEX
          A mark that can be placed on top of some Latin characters (e.g. in French).
DIAERESIS
          Two dots on top (trema, diaeresis, or umlaut).
DOT_ABOVE
          A single dot placed on top of a letter (overdot), e.g. in 'ż'.
DOT_BELOW
          TODO
DOUBLE_ACUTE
          Like ACUTE but doubled.
DOUBLE_GRAVE
          Like GRAVE but doubled.
GRAVE
          A mark that can be placed on top of some Latin, Cyrillic or Greek characters.
HOOK_ABOVE
          A mark resembling a small question mark without the dot, placed on top of Vietnamese letters.
HORN_ABOVE
          A ... that is placed on top of Vietnamese vowels.
MACRON
          A ...
OGONEK
          A ...
RING_ABOVE
          A ...
TILDE
          A tilde ('~') placed on top of a letter.
 
Field Summary
private  char combiningCharacter
           
private  Collection<Character> composedCharacters
           
private  Map<Character,Character> composeMap
           
private  Map<Character,Character> decomposeMap
           
private  char separateCharacter
           
private  String title
           
 
Method Summary
protected  void addComposition(char uncomposed, char composed)
          This method adds the given composition pair.
 Character compose(char character)
          This method composes the given character with this DiacriticalMark.
 Character decompose(char character)
          This method de-composes the given character with this DiacriticalMark.
 char getCombiningCharacter()
          This method gets the combining character for this DiacriticalMark.
 Collection<Character> getComposedCharacters()
          This method gets a Collection with all precomposed characters containing this mark.
 char getSeparateCharacter()
           
 String getTitle()
          This method gets the title of this datatype.
 Character getValue()
          This method returns the raw value of this datatype.
protected abstract  void initialize()
          This method is called at construction.
 String normalizeToAscii(char character)
          This method gets the ASCII-representation of the given character composed with this DiacriticalMark.
protected  void normalizeToAsciiRecursive(char decomposed, StringBuilder buffer, int compositionCount)
          This is the internal recursive implementation of normalizeToAscii(char).
 String toString()
          This method needs to return the same result as Datatype.getTitle().
static DiacriticalMark valueOf(String name)
          Returns the enum constant of this type with the specified name.
static DiacriticalMark[] values()
          Returns an array containing the constants of this enum type, in the order they are declared.
 
Methods inherited from class java.lang.Enum
clone, compareTo, equals, finalize, getDeclaringClass, hashCode, name, ordinal, valueOf
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Enum Constant Detail

ACUTE

public static final DiacriticalMark ACUTE
A mark that can be placed on top of some Latin, Cyrillic or Greek characters. It looks like a stroke pointing toward the upper right corner. If your environment supports Unicode, you can see it here: ´


BREVE

public static final DiacriticalMark BREVE
A mark that can be placed on top of some Latin, ... characters. It looks like an arc, the lower third of a circle. If your environment supports Unicode, you can see it here: ˘


CARON

public static final DiacriticalMark CARON
A mark that can be placed on top of some Latin, ... characters. It looks like a little 'v'. If your environment supports unicode, you can see it here: ˇ


CEDILLA

public static final DiacriticalMark CEDILLA
A mark that can be placed at the bottom of some Latin characters. If your environment supports unicode, you can see it here: ¸


CIRCUMFLEX

public static final DiacriticalMark CIRCUMFLEX
A mark that can be placed on top of some Latin characters (e.g. in French). It looks like a small '^'. If your environment supports unicode, you can see it here: ̂


DIAERESIS

public static final DiacriticalMark DIAERESIS
Two dots on top (trema, diaeresis, or umlaut). E.g. in 'ä'.


DOT_ABOVE

public static final DiacriticalMark DOT_ABOVE
A single dot placed on top of a letter (overdot), e.g. in 'ż'.


DOT_BELOW

public static final DiacriticalMark DOT_BELOW
TODO


DOUBLE_ACUTE

public static final DiacriticalMark DOUBLE_ACUTE
Like ACUTE but doubled. If your environment supports unicode, you can see it here: ˝


DOUBLE_GRAVE

public static final DiacriticalMark DOUBLE_GRAVE
Like GRAVE but doubled. If your environment supports unicode, you can see it here: TODO


GRAVE

public static final DiacriticalMark GRAVE
A mark that can be placed on top of some Latin, Cyrillic or Greek characters. It looks like a stroke pointing toward the lower right. If your environment supports Unicode, you can see it here: `


HOOK_ABOVE

public static final DiacriticalMark HOOK_ABOVE
A mark resembling a small question mark without the dot, placed on top of Vietnamese letters. ϒ ɓ


HORN_ABOVE

public static final DiacriticalMark HORN_ABOVE
A ... that is placed on top of Vietnamese vowels. TODO


MACRON

public static final DiacriticalMark MACRON
A ... TODO


OGONEK

public static final DiacriticalMark OGONEK
A ... TODO


RING_ABOVE

public static final DiacriticalMark RING_ABOVE
A ... TODO


TILDE

public static final DiacriticalMark TILDE
A tilde ('~') placed on top of a letter.

Field Detail

separateCharacter

private final char separateCharacter
See Also:
getSeparateCharacter()

combiningCharacter

private final char combiningCharacter
See Also:
getCombiningCharacter()

title

private final String title
See Also:
getTitle()

composeMap

private final Map<Character,Character> composeMap
See Also:
compose(char)

decomposeMap

private final Map<Character,Character> decomposeMap
See Also:
decompose(char)

composedCharacters

private final Collection<Character> composedCharacters
See Also:
getComposedCharacters()
Method Detail

values

public static DiacriticalMark[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
for (DiacriticalMark c : DiacriticalMark.values())
    System.out.println(c);

Returns:
an array containing the constants of this enum type, in the order they are declared

valueOf

public static DiacriticalMark valueOf(String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)

Parameters:
name - the name of the enum constant to be returned.
Returns:
the enum constant with the specified name
Throws:
IllegalArgumentException - if this enum type has no constant with the specified name
NullPointerException - if the argument is null

initialize

protected abstract void initialize()
This method is called at construction.


addComposition

protected void addComposition(char uncomposed,
                              char composed)
This method adds the given composition pair.

Parameters:
uncomposed - is the uncomposed character.
composed - is the composed character.
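
The field summary lists a composeMap and a decomposeMap, which suggests that each pair is registered in both directions. The following is a minimal sketch under that assumption; the class CompositionSketch and the factory method diaeresis() are hypothetical, and the actual implementation of this enum may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class CompositionSketch {

  // uncomposed -> composed, e.g. 'a' -> 'ä'
  private final Map<Character, Character> composeMap = new HashMap<>();
  // composed -> uncomposed, e.g. 'ä' -> 'a'
  private final Map<Character, Character> decomposeMap = new HashMap<>();

  /** Registers the pair in both directions so that lookups work either way. */
  public void addComposition(char uncomposed, char composed) {
    composeMap.put(uncomposed, composed);
    decomposeMap.put(composed, uncomposed);
  }

  /** Returns the composed character, or null if no pair was registered. */
  public Character compose(char character) {
    return composeMap.get(character);
  }

  /** Returns the uncomposed character, or null if no pair was registered. */
  public Character decompose(char character) {
    return decomposeMap.get(character);
  }

  /** Builds a sketch of a diaeresis mark with three lower-case pairs. */
  public static CompositionSketch diaeresis() {
    CompositionSketch mark = new CompositionSketch();
    mark.addComposition('a', '\u00E4'); // a with diaeresis
    mark.addComposition('o', '\u00F6'); // o with diaeresis
    mark.addComposition('u', '\u00FC'); // u with diaeresis
    return mark;
  }

  public static void main(String[] args) {
    CompositionSketch mark = diaeresis();
    System.out.println(mark.decompose('\u00FC')); // prints "u"
    System.out.println(mark.compose('x'));        // prints "null"
  }
}
```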

getSeparateCharacter

public char getSeparateCharacter()
Returns:
the separateCharacter

getCombiningCharacter

public char getCombiningCharacter()
This method gets the combining character for this DiacriticalMark. It represents the mark itself but is TODO. Therefore Unicode allows 'ä' to be expressed as two TODO.

Returns:
the combining character.

getTitle

public String getTitle()
This method gets the title of this datatype. The title is a string representation intended to be displayed to end-users (i18n will be done externally - see NlsMessage).
Since the general contract of Datatype.toString() is quite weak, this method is added to explicitly express the presence of the title and to ensure that implementors of this interface cannot forget to implement it.

Specified by:
getTitle in interface Datatype<Character>
Returns:
the display title of this datatype.
See Also:
Datatype.toString()

getValue

public Character getValue()
This method returns the raw value of this datatype. This will typically be a common java.lang datatype. In case of a composed datatype it is also legal that this method returns the datatype instance itself.

Specified by:
getValue in interface Datatype<Character>
Returns:
the value of this datatype.

compose

public Character compose(char character)
This method composes the given character with this DiacriticalMark.

Parameters:
character - is the character to compose (e.g. 'a').
Returns:
the composed character (e.g. 'ä' or 'á') or null if no such composition exists in unicode.

decompose

public Character decompose(char character)
This method de-composes the given character with this DiacriticalMark. In other words this DiacriticalMark is removed from the given character if it is composed. It is the inverse operation of compose(char).

Parameters:
character - is the character to de-compose (e.g. 'ä' or 'á').
Returns:
the de-composed character (e.g. 'a') or null if the given character is not composed with this DiacriticalMark.
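
The inverse relation between compose(char) and decompose(char), including the null results, can be cross-checked without this library by using java.text.Normalizer and the combining character of a mark. This is a swapped-in technique for illustration, not a description of how this enum is implemented; the class DecomposeSketch and its two-argument methods are hypothetical.

```java
import java.text.Normalizer;

public class DecomposeSketch {

  /**
   * Removes the given combining mark from a precomposed character.
   * Returns null unless the character decomposes to exactly base + mark,
   * so characters carrying several marks are not handled by this sketch.
   */
  public static Character decompose(char character, char combiningMark) {
    String nfd = Normalizer.normalize(String.valueOf(character), Normalizer.Form.NFD);
    if (nfd.length() == 2 && nfd.charAt(1) == combiningMark) {
      return nfd.charAt(0);
    }
    return null;
  }

  /**
   * Adds the given combining mark to a base character.
   * Returns null if Unicode defines no single precomposed character for the pair.
   */
  public static Character compose(char character, char combiningMark) {
    String nfc = Normalizer.normalize("" + character + combiningMark, Normalizer.Form.NFC);
    if (nfc.length() == 1) {
      return nfc.charAt(0);
    }
    return null;
  }

  public static void main(String[] args) {
    char diaeresis = '\u0308'; // COMBINING DIAERESIS
    Character composed = compose('a', diaeresis);
    System.out.println(decompose(composed, diaeresis)); // prints "a"
    System.out.println(compose('q', diaeresis));        // prints "null"
  }
}
```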

normalizeToAscii

public String normalizeToAscii(char character)
This method gets the ASCII-representation of the given character composed with this DiacriticalMark. This is similar to decompose(char), but additional ASCII characters may be appended: e.g. for DIAERESIS the character 'e' is appended, so 'Ä' becomes 'Ae'.

Parameters:
character - is the character to normalize to ASCII (e.g. 'Ä' or 'á').
Returns:
the ASCII representation (e.g. 'Ae' or 'a') or null if the given character is not composed with this DiacriticalMark.
See Also:
UnicodeUtil.normalize2Ascii(char)
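
The difference to decompose(char) can be sketched as follows: the base letter is recovered and, depending on the mark, an ASCII suffix is appended. The sketch uses java.text.Normalizer instead of this enum; the class AsciiSketch, the three-argument method, and the choice of suffix per mark are assumptions made for illustration only.

```java
import java.text.Normalizer;

public class AsciiSketch {

  /**
   * Returns the base letter of the given character followed by asciiSuffix,
   * or null if the character is not composed of an ASCII letter and combiningMark.
   */
  public static String normalizeToAscii(char character, char combiningMark, String asciiSuffix) {
    String nfd = Normalizer.normalize(String.valueOf(character), Normalizer.Form.NFD);
    if (nfd.length() != 2 || nfd.charAt(1) != combiningMark) {
      return null; // not composed with exactly this mark
    }
    char base = nfd.charAt(0);
    if (base > 127) {
      return null; // base letter itself is not ASCII
    }
    return base + asciiSuffix;
  }

  public static void main(String[] args) {
    char diaeresis = '\u0308'; // COMBINING DIAERESIS
    char acute = '\u0301';     // COMBINING ACUTE ACCENT
    System.out.println(normalizeToAscii('\u00C4', diaeresis, "e")); // prints "Ae"
    System.out.println(normalizeToAscii('\u00E1', acute, ""));      // prints "a"
    System.out.println(normalizeToAscii('\u00E1', diaeresis, "e")); // prints "null"
  }
}
```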

normalizeToAsciiRecursive

protected void normalizeToAsciiRecursive(char decomposed,
                                         StringBuilder buffer,
                                         int compositionCount)
This is the internal recursive implementation of normalizeToAscii(char).

Parameters:
decomposed - is the decomposed character to normalize to ASCII.
buffer - is the StringBuilder where to append the ASCII representation.
compositionCount - is the recursion counter used to detect infinite loops in case of a misconfiguration.
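
The role of compositionCount can be sketched as follows: a character carrying several marks is decomposed step by step, and the counter aborts the recursion if the registered pairs form a cycle. This is a minimal sketch and not the actual implementation; the class RecursiveSketch, the limit MAX_COMPOSITIONS and the choice of IllegalStateException are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class RecursiveSketch {

  /** Hypothetical upper bound for nested compositions. */
  private static final int MAX_COMPOSITIONS = 10;

  // composed -> uncomposed
  private final Map<Character, Character> decomposeMap = new HashMap<>();

  public void addComposition(char uncomposed, char composed) {
    decomposeMap.put(composed, uncomposed);
  }

  /** Strips registered compositions one by one and appends the remaining base character. */
  public void normalizeToAsciiRecursive(char decomposed, StringBuilder buffer, int compositionCount) {
    if (compositionCount > MAX_COMPOSITIONS) {
      // a correct configuration never nests this deep, so the pairs must be cyclic
      throw new IllegalStateException("Cyclic composition detected for '" + decomposed + "'");
    }
    Character next = decomposeMap.get(decomposed);
    if (next == null) {
      buffer.append(decomposed); // nothing left to strip
    } else {
      normalizeToAsciiRecursive(next, buffer, compositionCount + 1);
    }
  }

  public static void main(String[] args) {
    RecursiveSketch sketch = new RecursiveSketch();
    sketch.addComposition('a', '\u00E4');      // a -> a with diaeresis
    sketch.addComposition('\u00E4', '\u01DF'); // a with diaeresis -> a with diaeresis and macron
    StringBuilder buffer = new StringBuilder();
    sketch.normalizeToAsciiRecursive('\u01DF', buffer, 0);
    System.out.println(buffer); // prints "a"
  }
}
```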

getComposedCharacters

public Collection<Character> getComposedCharacters()
This method gets a Collection with all precomposed characters containing this mark.

Returns:
the composed characters.

toString

public String toString()
This method needs to return the same result as Datatype.getTitle().

Specified by:
toString in interface Datatype<Character>
Overrides:
toString in class Enum<DiacriticalMark>
Returns:
the display title of this datatype.


Copyright © 2001-2010 mmm-Team. All Rights Reserved.