Class GMX
- java.lang.Object
-
- net.sf.okapi.steps.wordcount.common.GMX
-
public class GMX extends Object
Implementation of the GMX-V specification, v. 2.0.
- Version:
- 0.2 08.26.2015
- See Also:
http://www.xtm-intl.com/manuals/gmx-v/GMX-V-2.0.html
http://www.etsi.org/deliver/etsi_gs/LIS/001_099/004/02.00.00_60/gs_LIS004v020000p.pdf
-
-
Field Summary
- static String AlphaNumericAutoTextCharacterCount: An accumulation of the character count for identifiable alphanumeric words, e.g. AEG321.
- static String AlphaNumericAutoTextWordCount: An accumulation of the word count for identifiable alphanumeric words, e.g. AEG321.
- static String AlphanumericOnlyTextUnitCharacterCount: An accumulation of the character count for text units that have been identified as containing only alphanumeric words.
- static String AlphanumericOnlyTextUnitWordCount: An accumulation of the word count for text units that have been identified as containing only alphanumeric words.
- static String ComplexNumericAutoTextCharacterCount: An accumulation of the character count for complex numeric values which include decimal and/or thousands separators, e.g. 10,000.00.
- static String ComplexNumericAutoTextWordCount: An accumulation of the word count for complex numeric values which include decimal and/or thousands separators, e.g. 10,000.00.
- static String DateAutoTextCharacterCount: An accumulation of the character count for identifiable dates, e.g. 25 June 1992.
- static String DateAutoTextWordCount: An accumulation of the word count for identifiable dates, e.g. 25 June 1992.
- static String ExactMatchedCharacterCount: An accumulation of the character count for text units that have been matched unambiguously with a prior translation and require no translator input.
- static String ExactMatchedWordCount: An accumulation of the word count for text units that have been matched unambiguously with a prior translation and thus require no translator input.
- static String extNamePrefix
- static String FileCount: The total number of files.
- static String FuzzyMatchedCharacterCount: An accumulation of the character count for text units that have a fuzzy match against a leveraged translation memory database.
- static String FuzzyMatchedWordCount: An accumulation of the word count for text units that have been fuzzy matched against a leveraged translation memory database.
- static String LeveragedMatchedCharacterCount: An accumulation of the character count for text units that have been matched against a leveraged translation memory database.
- static String LeveragedMatchedWordCount: An accumulation of the word count for text units that have been matched against a leveraged translation memory database.
- static String MeasurementAutoTextCharacterCount: An accumulation of the character count for identifiable measurement values, e.g. 10.50 mm.
- static String MeasurementAutoTextWordCount: An accumulation of the word count for identifiable measurement values, e.g. 10.50 mm.
- static String MeasurementOnlyTextUnitCharacterCount: An accumulation of the character count from measurement-only text units.
- static String MeasurementOnlyTextUnitWordCount: An accumulation of the word count from measurement-only text units.
- static String NumericOnlyTextUnitCharacterCount: An accumulation of the character count for text units that have been identified as containing only numeric words.
- static String NumericOnlyTextUnitWordCount: An accumulation of the word count for text units that have been identified as containing only numeric words.
- static String OverallCharacterCount: The total of all of the three main character counts (TotalCharacterCount + PunctuationCharacterCount + WhiteSpaceCharacterCount) in the canonical form of the text units in the document.
- static String PageCount: The total number of pages.
- static String ProjectFuzzyMatchedCharacterCount: The character count for fuzzy matched text within all files within a given project.
- static String ProjectFuzzyMatchedWordCount: The word count for fuzzy matched text units within all files within a given project.
- static String ProjectRepetionMatchedCharacterCount: The character count for text that is identical within all files within a given project.
- static String ProjectRepetionMatchedWordCount: The word count for text units that are identical within all files within a given project.
- static String ProtectedCharacterCount: An accumulation of the character count for text that has been marked as 'protected', or otherwise not translatable (XLIFF text enclosed in elements).
- static String ProtectedWordCount: An accumulation of the word count for text that has been marked as 'protected', or otherwise not translatable (XLIFF text enclosed in elements).
- static String PunctuationCharacterCount: The total of all punctuation characters in the canonical form of text in the document that DO NOT form part of the character count as per section 2.10. Punctuation Characters.
- static String RepetitionMatchedCharacterCount: An accumulation of the character count for repeating text units that have not been matched in any other form.
- static String RepetitionMatchedWordCount: An accumulation of the word count for repeating text units that have not been matched in any other form.
- static String ScreenCount: A count of the total number of screens.
- static String SimpleNumericAutoTextCharacterCount: An accumulation of the character count for simple numeric values, e.g. 10.
- static String SimpleNumericAutoTextWordCount: An accumulation of the word count for simple numeric values, e.g. 10.
- static String TextUnitCount: The total number of text units.
- static String TMAutoTextCharacterCount: An accumulation of the character count for identifiable trade marks, e.g. "Weapons of Mass Destruction...".
- static String TMAutoTextWordCount: An accumulation of the word count for identifiable trade marks, e.g. "Weapons of Mass Destruction...".
- static String TotalCharacterCount: An accumulation of the character counts, both translatable and non-translatable, from the individual text units that make up the document.
- static String TotalWordCount: Total word count - an accumulation of the word counts, both translatable and non-translatable, from the individual text units that make up the document.
- static String TranslatableInlineCount: The actual non-linking inline element count for unqualified (see Section 2.14.2 Unqualified Text Units) text units.
- static String TranslatableLinkingInlineCount: The actual linking inline element count for unqualified (see Section 2.14.2 Unqualified Text Units) text units.
- static String WhiteSpaceCharacterCount: The total of all white space characters in the canonical form of the text units in the document.
-
Constructor Summary
- GMX()
-
Method Summary
- static double getCharacterCountFactor(LocaleId language): For "logographic" languages, GMX-V 2.0 defines factors by which the character count should be divided in order to yield the word count.
- static boolean isLogographicScript(LocaleId locId): Indicates whether or not the language is considered a "logographic" language per the GMX-V 2.0 spec.
-
-
-
Field Detail
-
extNamePrefix
public static final String extNamePrefix
- See Also:
- Constant Field Values
-
TotalWordCount
public static final String TotalWordCount
Total word count - an accumulation of the word counts, both translatable and non-translatable, from the individual text units that make up the document.
- See Also:
- Constant Field Values
-
ProtectedWordCount
public static final String ProtectedWordCount
An accumulation of the word count for text that has been marked as 'protected', or otherwise not translatable (XLIFF text enclosed in elements).
- See Also:
- Constant Field Values
-
ExactMatchedWordCount
public static final String ExactMatchedWordCount
An accumulation of the word count for text units that have been matched unambiguously with a prior translation and thus require no translator input.
- See Also:
- Constant Field Values
-
LeveragedMatchedWordCount
public static final String LeveragedMatchedWordCount
An accumulation of the word count for text units that have been matched against a leveraged translation memory database.
- See Also:
- Constant Field Values
-
RepetitionMatchedWordCount
public static final String RepetitionMatchedWordCount
An accumulation of the word count for repeating text units that have not been matched in any other form. Repetition matching is deemed to take precedence over fuzzy matching.
- See Also:
- Constant Field Values
-
FuzzyMatchedWordCount
public static final String FuzzyMatchedWordCount
An accumulation of the word count for text units that have been fuzzy matched against a leveraged translation memory database.
- See Also:
- Constant Field Values
-
AlphanumericOnlyTextUnitWordCount
public static final String AlphanumericOnlyTextUnitWordCount
An accumulation of the word count for text units that have been identified as containing only alphanumeric words.
- See Also:
- Constant Field Values
-
NumericOnlyTextUnitWordCount
public static final String NumericOnlyTextUnitWordCount
An accumulation of the word count for text units that have been identified as containing only numeric words.
- See Also:
- Constant Field Values
-
MeasurementOnlyTextUnitWordCount
public static final String MeasurementOnlyTextUnitWordCount
An accumulation of the word count from measurement-only text units.
- See Also:
- Constant Field Values
-
SimpleNumericAutoTextWordCount
public static final String SimpleNumericAutoTextWordCount
An accumulation of the word count for simple numeric values, e.g. 10.
- See Also:
- Constant Field Values
-
ComplexNumericAutoTextWordCount
public static final String ComplexNumericAutoTextWordCount
An accumulation of the word count for complex numeric values which include decimal and/or thousands separators, e.g. 10,000.00.
- See Also:
- Constant Field Values
-
MeasurementAutoTextWordCount
public static final String MeasurementAutoTextWordCount
An accumulation of the word count for identifiable measurement values, e.g. 10.50 mm. Measurement values take precedence over the above numeric categories. No double counting of these categories is allowed.
- See Also:
- Constant Field Values
-
AlphaNumericAutoTextWordCount
public static final String AlphaNumericAutoTextWordCount
An accumulation of the word count for identifiable alphanumeric words, e.g. AEG321.
- See Also:
- Constant Field Values
-
DateAutoTextWordCount
public static final String DateAutoTextWordCount
An accumulation of the word count for identifiable dates, e.g. 25 June 1992.
- See Also:
- Constant Field Values
-
TMAutoTextWordCount
public static final String TMAutoTextWordCount
An accumulation of the word count for identifiable trade marks, e.g. "Weapons of Mass Destruction...".
- See Also:
- Constant Field Values
-
TotalCharacterCount
public static final String TotalCharacterCount
An accumulation of the character counts, both translatable and non-translatable, from the individual text units that make up the document. This count includes all non white space characters in the document (please refer to Section 2.7. White Space Characters for details of what constitutes white space characters), excluding inline markup and punctuation characters (please refer to Section 2.10. Punctuation Characters for details of what constitutes punctuation characters).
- See Also:
- Constant Field Values
-
PunctuationCharacterCount
public static final String PunctuationCharacterCount
The total of all punctuation characters in the canonical form of text in the document that DO NOT form part of the character count as per section 2.10. Punctuation Characters.
- See Also:
- Constant Field Values
-
WhiteSpaceCharacterCount
public static final String WhiteSpaceCharacterCount
The total of all white space characters in the canonical form of the text units in the document. Please refer to section 2.7. White Space Characters for a detailed explanation of how white space characters are identified and counted.
- See Also:
- Constant Field Values
-
OverallCharacterCount
public static final String OverallCharacterCount
The total of all of the three main character counts (TotalCharacterCount + PunctuationCharacterCount + WhiteSpaceCharacterCount) in the canonical form of the text units in the document. (Added in GMX-V 2.0)
- See Also:
- Constant Field Values
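The relationship between the three main character counts and OverallCharacterCount can be sketched as follows. This is only an illustration, not the Okapi implementation: the class OverallCharacterCountSketch and its members are hypothetical, Java's Character.isWhitespace and the Unicode punctuation categories are assumed stand-ins for the GMX-V definitions in Sections 2.7 and 2.10, and inline markup is not modeled.

```java
// Hypothetical sketch; not part of Okapi or of the GMX-V specification.
final class OverallCharacterCountSketch {

    /** Holds the three main character counts for one piece of text. */
    static final class Counts {
        final int total;       // characters that are neither white space nor punctuation
        final int punctuation; // punctuation characters
        final int whiteSpace;  // white space characters

        Counts(int total, int punctuation, int whiteSpace) {
            this.total = total;
            this.punctuation = punctuation;
            this.whiteSpace = whiteSpace;
        }

        /** TotalCharacterCount + PunctuationCharacterCount + WhiteSpaceCharacterCount. */
        int overall() {
            return total + punctuation + whiteSpace;
        }
    }

    /** Assumption: Unicode punctuation categories approximate Section 2.10. */
    static boolean isPunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    /** Classifies every code point into exactly one of the three categories. */
    static Counts count(String text) {
        int total = 0;
        int punctuation = 0;
        int whiteSpace = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                whiteSpace++; // assumption: stand-in for Section 2.7
            } else if (isPunctuation(cp)) {
                punctuation++;
            } else {
                total++;
            }
            i += Character.charCount(cp);
        }
        return new Counts(total, punctuation, whiteSpace);
    }

    public static void main(String[] args) {
        Counts c = count("Hello, world!");
        // prints 10 + 2 + 1 = 13
        System.out.println(c.total + " + " + c.punctuation + " + " + c.whiteSpace + " = " + c.overall());
    }
}
```

Because each character falls into exactly one category, the overall count equals the number of characters in the canonical text.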
-
ProtectedCharacterCount
public static final String ProtectedCharacterCount
An accumulation of the character count for text that has been marked as 'protected', or otherwise not translatable (XLIFF text enclosed in elements).
- See Also:
- Constant Field Values
-
ExactMatchedCharacterCount
public static final String ExactMatchedCharacterCount
An accumulation of the character count for text units that have been matched unambiguously with a prior translation and require no translator input.
- See Also:
- Constant Field Values
-
LeveragedMatchedCharacterCount
public static final String LeveragedMatchedCharacterCount
An accumulation of the character count for text units that have been matched against a leveraged translation memory database.
- See Also:
- Constant Field Values
-
RepetitionMatchedCharacterCount
public static final String RepetitionMatchedCharacterCount
An accumulation of the character count for repeating text units that have not been matched in any other form. Repetition matching is deemed to take precedence over fuzzy matching.
- See Also:
- Constant Field Values
-
FuzzyMatchedCharacterCount
public static final String FuzzyMatchedCharacterCount
An accumulation of the character count for text units that have a fuzzy match against a leveraged translation memory database.
- See Also:
- Constant Field Values
-
AlphanumericOnlyTextUnitCharacterCount
public static final String AlphanumericOnlyTextUnitCharacterCount
An accumulation of the character count for text units that have been identified as containing only alphanumeric words.
- See Also:
- Constant Field Values
-
NumericOnlyTextUnitCharacterCount
public static final String NumericOnlyTextUnitCharacterCount
An accumulation of the character count for text units that have been identified as containing only numeric words.
- See Also:
- Constant Field Values
-
MeasurementOnlyTextUnitCharacterCount
public static final String MeasurementOnlyTextUnitCharacterCount
An accumulation of the character count from measurement-only text units.
- See Also:
- Constant Field Values
-
SimpleNumericAutoTextCharacterCount
public static final String SimpleNumericAutoTextCharacterCount
An accumulation of the character count for simple numeric values, e.g. 10.
- See Also:
- Constant Field Values
-
ComplexNumericAutoTextCharacterCount
public static final String ComplexNumericAutoTextCharacterCount
An accumulation of the character count for complex numeric values which include decimal and/or thousands separators, e.g. 10,000.00.
- See Also:
- Constant Field Values
-
MeasurementAutoTextCharacterCount
public static final String MeasurementAutoTextCharacterCount
An accumulation of the character count for identifiable measurement values, e.g. 10.50 mm. Measurement values take precedence over the above numeric categories. No double counting of these categories is allowed.
- See Also:
- Constant Field Values
-
AlphaNumericAutoTextCharacterCount
public static final String AlphaNumericAutoTextCharacterCount
An accumulation of the character count for identifiable alphanumeric words, e.g. AEG321.
- See Also:
- Constant Field Values
-
DateAutoTextCharacterCount
public static final String DateAutoTextCharacterCount
An accumulation of the character count for identifiable dates, e.g. 25 June 1992.
- See Also:
- Constant Field Values
-
TMAutoTextCharacterCount
public static final String TMAutoTextCharacterCount
An accumulation of the character count for identifiable trade marks, e.g. "Weapons of Mass Destruction...".
- See Also:
- Constant Field Values
-
TranslatableInlineCount
public static final String TranslatableInlineCount
The actual non-linking inline element count for unqualified (see Section 2.14.2 Unqualified Text Units) text units. Please refer to Section 2.11. Inline Element Counts for a detailed explanation and examples for this category.
- See Also:
- Constant Field Values
-
TranslatableLinkingInlineCount
public static final String TranslatableLinkingInlineCount
The actual linking inline element count for unqualified (see Section 2.14.2 Unqualified Text Units) text units. Please refer to Section 2.12. Linking Inline Elements for a detailed explanation and examples for this category.
- See Also:
- Constant Field Values
-
TextUnitCount
public static final String TextUnitCount
The total number of text units.
- See Also:
- Constant Field Values
-
FileCount
public static final String FileCount
The total number of files.
- See Also:
- Constant Field Values
-
PageCount
public static final String PageCount
The total number of pages.
- See Also:
- Constant Field Values
-
ScreenCount
public static final String ScreenCount
A count of the total number of screens.
- See Also:
- Constant Field Values
-
ProjectRepetionMatchedWordCount
public static final String ProjectRepetionMatchedWordCount
The word count for text units that are identical within all files within a given project. The word count for the primary occurrence is not included in this count, only that of subsequent matches.
- See Also:
- Constant Field Values
-
ProjectFuzzyMatchedWordCount
public static final String ProjectFuzzyMatchedWordCount
The word count for fuzzy matched text units within all files within a given project. The word count for the primary occurrence is not included in this count, only that of subsequent matches.
- See Also:
- Constant Field Values
-
ProjectRepetionMatchedCharacterCount
public static final String ProjectRepetionMatchedCharacterCount
The character count for text that is identical within all files within a given project. The character count for the primary occurrence is not included in this count, only that of subsequent matches.
- See Also:
- Constant Field Values
-
ProjectFuzzyMatchedCharacterCount
public static final String ProjectFuzzyMatchedCharacterCount
The character count for fuzzy matched text within all files within a given project. The character count for the primary occurrence is not included in this count, only that of subsequent matches.
- See Also:
- Constant Field Values
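The rule that the primary occurrence is excluded from the project repetition counts can be illustrated with a small sketch. The class ProjectRepetitionSketch and its methods are hypothetical, exact string equality is assumed as the test for "identical" text units, and splitting on white space is only a stand-in for the GMX-V word-counting rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not part of Okapi or of the GMX-V specification.
final class ProjectRepetitionSketch {

    /** Assumption: white-space-separated tokens approximate GMX-V words. */
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Sums the word counts of text units that repeat an earlier unit.
     * The first (primary) occurrence of each unit contributes nothing.
     */
    static int repetitionWordCount(List<String> projectTextUnits) {
        Set<String> seen = new HashSet<>();
        int repeatedWords = 0;
        for (String unit : projectTextUnits) {
            // Set.add returns false when the unit was already seen.
            if (!seen.add(unit)) {
                repeatedWords += countWords(unit);
            }
        }
        return repeatedWords;
    }

    public static void main(String[] args) {
        List<String> units = Arrays.asList(
                "Save the file", "Open the file", "Save the file", "Save the file");
        // Two subsequent matches of a three-word unit: prints 6
        System.out.println(repetitionWordCount(units));
    }
}
```

The same loop yields a character-based count if the per-unit word count is swapped for a per-unit character count.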
-
-
Method Detail
-
isLogographicScript
public static boolean isLogographicScript(LocaleId locId)
Indicates whether or not the language is considered a "logographic" language per the GMX-V 2.0 spec. If true, word counts for this language are defined as (character count / getCharacterCountFactor(LocaleId)), unless the character count factor is -1d, in which case word counts are not meaningful for the language.
- See Also:
http://www.xtm-intl.com/manuals/gmx-v/GMX-V-2.0.html#LogographicScripts
-
getCharacterCountFactor
public static double getCharacterCountFactor(LocaleId language)
For "logographic" languages, GMX-V 2.0 defines factors by which the character count should be divided in order to yield the word count. Returns -1d if the language does not have a factor. If this method returns -1d and isLogographicScript(LocaleId) returns true, then word counts are not meaningful for this language.
- See Also:
http://www.xtm-intl.com/manuals/gmx-v/GMX-V-2.0.html#LogographicScripts
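The division described for the two methods can be sketched without the Okapi classes. The class GmxWordCountSketch is hypothetical, and the factor 3.0 is an assumed illustrative value rather than one taken from this page. In real code the factor would come from GMX.getCharacterCountFactor(LocaleId), after GMX.isLogographicScript(LocaleId) has returned true.

```java
import java.util.OptionalDouble;

// Hypothetical sketch; not part of Okapi.
final class GmxWordCountSketch {

    /** Sentinel described above: the language has no character count factor. */
    static final double NO_FACTOR = -1d;

    /**
     * Derives a word count from a character count for a logographic language.
     * Returns an empty result when the factor is the -1d sentinel, because
     * word counts are then not meaningful for the language.
     */
    static OptionalDouble wordCount(long characterCount, double factor) {
        if (factor == NO_FACTOR) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(characterCount / factor);
    }

    public static void main(String[] args) {
        double illustrativeFactor = 3.0; // assumed value, for illustration only
        System.out.println(wordCount(300, illustrativeFactor));
        System.out.println(wordCount(300, NO_FACTOR));
    }
}
```

Returning an OptionalDouble rather than a negative number keeps the "not meaningful" case from being mistaken for a real count; that is a design choice of this sketch, not something the class itself prescribes.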
-
-