Class WordCounter
java.lang.Object
    net.sf.okapi.steps.wordcount.common.BaseCounter
        net.sf.okapi.steps.wordcount.WordCounter
-
public class WordCounter extends BaseCounter
Word count engine. Contains static methods that calculate the number of words in a given text fragment.
-
Constructor Summary
Constructor: WordCounter()
-
Method Summary
Modifier and type, method, then description:
static long count(String string, LocaleId language)
    Counts words in a given string.
static long count(ITextUnit textUnit, LocaleId language)
    Counts words in the source part of a given text unit.
static long count(Segment segment, LocaleId language)
    Counts words in a given segment.
static long count(TextContainer textContainer, LocaleId language)
    Counts words in a given text container.
static long count(TextFragment textFragment, LocaleId language)
    Counts words in a given text fragment.
static long countFromLogographicCharacterCount(long characterCount, LocaleId language)
    For "logographic" languages, GMX-V 2.0 defines factors by which the character count should be divided in order to yield the word count.
static long countLogographicScript(Object text, LocaleId language)
    For "logographic" languages, GMX-V 2.0 defines factors by which the character count should be divided in order to yield the word count.
protected long doCountImpl(String text, LocaleId language)
static long getCount(ITextUnit tu)
    Returns the word count information stored by WordCountStep in the source part of a given text unit.
static long getCount(ITextUnit tu, int segIndex)
    Returns the word count information stored by WordCountStep in a given segment of the source part of a given text unit.
static long getCount(IWithAnnotations res)
    Returns the word count information stored by WordCountStep in annotations of a given resource.
static long getCount(Segment segment)
    Returns the word count information stored by WordCountStep in a given segment of the source part of a given text unit.
static long getCount(TextContainer tc)
    Returns the word count information stored by WordCountStep in the given text container.
protected String getMetricNameForRetrieval()
static void setCount(IWithAnnotations res, long count)
-
Methods inherited from class net.sf.okapi.steps.wordcount.common.BaseCounter
doCount, doGetCount, doGetCount, doGetCount, getCount, getCount, getCount
-
Method Detail
-
doCountImpl
protected long doCountImpl(String text, LocaleId language)
Specified by:
    doCountImpl in class BaseCounter
-
countLogographicScript
public static long countLogographicScript(Object text, LocaleId language)
For "logographic" languages, GMX-V 2.0 defines factors by which the character count should be divided in order to yield the word count. This method calculates that word count.
The result will be 0 if the language is logographic but does not have a character count factor defined (GMX.getCharacterCountFactor(LocaleId) returns -1d). In this case word counts are not meaningful for the supplied language.
This method will throw IllegalArgumentException if the supplied language is not a logographic script (GMX.isLogographicScript(LocaleId) returns false).
-
countFromLogographicCharacterCount
public static long countFromLogographicCharacterCount(long characterCount, LocaleId language)
For "logographic" languages, GMX-V 2.0 defines factors by which the character count should be divided in order to yield the word count. This method calculates that word count.
The result will be 0 if the language is logographic but does not have a character count factor defined (GMX.getCharacterCountFactor(LocaleId) returns -1d). In this case word counts are not meaningful for the supplied language.
This method will throw IllegalArgumentException if the supplied language is not a logographic script (GMX.isLogographicScript(LocaleId) returns false).
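The three outcomes described above (divide by the factor, return 0 when no factor is defined, throw for non-logographic languages) can be sketched without the Okapi classes. This is only an illustration under assumptions: the class name, the use of plain language-code strings instead of LocaleId, the lookup tables, the placeholder code "xx", and the rounding with Math.round are all hypothetical. The factor values are believed to match GMX-V 2.0 but should be treated as illustrative; the real values come from GMX.getCharacterCountFactor(LocaleId).

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the Okapi implementation.
final class LogographicCountSketch {

    // Assumed factors (illustrative; consult the GMX-V 2.0 specification).
    private static final Map<String, Double> FACTORS =
            Map.of("zh", 2.8, "ja", 3.0, "ko", 3.3, "th", 6.0);

    // Assumed set of logographic languages. "xx" is a made-up code standing in
    // for a logographic language that has no factor defined.
    private static final Set<String> LOGOGRAPHIC = Set.of("zh", "ja", "ko", "th", "xx");

    // Stand-in for GMX.isLogographicScript(LocaleId).
    static boolean isLogographic(String language) {
        return LOGOGRAPHIC.contains(language);
    }

    // Stand-in for GMX.getCharacterCountFactor(LocaleId); -1d means "no factor".
    static double characterCountFactor(String language) {
        return FACTORS.getOrDefault(language, -1d);
    }

    static long wordsFromCharacterCount(long characterCount, String language) {
        if (!isLogographic(language)) {
            throw new IllegalArgumentException("Not a logographic script: " + language);
        }
        double factor = characterCountFactor(language);
        if (factor < 0) {
            return 0L; // no factor defined: a word count is not meaningful
        }
        // Rounding to the nearest whole word is an assumption of this sketch.
        return Math.round(characterCount / factor);
    }
}
```

Because 0 is also what a genuinely empty text yields, a caller that needs to tell "no factor defined" apart from "no characters" would have to check the factor separately.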
-
count
public static long count(ITextUnit textUnit, LocaleId language)
Counts words in the source part of a given text unit.
Parameters:
    textUnit - the given text unit
    language - the language of the source
Returns:
    number of words
-
count
public static long count(TextContainer textContainer, LocaleId language)
Counts words in a given text container.
Parameters:
    textContainer - the given text container
    language - the language of the text
Returns:
    number of words
-
count
public static long count(Segment segment, LocaleId language)
Counts words in a given segment.
Parameters:
    segment - the given segment
    language - the language of the text
Returns:
    number of words
-
count
public static long count(TextFragment textFragment, LocaleId language)
Counts words in a given text fragment.
Parameters:
    textFragment - the given text fragment
    language - the language of the text
Returns:
    number of words
-
count
public static long count(String string, LocaleId language)
Counts words in a given string.
Parameters:
    string - the given string
    language - the language of the text
Returns:
    number of words
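This page does not say how count(String, LocaleId) decides what a word is. As a rough illustration of locale-aware word counting in general, the sketch below uses the JDK's java.text.BreakIterator and counts only those segments that contain at least one letter or digit. The class and method names are hypothetical, java.util.Locale stands in for LocaleId, and the results may differ from what WordCounter returns.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical stand-in for illustration; not the Okapi implementation.
final class BreakIteratorWordCountSketch {

    static long countWords(String text, Locale locale) {
        if (text == null || text.isEmpty()) {
            return 0L;
        }
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);

        long words = 0;
        int start = boundaries.first();
        // Walk consecutive boundary pairs; each pair delimits one segment.
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            if (containsLetterOrDigit(text, start, end)) {
                words++; // whitespace and punctuation segments are skipped
            }
        }
        return words;
    }

    private static boolean containsLetterOrDigit(String text, int start, int end) {
        for (int i = start; i < end; ) {
            int codePoint = text.codePointAt(i);
            if (Character.isLetterOrDigit(codePoint)) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }
}
```

With this sketch, countWords("The quick brown fox.", Locale.ENGLISH) yields 4. Text in scripts written without spaces between words is where a separate, factor-based count such as countLogographicScript becomes relevant.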
-
getMetricNameForRetrieval
protected String getMetricNameForRetrieval()
Specified by:
    getMetricNameForRetrieval in class BaseCounter
-
getCount
public static long getCount(IWithAnnotations res)
Returns the word count information stored by WordCountStep in annotations of a given resource.
Parameters:
    res - the given resource
Returns:
    number of words (0 if no word count information found)
-
getCount
public static long getCount(ITextUnit tu)
Returns the word count information stored by WordCountStep in the source part of a given text unit.
Parameters:
    tu - the given text unit
Returns:
    number of words (0 if no word count information found)
-
getCount
public static long getCount(TextContainer tc)
Returns the word count information stored by WordCountStep in the given text container.
Parameters:
    tc - the given text container
Returns:
    number of words (0 if no word count information found)
-
getCount
public static long getCount(ITextUnit tu, int segIndex)
Returns the word count information stored by WordCountStep in a given segment of the source part of a given text unit.
Parameters:
    tu - the given text unit
    segIndex - index of the segment in the source
Returns:
    number of words (0 if no word count information found)
-
getCount
public static long getCount(Segment segment)
Returns the word count information stored by WordCountStep in a given segment of the source part of a given text unit.
Parameters:
    segment - the given segment
Returns:
    number of words (0 if no word count information found)
-
setCount
public static void setCount(IWithAnnotations res, long count)