public class SimpleSentenceSplitter extends java.lang.Object implements SentenceSplitter
Recognizing the end of a sentence is not an easy task for a computer. In English, punctuation marks that usually end a sentence do not always do so. The period is the worst offender. A period can end a sentence, but it can also be part of an abbreviation or acronym, an ellipsis, a decimal number, or one of a pair of periods bracketing a Roman numeral. A period can even mark the end of an abbreviation and the end of a sentence at the same time. On the other hand, some poems may not contain any sentence punctuation at all.
Another problematic punctuation mark is the single quote, which can open a quotation or begin a contraction such as 'tis. Leading-quote contractions are uncommon in contemporary English texts, but appear frequently in Early Modern English texts.

This tokenizer assumes that the text has already been segmented into paragraphs. Any carriage returns will be replaced by whitespace.
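To make the period ambiguity concrete, here is a minimal sketch of a deliberately naive splitter. It is not the algorithm this class uses. The class name `NaiveSplitDemo`, the method `naiveSplit`, and the boundary rule (break after `.`, `!`, or `?` when followed by whitespace) are all assumptions made for illustration, as is the way line breaks are normalized to spaces.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the documented API.
class NaiveSplitDemo {
    // Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    // whenever that mark is followed by whitespace.
    private static final Pattern NAIVE_BOUNDARY = Pattern.compile("(?<=[.!?])\\s+");

    static String[] naiveSplit(String paragraph) {
        // Replace carriage returns and line feeds with a single space first.
        String normalized = paragraph.replaceAll("[\\r\\n]+", " ").trim();
        return NAIVE_BOUNDARY.split(normalized);
    }

    public static void main(String[] args) {
        String text = "Dr. Smith paid $3.50 for the book.\r\nIt was worth it.";
        for (String sentence : naiveSplit(text)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

With this input the naive rule yields three pieces, because it breaks after the abbreviation "Dr." as if it ended a sentence. The decimal "3.50" survives only because no whitespace follows its period. A practical splitter needs extra knowledge, such as abbreviation lists, to avoid the first kind of error.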
| Modifier and Type | Method and Description |
|---|---|
| static SimpleSentenceSplitter | getInstance(): Returns the singleton instance. |
| java.lang.String[] | split(java.lang.String text): Splits the text into sentences. |
public static SimpleSentenceSplitter getInstance()
public java.lang.String[] split(java.lang.String text)
Specified by: split in interface SentenceSplitter
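The calling pattern implied by these two signatures can be sketched as follows. Because the package of the real class is not given here, the example uses stand-ins: a local `SentenceSplitter` interface that mirrors the `split` signature, and a hypothetical `DemoSentenceSplitter` that takes the place of `SimpleSentenceSplitter`. Its splitting rule is a placeholder assumption, not the library's logic.

```java
// Stand-in mirroring the split signature listed above (an assumption).
interface SentenceSplitter {
    String[] split(String text);
}

// Hypothetical stand-in for SimpleSentenceSplitter, showing only the
// singleton access pattern; the splitting rule is a placeholder.
final class DemoSentenceSplitter implements SentenceSplitter {
    private static final DemoSentenceSplitter INSTANCE = new DemoSentenceSplitter();

    private DemoSentenceSplitter() {
        // Private constructor: callers obtain the shared instance instead.
    }

    static DemoSentenceSplitter getInstance() {
        return INSTANCE;
    }

    @Override
    public String[] split(String text) {
        // Placeholder rule: break after '.', '!' or '?' followed by whitespace.
        return text.trim().split("(?<=[.!?])\\s+");
    }

    public static void main(String[] args) {
        SentenceSplitter splitter = DemoSentenceSplitter.getInstance();
        for (String sentence : splitter.split("One sentence. Another one!")) {
            System.out.println(sentence);
        }
    }
}
```

With the real class, the call would presumably read `SimpleSentenceSplitter.getInstance().split(paragraph)`, passing one paragraph at a time since the splitter expects paragraph-segmented input.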