public class IndoEuropeanSentenceModel extends HeuristicSentenceModel implements Serializable
IndoEuropeanSentenceModel is a heuristic sentence model
designed primarily for English. Whether it balances
parentheses and whether it forces the last token to be a boundary
can be set in the constructor. It uses the default implementation of
possible sentence starts and the following token sets, all of
which are case-insensitive:

| Token set | Members |
|---|---|
| Possible Stops | ...!?"''). |
| Impossible Penultimates | any single letter; personal and professional titles, ranks, etc.; commas, colon, and quotes; common abbreviations; directions; corporate designators; times, months, etc.; U.S. political parties; U.S. states (not ME or IN); shipping terms; address abbreviations |
| Impossible Starts | possible stops (see above); close parentheses; ,;:------% |
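
A plausible reading of these sets is that a token can end a sentence only if it is a possible stop, the token before it is not an impossible penultimate, and the token after it is not an impossible start. The sketch below illustrates that rule with a hypothetical `HeuristicBoundarySketch` class and a handful of sample set members; it is not LingPipe's implementation. It assumes a tokenizer that splits "Dr." into "Dr" and ".", and it omits the possible-start check, whitespace handling, and the constructor flags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of stop / penultimate / start token sets (not LingPipe code). */
public class HeuristicBoundarySketch {

    // Sample members only; the sets described in the table are much larger.
    static final Set<String> POSSIBLE_STOPS = lower(".", "!", "?");
    static final Set<String> IMPOSSIBLE_PENULTIMATES = lower("Mr", "Dr", "St", "Inc", ",");
    // Impossible starts include the possible stops plus close parentheses and punctuation.
    static final Set<String> IMPOSSIBLE_STARTS = lower(".", "!", "?", ")", ",", ";", ":", "%");

    // Store lower-cased copies so that lookups through in() are case-insensitive.
    static Set<String> lower(String... tokens) {
        Set<String> set = new HashSet<>();
        for (String t : tokens) {
            set.add(t.toLowerCase(Locale.ROOT));
        }
        return set;
    }

    static boolean in(Set<String> set, String token) {
        return set.contains(token.toLowerCase(Locale.ROOT));
    }

    /** Returns the indices of tokens judged to end a sentence. */
    public static int[] boundaryIndices(String[] tokens) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // The token itself must be a possible stop.
            if (!in(POSSIBLE_STOPS, tokens[i])) continue;
            // The preceding token must not be an impossible penultimate (e.g. "Dr").
            if (i > 0 && in(IMPOSSIBLE_PENULTIMATES, tokens[i - 1])) continue;
            // The following token, if any, must not be an impossible start (e.g. ")").
            if (i + 1 < tokens.length && in(IMPOSSIBLE_STARTS, tokens[i + 1])) continue;
            result.add(i);
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "met", "Dr", ".", "Smith", ".", "He", "left", "!"};
        System.out.println(Arrays.toString(boundaryIndices(tokens))); // prints [5, 8]
    }
}
```

The period after "Dr" at index 3 is rejected because "Dr" is an impossible penultimate, so only the tokens at indices 5 and 8 are reported as sentence ends.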
Serialization: a serialized model is read back as an IndoEuropeanSentenceModel
with the same behavior as the model that was written.

| Constructor and Description |
|---|
| IndoEuropeanSentenceModel(): Construct an Indo-European sentence model that does not force the final token to be a stop and does not balance parentheses. |
| IndoEuropeanSentenceModel(boolean forceFinalToken, boolean balanceParentheses): Construct an Indo-European sentence model that forces the final token to be a stop and balances parentheses according to the specified flags. |
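
The two constructor flags can be pictured as a filter over candidate boundaries. The sketch below uses a hypothetical `BoundaryFlagsSketch` class, not LingPipe code: it assumes candidate boundaries have already been found by the token-set tests, treats only the tokens "(" and ")" as parentheses, and reads the flags as the constructor documentation describes them (forceFinalToken makes the last token a stop; balanceParentheses rejects a stop while a parenthesis is still open). With the real class, the flags are simply passed as `new IndoEuropeanSentenceModel(forceFinalToken, balanceParentheses)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the forceFinalToken and balanceParentheses flags. */
public class BoundaryFlagsSketch {

    /**
     * Filters candidate boundary indices (assumed sorted ascending) according
     * to the two flags and returns the accepted boundary indices.
     */
    public static List<Integer> applyFlags(String[] tokens, List<Integer> candidates,
                                           boolean forceFinalToken, boolean balanceParentheses) {
        List<Integer> result = new ArrayList<>();
        int depth = 0; // number of "(" tokens not yet closed
        int next = 0;  // position of the next unread candidate
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("(")) {
                depth++;
            } else if (tokens[i].equals(")") && depth > 0) {
                depth--;
            }
            boolean isCandidate = next < candidates.size() && candidates.get(next) == i;
            if (isCandidate) {
                next++;
                // When balancing, a stop inside open parentheses is rejected.
                if (!balanceParentheses || depth == 0) {
                    result.add(i);
                }
            }
        }
        // When forcing, the last token is always a boundary.
        int last = tokens.length - 1;
        if (forceFinalToken && last >= 0
                && (result.isEmpty() || result.get(result.size() - 1) != last)) {
            result.add(last);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"Go", "(", "now", ".", ")", "Then", "rest"};
        List<Integer> candidates = Arrays.asList(3);
        System.out.println(applyFlags(tokens, candidates, true, true)); // prints [6]
    }
}
```

With both flags on, the period at index 3 is rejected because it sits inside an open parenthesis, and the final token at index 6 is forced to be a stop.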
Methods inherited from class HeuristicSentenceModel: balanceParens, boundaryIndices, forceFinalStop, possibleStart
Methods inherited from its superclass: boundaryIndices (two overloads), verifyBounds, verifyTokensWhitespaces

public IndoEuropeanSentenceModel()
public IndoEuropeanSentenceModel(boolean forceFinalToken,
boolean balanceParentheses)
Parameters:
forceFinalToken - Whether the final token is always a sentence stop.
balanceParentheses - Whether to balance parentheses, that is, whether a sentence is prevented from stopping while an open parenthesis remains unclosed.

Copyright © 2016 Alias-i, Inc. All rights reserved.