public class AutoCompleter extends Object implements Serializable
AutoCompleter maintains a dictionary of phrases with
counts and provides suggested completions based on prefix matching
by weighted edit distance and phrase likelihood.
The maximum number of results cannot be changed dynamically,
because it is used during construction to set up all of the
compiled trie structures used for decoding. To change the
maximum number of results, simply construct a new auto completer
using the same phrase counts as edit distance as this completer;
these may be retrieved using the methods
phraseCountMap() and editDistance().
For a given input, the complete(String) method returns
a sorted set of scored objects, containing the most likely phrase
completions and their score, up to the maximum number of results
specified during construction.
Scores for the phrases are defined by the log (base 2) of their probability estimates:
where probabilities are estimated using maximum likelihood:score(phrase) = log2 p'(phrase)
p'(phrase) = count(phrase) / Σphrase' count(phrase')
Additive smoothing may be easily carried out on the inputs, so it is not carried out by this class.
The score for a prefix matching a phrase is given by:
In words, the score for a prefix matching a phrase is the sum of log probability of the phrase plus the edit distance between the prefix and the best matching prefix of the phrase. The edit distances should thus be scaled as log probabilities in order to combine with the phrase probabilities properly. See the class documentation forscore(prefix,phrase) = MAXphrase.startsWith(prefix') editDistance.distance(prefix,prefix') + log2 p'(phrase)
TrainSpellChecker for general advice on
combining the tuning of edit distance with that of phrase
probabilities.
AutoCompleter is completely thread
safe. Setting the maximum search queue size is safe because
integer writes are atomic, but it is not declared volate, and hence
may not be visible to other threads without synchronization.
AutoCompleter may be serialized if and only if its
weighted edit distance is serializable. If so, the result
of serializing and reconstituting an auto-completer will produce
an auto-completer with the same behavior as the one serialized,
modulo the edit distance serialization, which is under control
of the edit distance implementation.| Constructor and Description |
|---|
AutoCompleter(Map<String,? extends Number> phraseCounts,
WeightedEditDistance editDistance,
int maxResultsPerPrefix,
int maxSearchQueueSize,
double minScore)
Construct an automatic completer from the specified phrases,
phrase counts, edit distance, and search parameters.
|
| Modifier and Type | Method and Description |
|---|---|
SortedSet<ScoredObject<String>> |
complete(String in)
Returns a set of scored phrases sorted into decreasing order of
score.
|
WeightedEditDistance |
editDistance()
Returns the weighted edit distance for this auto-completer.
|
int |
maxResultsPerPrefix()
Returns the maximum number of results returned by this
auto completer for each input prefix.
|
int |
maxSearchQueueSize()
Returns the maximum number of elements on the search queue.
|
Map<String,Float> |
phraseCountMap()
Returns the phrase counter for this auto completer.
|
void |
setMaxSearchQueueSize(int size)
Sets the maximum search queue size to the specified value.
|
public AutoCompleter(Map<String,? extends Number> phraseCounts, WeightedEditDistance editDistance, int maxResultsPerPrefix, int maxSearchQueueSize, double minScore)
phraseCounts - Map from phrases to counts.editDistance - Distance used to compare mismatched
suggestions.maxResultsPerPrefix - The maximum number of results that
can be returned.maxSearchQueueSize - The beam size for searching for
matches.minScore - Minimum score for outcome to be retained in results.IllegalArgumentException - If any of the counts is not finite or negative,
or if the max results or max queue sizes are not positive, or if the minimum
score is not finite and negative.public int maxResultsPerPrefix()
public WeightedEditDistance editDistance()
public Map<String,Float> phraseCountMap()
Float values calculated from compiled probability
estimates. Changes to the returned count map will not affect
this class.public int maxSearchQueueSize()
setMaxSearchQueueSize(int).public void setMaxSearchQueueSize(int size)
This operation is thread safe because integer sets are atomic. But changes may not be visible to other threads if not synchronized.
size - The new search queue size.IllegalArgumentException - If the size is zero or negative.public SortedSet<ScoredObject<String>> complete(String in)
To print out all the matches in descending order of scores, use:
for (ScoredObject<String> so : complete(String)) println("phrase=" + so.getObject() + " score=" + so.score());
in - The string to complete.Copyright © 2016 Alias-i, Inc.. All rights reserved.