public class BM25 extends java.lang.Object implements RelevanceRanker
At the extreme values of the coefficient b, BM25 turns into the ranking functions known as BM11 (for b = 1) and BM15 (for b = 0). BM25F is a modification of BM25 in which the document is considered to be composed of several fields (such as headlines, main text, anchor text) with possibly different degrees of importance.
BM25 and its newer variants represent state-of-the-art TF-IDF-like retrieval functions used in document retrieval, such as web search.
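
As a rough illustration of how b moves the score between BM15 and BM11, the sketch below uses the textbook Okapi BM25 term score. It is not taken from this class: the name Bm25Sketch, the non-negative IDF form, and the omission of delta are all assumptions made for the example, and the actual implementation may differ.

```java
/** Textbook Okapi BM25 term score. A sketch only, not the source of the BM25 class documented here. */
final class Bm25Sketch {
    /** Assumed IDF form: ln(1 + (N - n + 0.5) / (n + 0.5)), which is never negative. */
    static double idf(long N, long n) {
        return Math.log(1.0 + (N - n + 0.5) / (n + 0.5));
    }

    /**
     * Scores one term against one document.
     * freq is the raw term frequency, docSize and avgDocSize are lengths in tokens,
     * N is the corpus size and n the number of documents containing the term.
     */
    static double score(double k1, double b, double freq,
                        int docSize, double avgDocSize, long N, long n) {
        if (freq <= 0) {
            return 0.0;
        }
        // b = 0 gives lengthNorm = 1 (no length normalization, BM15);
        // b = 1 gives lengthNorm = docSize / avgDocSize (full normalization, BM11).
        double lengthNorm = 1.0 - b + b * docSize / avgDocSize;
        double tf = freq * (k1 + 1.0) / (freq + k1 * lengthNorm);
        return tf * idf(N, n);
    }

    public static void main(String[] args) {
        // Same term statistics, scored for a document twice the average length.
        double[] bValues = {0.0, 0.75, 1.0};
        for (double b : bValues) {
            double s = score(1.2, b, 3, 200, 100.0, 1000, 10);
            System.out.printf("b = %.2f -> %.4f%n", b, s);
        }
    }
}
```

For a document longer than average, the score falls as b grows, because more of the length penalty is applied. For a document of exactly average length, b has no effect.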
See Also: TFIDF

| Constructor and Description |
|---|
| BM25(): Default constructor with k1 = 1.2, b = 0.75, delta = 1.0. |
| BM25(double k1, double b, double delta): Constructor. |

| Modifier and Type | Method and Description |
|---|---|
| double | rank(Corpus corpus, TextTerms doc, java.lang.String[] terms, int[] tf, int n): Returns a relevance score between a set of terms and a document based on a corpus. |
| double | rank(Corpus corpus, TextTerms doc, java.lang.String term, int tf, int n): Returns a relevance score between a term and a document based on a corpus. |
| double | score(double freq, int docSize, double avgDocSize, long N, long n): Returns a relevance score between a term and a document based on a corpus. |
| double | score(double freq, long N, long n): Returns a relevance score between a term and a document based on a corpus. |
| double | score(int termFreq, int docLen, double avgDocLen, int titleTermFreq, int titleLen, double avgTitleLen, int anchorTermFreq, int anchorLen, double avgAnchorLen, long N, long n): Returns a relevance score between a term and a document based on a corpus. |

public BM25()

public BM25(double k1, double b, double delta)
k1 - a positive tuning parameter that calibrates the document term frequency scaling. A k1 value of 0 corresponds to a binary model (no term frequency), and a large value corresponds to using raw term frequency.
b - a tuning parameter (0 ≤ b ≤ 1) that determines the scaling by document length: b = 1 corresponds to fully scaling the term weight by the document length, while b = 0 corresponds to no length normalization.

public double score(int termFreq, int docLen, double avgDocLen, int titleTermFreq, int titleLen, double avgTitleLen, int anchorTermFreq, int anchorLen, double avgAnchorLen, long N, long n)
termFreq - the normalized frequency of the search term in the document to rank.
N - the number of documents in the corpus.
n - the number of documents in the corpus that contain the given term.

public double score(double freq, long N, long n)
freq - the normalized frequency of the search term in the document to rank.
N - the number of documents in the corpus.
n - the number of documents in the corpus that contain the given term.

public double score(double freq, int docSize, double avgDocSize, long N, long n)
freq - the frequency of the search term in the document to rank.
docSize - the size of the document to rank.
avgDocSize - the average size of the documents in the corpus.
N - the number of documents in the corpus.
n - the number of documents in the corpus that contain the given term.

public double rank(Corpus corpus, TextTerms doc, java.lang.String term, int tf, int n)
Specified by: rank in interface RelevanceRanker
corpus - the corpus.
doc - the document to rank.
term - the search term.
tf - the term frequency in the document.
n - the number of documents in the corpus that contain the given term.

public double rank(Corpus corpus, TextTerms doc, java.lang.String[] terms, int[] tf, int n)
Specified by: rank in interface RelevanceRanker
corpus - the corpus.
doc - the document to rank.
terms - the search terms.
tf - the term frequencies in the document.
n - the number of documents in the corpus that contain the given term.
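
The effect of k1 described for the three-argument constructor can be seen in the term-frequency component alone. The sketch below assumes the textbook saturation form tf * (k1 + 1) / (tf + k1) with no length normalization. SaturationSketch and saturate are hypothetical names for this example and are not part of this class.

```java
/** Term-frequency saturation as in textbook BM25. An illustrative sketch, not this class's code. */
final class SaturationSketch {
    /** Returns tf * (k1 + 1) / (tf + k1), or 0 when the term is absent. */
    static double saturate(double tf, double k1) {
        if (tf <= 0) {
            return 0.0;
        }
        return tf * (k1 + 1.0) / (tf + k1);
    }

    public static void main(String[] args) {
        // k1 = 0 ignores how often the term occurs; a very large k1 approaches raw tf.
        double[] k1Values = {0.0, 1.2, 1000.0};
        for (double k1 : k1Values) {
            System.out.printf("k1 = %.1f: tf=1 -> %.3f, tf=5 -> %.3f%n",
                    k1, saturate(1, k1), saturate(5, k1));
        }
    }
}
```

With k1 = 0 every positive term frequency maps to 1, which is the binary model. With a very large k1 the value stays close to the raw term frequency. Intermediate values such as the default 1.2 give diminishing returns for repeated occurrences.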