public class TopicSignatureModel extends Object
A program for topic signature model estimation
This program finds a set of weighted terms to represent the semantics of a topic signature. A topic signature
can be anything here, including multiword phrases, concept pairs, and individual terms. There are two approaches
to the model estimation. One is the maximum likelihood estimator. The other uses the EM algorithm, which requires
the distribution of individual terms over a corpus. For more details about the EM algorithm, see our previous work:
Zhou, X., Zhang, X., and Hu, X., Semantic Smoothing of Document Models for Agglomerative Clustering,
In the Twentieth International Joint Conference on Artificial Intelligence (IJCAI 07), Hyderabad, India, Jan 6-12,
2007, pp. 2928-2933.
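The EM mode can be illustrated with a small stand-alone sketch. It is not the code of this class: it assumes the common two-component mixture in which each co-occurring term is generated either by the topic signature model or by a corpus-wide background model, with a fixed background coefficient (compare setEMBackgroundCoefficient and setEMIterationNum). The class name EmSignatureSketch, the method estimate, and the array-based inputs are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

public class EmSignatureSketch {
    /**
     * Hypothetical helper, not part of TopicSignatureModel.
     * counts[w]     : how often term w co-occurs with one topic signature
     * background[w] : corpus-level probability of term w
     * alpha         : assumed weight of the background model in the mixture
     */
    public static double[] estimate(int[] counts, double[] background,
                                    double alpha, int iterations) {
        int n = counts.length;
        double total = 0;
        for (int c : counts) total += c;
        double[] model = new double[n];
        if (total == 0) return model; // nothing observed, keep all zeros

        // Start from the maximum likelihood estimate.
        for (int w = 0; w < n; w++) model[w] = counts[w] / total;

        for (int it = 0; it < iterations; it++) {
            double[] weighted = new double[n];
            double norm = 0;
            for (int w = 0; w < n; w++) {
                // E-step: share of term w attributed to the signature model.
                double topicPart = (1 - alpha) * model[w];
                double mix = topicPart + alpha * background[w];
                double z = mix > 0 ? topicPart / mix : 0;
                weighted[w] = counts[w] * z;
                norm += weighted[w];
            }
            if (norm == 0) break;
            // M-step: renormalize the reweighted counts.
            for (int w = 0; w < n; w++) model[w] = weighted[w] / norm;
        }
        return model;
    }

    public static void main(String[] args) {
        int[] counts = {8, 1, 1};
        double[] background = {0.1, 0.45, 0.45};
        System.out.println(Arrays.toString(estimate(counts, background, 0.5, 20)));
    }
}
```

In this sketch, terms that are frequent in the background lose weight relative to the plain maximum likelihood estimate, so the resulting distribution concentrates on terms specific to the signature.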
One should provide either indices for topic signatures and individual terms, respectively, or
the co-occurrence matrix of topic signatures and individual terms.
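As a rough illustration of the maximum likelihood mode, the sketch below turns one row of a co-occurrence matrix (one topic signature against all individual terms) into a probability distribution by dividing each count by the row total. It uses plain int arrays instead of IntSparseMatrix, and the names MleSignatureSketch and rowToDistribution are hypothetical; the class itself may apply further options such as a probability threshold.

```java
import java.util.Arrays;

public class MleSignatureSketch {
    /**
     * Hypothetical helper, not part of TopicSignatureModel.
     * cooccurRow[t] is the co-occurrence count of one topic signature with term t.
     * Returns the relative frequencies, or all zeros if the row is empty.
     */
    public static double[] rowToDistribution(int[] cooccurRow) {
        long total = 0;
        for (int c : cooccurRow) total += c;
        double[] p = new double[cooccurRow.length];
        if (total == 0) return p;
        for (int t = 0; t < cooccurRow.length; t++) {
            p[t] = (double) cooccurRow[t] / total;
        }
        return p;
    }

    public static void main(String[] args) {
        // Two signatures (rows) against three individual terms (columns).
        int[][] cooccur = {
            {2, 6, 2},
            {0, 1, 3}
        };
        for (int[] row : cooccur) {
            System.out.println(Arrays.toString(rowToDistribution(row)));
        }
    }
}
```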
Copyright: Copyright (c) 2005
Company: IST, Drexel University
| Constructor | Description |
|---|---|
| TopicSignatureModel(IRSignatureIndexList srcIndexList, IntSparseMatrix cooccurMatrix) | Constructor for the maximum likelihood estimator mode |
| TopicSignatureModel(IRSignatureIndexList srcIndexList, IntSparseMatrix srcSignatureDocMatrix, IntSparseMatrix destDocSignatureMatrix) | Constructor for the maximum likelihood estimator mode |
| TopicSignatureModel(IRSignatureIndexList srcIndexList, IntSparseMatrix srcSignatureDocMatrix, IRSignatureIndexList destIndexList, IntSparseMatrix destDocSignatureMatrix) | Constructor for the EM algorithm mode |
| TopicSignatureModel(IRSignatureIndexList srcIndexList, IRSignatureIndexList destIndexList, IntSparseMatrix cooccurMatrix) | Constructor for the EM algorithm mode |
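The constructors accept either a ready-made co-occurrence matrix or a pair of document matrices. One plausible way the two inputs relate, shown here only as an assumption, is that the co-occurrence count of a signature and a term is accumulated over the documents containing both, which amounts to a matrix product. The sketch uses dense int arrays, assumes the first matrix is indexed as [signature][document] and the second as [document][term], and the names CooccurrenceSketch and multiply are hypothetical.

```java
import java.util.Arrays;

public class CooccurrenceSketch {
    /**
     * Hypothetical helper, not part of TopicSignatureModel.
     * signatureDoc[s][d] : count of signature s in document d (assumed layout)
     * docTerm[d][t]      : count of term t in document d (assumed layout)
     * Returns cooccur[s][t] = sum over d of signatureDoc[s][d] * docTerm[d][t].
     */
    public static int[][] multiply(int[][] signatureDoc, int[][] docTerm) {
        int signatures = signatureDoc.length;
        int docs = docTerm.length;
        int terms = docs == 0 ? 0 : docTerm[0].length;
        int[][] cooccur = new int[signatures][terms];
        for (int s = 0; s < signatures; s++) {
            for (int d = 0; d < docs; d++) {
                int count = signatureDoc[s][d];
                if (count == 0) continue; // skip documents without the signature
                for (int t = 0; t < terms; t++) {
                    cooccur[s][t] += count * docTerm[d][t];
                }
            }
        }
        return cooccur;
    }

    public static void main(String[] args) {
        int[][] signatureDoc = {{1, 0, 2}};
        int[][] docTerm = {{1, 1}, {5, 0}, {0, 3}};
        System.out.println(Arrays.deepToString(multiply(signatureDoc, docTerm)));
    }
}
```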
| Modifier and Type | Method and Description |
|---|---|
| ArrayList | genSignatureTranslation(int srcSignatureIndex) |
| boolean | genTransMatrix(int minDocFrequency, String matrixPath, String matrixKey) |
| double | getEMBackgroundCoefficient() |
| int | getEMIterationNum() |
| double | getProbThreshold() |
| boolean | getUseDocFrequency() |
| boolean | getUseEM() |
| boolean | getUseMeanTrim() |
| void | setEMBackgroundCoefficient(double coeffi) |
| void | setEMIterationNum(int iterationNum) |
| void | setProbThreshold(double threshold) |
| void | setUseDocFrequency(boolean option) |
| void | setUseEM(boolean option) |
| void | setUseMeanTrim(boolean option) |
public TopicSignatureModel(IRSignatureIndexList srcIndexList, IntSparseMatrix srcSignatureDocMatrix, IntSparseMatrix destDocSignatureMatrix)
srcIndexList - the statistics of topic signatures in the collection.
srcSignatureDocMatrix - the doc-term matrix for topic signatures.
destDocSignatureMatrix - the doc-term matrix for the individual terms that topic signatures will translate to.
public TopicSignatureModel(IRSignatureIndexList srcIndexList, IntSparseMatrix cooccurMatrix)
srcIndexList - the statistics of topic signatures in the collection.
cooccurMatrix - the co-occurrence matrix of topic signatures and individual terms.
public TopicSignatureModel(IRSignatureIndexList srcIndexList, IRSignatureIndexList destIndexList, IntSparseMatrix cooccurMatrix)
srcIndexList - the statistics of topic signatures in the collection.
destIndexList - the statistics of individual terms in the collection.
cooccurMatrix - the co-occurrence matrix of topic signatures and individual terms.
public TopicSignatureModel(IRSignatureIndexList srcIndexList, IntSparseMatrix srcSignatureDocMatrix, IRSignatureIndexList destIndexList, IntSparseMatrix destDocSignatureMatrix)
srcIndexList - the statistics of topic signatures in the collection.
srcSignatureDocMatrix - the doc-term matrix for topic signatures.
destIndexList - the statistics of individual terms in the collection.
destDocSignatureMatrix - the doc-term matrix for the individual terms that topic signatures will translate to.
public void setUseEM(boolean option)
public boolean getUseEM()
public void setEMBackgroundCoefficient(double coeffi)
public double getEMBackgroundCoefficient()
public void setEMIterationNum(int iterationNum)
public int getEMIterationNum()
public void setUseDocFrequency(boolean option)
public boolean getUseDocFrequency()
public void setUseMeanTrim(boolean option)
public boolean getUseMeanTrim()
public void setProbThreshold(double threshold)
public double getProbThreshold()
public boolean genTransMatrix(int minDocFrequency,
String matrixPath,
String matrixKey)
public ArrayList genSignatureTranslation(int srcSignatureIndex)
Copyright © 2018 JULIE Lab, Germany. All rights reserved.