public class MultinomialDistribution extends Object
MultinomialDistribution results from drawing a fixed
number of samples from a multivariate distribution. Thus the
probability distribution log2Probability(int[]) is over an
array of counts for the dimensions of the underlying multivariate
distribution. This class also contains a static method log2MultinomialCoefficient(int[]) to compute multinomial coefficients.
The method chiSquared(int[]) returns the chi-squared
statistic for a sample of outcome counts represented by an array of
integers. The number of degrees of freedom is one less than the
number of dimensions.
As of LingPipe 3.2.0, the dependency on Jakarta Commons Math was removed. As a result, we removed the two methods that computed p-values. Here is their implementation in case you need the functionality:
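```java
import org.apache.commons.math.MathException;
import org.apache.commons.math.distribution.ChiSquaredDistribution;
import org.apache.commons.math.distribution.ChiSquaredDistributionImpl;

/**
 * Returns the p-value for the chi-squared statistic on the specified
 * sample counts.
 * ...
 */
// Meant to live in a class that provides numDimensions() and
// chiSquared(int[]), as MultinomialDistribution does.
double pValue(int[] sampleCounts) throws MathException {
    // Degrees of freedom: one less than the number of dimensions.
    ChiSquaredDistribution chiSq
        = new ChiSquaredDistributionImpl(numDimensions() - 1);
    double c = chiSquared(sampleCounts);
    // The p-value is the upper-tail probability P(X >= c),
    // i.e. one minus the cumulative probability at c.
    return 1.0 - chiSq.cumulativeProbability(c);
}
```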
| Constructor and Description |
|---|
| MultinomialDistribution(MultivariateDistribution distribution): Constructs a multinomial distribution based on the specified multivariate distribution. |
| Modifier and Type | Method and Description |
|---|---|
| MultivariateDistribution | basisDistribution(): Returns the multivariate distribution that forms the basis of this multinomial distribution. |
| double | chiSquared(int[] sampleCounts): Returns the chi-squared statistic for rejecting the null hypothesis that the specified samples were generated by this distribution. |
| static double | log2MultinomialCoefficient(int[] sampleCounts): Returns the log (base 2) multinomial coefficient for the specified counts. |
| double | log2Probability(int[] sampleCounts): Returns the log (base 2) probability of the distribution of outcomes specified in the argument. |
| int | numDimensions(): Returns the number of dimensions in this multinomial. |
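public MultinomialDistribution(MultivariateDistribution distribution)
Constructs a multinomial distribution based on the specified multivariate distribution.
Parameters:
distribution - Underlying multivariate distribution defining the constructed multinomial.

public double log2Probability(int[] sampleCounts)
Returns the log (base 2) probability of the distribution of outcomes specified in the argument.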
The definition of the probability value for multinomials is:
$$
P(\mathrm{sampleCounts}) = \mathrm{multinomialCoefficient}(\mathrm{sampleCounts}) \times \prod_i P(i)^{\mathrm{sampleCounts}[i]}
$$
where the multinomial coefficient is as defined in the method documentation
for log2MultinomialCoefficient(int[]). Taking logarithms yields:
$$
\log_2 P(\mathrm{sampleCounts}) = \log_2 \mathrm{multinomialCoefficient}(\mathrm{sampleCounts}) + \sum_i \mathrm{sampleCounts}[i] \cdot \log_2 P(i)
$$
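The derivation can be checked with a small standalone sketch. It is not LingPipe's implementation: it assumes the basis distribution is available as a plain double[] of outcome probabilities, and the class and method names (MultinomialLog2ProbabilitySketch, log2Factorial) are hypothetical. It evaluates log2(n!) by summing log2(k), which is simple but linear in the counts.

```java
/**
 * Hypothetical standalone sketch of
 *   log2 P(sampleCounts) = log2 multinomialCoefficient(sampleCounts)
 *                          + sum_i sampleCounts[i] * log2 P(i).
 * Assumption: the basis distribution is a double[] of outcome
 * probabilities, not LingPipe's MultivariateDistribution.
 */
final class MultinomialLog2ProbabilitySketch {

    private static final double LN_2 = Math.log(2.0);

    // log2(n!) as the sum of log2(k) for k = 2..n (linear in n).
    static double log2Factorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; ++k) {
            sum += Math.log(k) / LN_2;
        }
        return sum;
    }

    // log2 totalCount! minus the sum of log2 sampleCounts[i]!.
    static double log2MultinomialCoefficient(int[] sampleCounts) {
        int totalCount = 0;
        double sumLog2Factorials = 0.0;
        for (int count : sampleCounts) {
            if (count < 0) {
                throw new IllegalArgumentException("Negative count: " + count);
            }
            totalCount += count;
            sumLog2Factorials += log2Factorial(count);
        }
        return log2Factorial(totalCount) - sumLog2Factorials;
    }

    static double log2Probability(double[] probs, int[] sampleCounts) {
        if (probs.length != sampleCounts.length) {
            throw new IllegalArgumentException(
                "Expected " + probs.length + " counts, found " + sampleCounts.length);
        }
        double result = log2MultinomialCoefficient(sampleCounts);
        for (int i = 0; i < sampleCounts.length; ++i) {
            // Zero counts contribute P(i)^0 = 1; skipping them avoids 0 * -Infinity = NaN.
            if (sampleCounts[i] == 0) {
                continue;
            }
            // A zero probability with a positive count drives the sum to -Infinity.
            result += sampleCounts[i] * (Math.log(probs[i]) / LN_2);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] fairCoin = {0.5, 0.5};
        int[] oneHeadOneTail = {1, 1};
        // 2!/(1! 1!) * 0.5 * 0.5 = 0.5, so this should print -1 up to rounding.
        System.out.println(log2Probability(fairCoin, oneHeadOneTail));
    }
}
```

Working in log space keeps long count vectors from underflowing, since the product of many small probabilities would otherwise round to zero in double precision.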
Note that if the multivariate probability is zero for an
outcome with a non-zero count, the result will be Double.NEGATIVE_INFINITY.
Parameters:
sampleCounts - Array of counts for outcomes.
Throws:
IllegalArgumentException - If the number of outcome
counts is not the same as the number of dimensions of this multinomial.

public double chiSquared(int[] sampleCounts)
Returns the chi-squared statistic for rejecting the null hypothesis that the specified samples were generated by this distribution.
The chi-squared value is defined as the sum, over all outcomes, of the squared difference between the sample count and the expected count, divided by the expected count:
$$
\chi^2(\mathrm{sampleCounts}) = \sum_i \frac{(\mathrm{sampleCounts}[i] - \mathrm{expectedCount}(i))^2}{\mathrm{expectedCount}(i)}
$$
where the expected counts are computed based on the underlying
multivariate distribution and the total sample count:
$$
\mathrm{expectedCount}(i) = \mathrm{probability}(i) \times \mathrm{totalCount}
$$
where totalCount is the sum of all of the sample
counts.
Note that the chi-squared test is a large-sample test. For
accurate results, each expected count should be at least five; in
symbols, expectedCount(i) >= 5 for all i.
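A minimal standalone sketch of the statistic and the expected-count guideline follows. It is not LingPipe's code: outcome probabilities are passed as a plain double[] in place of the underlying multivariate distribution, and the class and method names (ChiSquaredSketch, expectedCountsAtLeastFive) are hypothetical.

```java
/**
 * Hypothetical standalone sketch of
 *   chi2 = sum_i (sampleCounts[i] - expectedCount(i))^2 / expectedCount(i),
 * where expectedCount(i) = probability(i) * totalCount.
 * Assumption: probabilities are supplied as a double[].
 */
final class ChiSquaredSketch {

    static long totalCount(int[] sampleCounts) {
        long total = 0;
        for (int count : sampleCounts) {
            total += count;
        }
        return total;
    }

    static double chiSquared(double[] probs, int[] sampleCounts) {
        if (probs.length != sampleCounts.length) {
            throw new IllegalArgumentException(
                "Expected " + probs.length + " counts, found " + sampleCounts.length);
        }
        long total = totalCount(sampleCounts);
        double sum = 0.0;
        for (int i = 0; i < sampleCounts.length; ++i) {
            double expected = probs[i] * total;
            double diff = sampleCounts[i] - expected;
            // A zero-probability outcome gives expected == 0, so this term becomes
            // infinite or NaN; the large-sample guideline excludes that case.
            sum += diff * diff / expected;
        }
        return sum;
    }

    // Large-sample guideline: every expected count should be at least five.
    static boolean expectedCountsAtLeastFive(double[] probs, int[] sampleCounts) {
        long total = totalCount(sampleCounts);
        for (double p : probs) {
            if (p * total < 5.0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        int[] counts = {10, 20, 30, 40};
        System.out.println(chiSquared(uniform, counts));                 // prints 20.0
        System.out.println(expectedCountsAtLeastFive(uniform, counts));  // prints true
        System.out.println("degrees of freedom: " + (uniform.length - 1)); // prints "degrees of freedom: 3"
    }
}
```

In the example, 100 samples over four equally likely outcomes give an expected count of 25 each, so the squared deviations 225, 25, 25, 225 divided by 25 sum to 20, with 4 - 1 = 3 degrees of freedom.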
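Parameters:
sampleCounts - Array of sample counts.
Throws:
IllegalArgumentException - If the number of outcome
counts is not the same as the number of dimensions of this
multinomial.

public int numDimensions()
Returns the number of dimensions in this multinomial.

public MultivariateDistribution basisDistribution()
Returns the multivariate distribution that forms the basis of this multinomial distribution.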
public static double log2MultinomialCoefficient(int[] sampleCounts)
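Returns the log (base 2) multinomial coefficient for the specified counts. The multinomial coefficient is defined by:
$$
\mathrm{multinomialCoefficient}(\mathrm{sampleCounts}) = \frac{\mathrm{totalCount}!}{\prod_i \mathrm{sampleCounts}[i]!}
$$
where totalCount is the sum of all of the sample counts.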
Taking logarithms produces:
$$
\log_2 \mathrm{multinomialCoefficient}(\mathrm{sampleCounts}) = \log_2 \mathrm{totalCount}! - \sum_i \log_2 \mathrm{sampleCounts}[i]!
$$
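The following sketch evaluates both forms: the exact definition with java.math.BigInteger, and the log form as sums of log2(k). It is an illustration under assumptions, not LingPipe's implementation; the class and method names are hypothetical, and because java.lang.Math offers no log-gamma function, log2(n!) is computed by a loop that is linear in n.

```java
import java.math.BigInteger;

/**
 * Hypothetical standalone sketch of the multinomial coefficient:
 *   exact:  totalCount! / prod_i sampleCounts[i]!
 *   log2:   log2 totalCount! - sum_i log2 sampleCounts[i]!
 */
final class MultinomialCoefficientSketch {

    private static final double LN_2 = Math.log(2.0);

    static void checkNonNegative(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("Negative count: " + count);
        }
    }

    // Exact n! as a BigInteger.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int k = 2; k <= n; ++k) {
            result = result.multiply(BigInteger.valueOf(k));
        }
        return result;
    }

    // Exact coefficient; the division is exact because the result is an integer.
    static BigInteger multinomialCoefficient(int[] sampleCounts) {
        int totalCount = 0;
        BigInteger denominator = BigInteger.ONE;
        for (int count : sampleCounts) {
            checkNonNegative(count);
            totalCount += count;
            denominator = denominator.multiply(factorial(count));
        }
        return factorial(totalCount).divide(denominator);
    }

    // log2(n!) as the sum of log2(k) for k = 2..n.
    static double log2Factorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; ++k) {
            sum += Math.log(k) / LN_2;
        }
        return sum;
    }

    static double log2MultinomialCoefficient(int[] sampleCounts) {
        int totalCount = 0;
        double sumLog2Factorials = 0.0;
        for (int count : sampleCounts) {
            checkNonNegative(count);
            totalCount += count;
            sumLog2Factorials += log2Factorial(count);
        }
        return log2Factorial(totalCount) - sumLog2Factorials;
    }

    public static void main(String[] args) {
        int[] counts = {3, 2, 5};
        // 10! / (3! 2! 5!) = 3628800 / 1440 = 2520
        System.out.println(multinomialCoefficient(counts));
        // 2 raised to the log form should be close to 2520 (floating-point rounding aside).
        System.out.println(Math.pow(2.0, log2MultinomialCoefficient(counts)));
    }
}
```

The exact version is only practical for small counts, but it is a convenient check on the log-space version, which stays within double range for much larger totals.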
The multinomial coefficient is often written using a notation
similar to that used for the factorial, as
(sampleCounts[0], ..., sampleCounts[n-1])!.
Parameters:
sampleCounts - Array of outcome counts.

Copyright © 2019 Alias-i, Inc. All rights reserved.