public class ConfusionMatrix extends Object
ConfusionMatrix represents a
quantitative comparison between two classifiers over a fixed set of
categories on a number of test cases. For convenience, one
classifier is termed the "reference" and the other the
"response".
Typically the reference is determined by a human or other so-called "gold standard," whereas the response is the result of an automatic classification; this is how confusion matrices are created from test cases. With this confusion matrix implementation, two human classifiers or two automatic classifiers may also be compared. For instance, human classifiers that label corpora for training sets are often evaluated for inter-annotator agreement; the usual form of reporting is the kappa statistic, which is available in three varieties from the confusion matrix. A set of systems may also be compared pairwise, such as those arising from a competitive evaluation.
Confusion matrices may be initialized on construction; with no
matrix argument, they will be constructed with zero values in all
cells. The values can then be incremented by category name with
increment(String,String) or by category index with
increment(int,int). There is also an
incrementByN(int,int,int) method, which allows explicit control
over matrix values.
Consider the following confusion matrix, which reports on the classification of 27 wines by grape variety. The reference in this case is the true variety and the response arises from the blind evaluation of a human judge.
Many-way Confusion Matrix:

| Reference \ Response | Cabernet | Syrah | Pinot |
|---|---|---|---|
| Cabernet | 9 | 3 | 0 |
| Syrah | 3 | 5 | 1 |
| Pinot | 1 | 1 | 4 |

Each row represents the results of classifying objects belonging to the category designated by that row. For instance, the first row is the result of 12 cabernet classifications. Reading across, 9 of those cabernets were correctly classified as cabernets, 3 were misclassified as syrahs, and none were misclassified as pinot noirs. In the next row are the results for 9 syrahs, 3 of which were misclassified as cabernets and 1 of which was misclassified as a pinot. Similarly, the six pinots being classified are represented on the third row. In total, the classifier categorized 13 wines as cabernets, 9 wines as syrahs, and 5 wines as pinots. The sum of all numbers in the matrix is equal to the number of trials, in this case 27. Further note that the correct answers are the ones on the diagonal of the matrix. The individual entries are recoverable using the method count(int,int). The positive and negative counts per category may be recovered from the result of oneVsAll(int).
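The row, column, and diagonal sums described above are easy to check with a small stand-alone sketch over the raw `int[][]` matrix (the class and method names here are illustrative, not part of the ConfusionMatrix API):

```java
// Stand-alone sketch of the wine-tasting matrix; rows are reference
// categories, columns are response categories.
public class MatrixTotals {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    // Total number of test cases: sum of all cells.
    static int totalCount() {
        int sum = 0;
        for (int[] row : M) for (int v : row) sum += v;
        return sum;
    }

    // Correct classifications lie on the diagonal.
    static int totalCorrect() {
        int sum = 0;
        for (int i = 0; i < M.length; ++i) sum += M[i][i];
        return sum;
    }

    // Number of responses in category j: sum of column j.
    static int responseCount(int j) {
        int sum = 0;
        for (int[] row : M) sum += row[j];
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(totalCount() + " trials, " + totalCorrect() + " correct");
        System.out.println("classified as cabernet: " + responseCount(0));
    }
}
```

Running this prints 27 trials with 18 correct, and 13 wines classified as cabernet, matching the walkthrough above.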
Collective results are either averaged per category (macro average) or averaged per test case (micro average). The results reported here are for a single operating point of results. Very often in the research literature, results are returned for the best possible post-hoc system settings, established either globally or per category.
The multiple outcome classification can be decomposed into a
number of one-versus-all classification problems. For each
category, a classifier is derived that categorizes objects as either belonging
to that category or not. From an n-way classifier, a
one-versus-all classifier can be constructed automatically by
treating an object to be classified as belonging to the category if
the category is the result of classifying it. For the above
three-way confusion matrix, the following three one-versus-all
matrices are returned as instances of PrecisionRecallEvaluation through the method oneVsAll(int):
Cab-vs-All:

| Reference \ Response | Cab | Other |
|---|---|---|
| Cab | 9 | 3 |
| Other | 4 | 11 |

Syrah-vs-All:

| Reference \ Response | Syrah | Other |
|---|---|---|
| Syrah | 5 | 4 |
| Other | 4 | 14 |

Pinot-vs-All:

| Reference \ Response | Pinot | Other |
|---|---|---|
| Pinot | 4 | 2 |
| Other | 1 | 20 |

Note that each has the same true-positive number as in the corresponding cell of the original confusion matrix. Further note that the sum of the cells in each derived matrix is the same as in the original matrix. Finally note that if the original classification problem was binary (two categories), the derived matrix will be the same as the original matrix. The results of the various precision-recall evaluation methods for these matrices are shown in the class documentation for PrecisionRecallEvaluation.
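The one-versus-all reduction can be sketched directly: true positives come from the diagonal cell, false negatives from the rest of that row, false positives from the rest of that column, and true negatives from everything else. This is a stand-alone illustration, not the library's implementation:

```java
public class OneVsAllSketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    // Returns {tp, fn, fp, tn} for the specified category index.
    static int[] oneVsAll(int c) {
        int tp = M[c][c], fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < M.length; ++i) {
            for (int j = 0; j < M.length; ++j) {
                if (i == c && j != c) fn += M[i][j];      // row c, off-diagonal
                else if (j == c && i != c) fp += M[i][j]; // column c, off-diagonal
                else if (i != c && j != c) tn += M[i][j]; // neither row nor column c
            }
        }
        return new int[] { tp, fn, fp, tn };
    }

    public static void main(String[] args) {
        int[] cab = oneVsAll(0);
        System.out.println("Cab-vs-All: tp=" + cab[0] + " fn=" + cab[1]
                           + " fp=" + cab[2] + " tn=" + cab[3]);
    }
}
```

For category 0 this reproduces the Cab-vs-All table above: tp=9, fn=3, fp=4, tn=11.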
Macro-averaged results are just the average of the per-category results. These include precision, recall and f-measure. Yule's Q and Y statistics along with the per-category chi squared results are also computed based on the one-versus all matrices.
Micro-averaged results are reported based on another derived
matrix: the sum of the scores in the one-versus-all matrices. For
the above case, the result given as a PrecisionRecallEvaluation
by the method microAverage() is:

Sum of One-vs-All Matrices:

| Reference \ Response | True | False |
|---|---|---|
| True | 18 | 9 |
| False | 9 | 45 |

Note that the true positive cell will be the sum of the true-positive cells of the original matrix (9+5+4=18 in the running example). A little algebra shows that the false positive cell will be equal to the sum of the off-diagonal elements in the original confusion matrix (3+3+1+1+1=9); symmetry then shows that the false negative value will be the same. Finally, the true negative cell will bring the total up to the number of categories times the sum of the entries in the original matrix (here 27*3-18-9-9=45); it is also equal to two times the number of true positives plus the number of false negatives (here 2*18+9=45). Thus for one-versus-all confusion matrices derived from many-way confusion matrices, the micro-averaged precision, recall and f-measure will all be the same.
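The micro-averaged cell values can be checked by summing over categories; as noted above, false positives and false negatives both equal the off-diagonal sum, so micro-averaged precision, recall, and F-measure coincide. A stand-alone sketch (not the library's code):

```java
public class MicroAverageSketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    // Returns {tp, fp, fn, tn} of the summed one-vs-all matrices.
    static int[] microCells() {
        int tp = 0, off = 0, total = 0;
        for (int i = 0; i < M.length; ++i)
            for (int j = 0; j < M.length; ++j) {
                total += M[i][j];
                if (i == j) tp += M[i][j]; else off += M[i][j];
            }
        // False positives and false negatives are both the off-diagonal sum;
        // the true negatives bring the total up to numCategories * totalCount.
        int tn = M.length * total - tp - off - off;
        return new int[] { tp, off, off, tn };
    }

    static double microPrecision() {
        int[] c = microCells();
        return c[0] / (double) (c[0] + c[1]); // equals micro recall and F here
    }

    public static void main(String[] args) {
        int[] c = microCells();
        System.out.println("tp=" + c[0] + " fp=" + c[1]
                           + " fn=" + c[2] + " tn=" + c[3]);
    }
}
```

This reproduces the 18/9/9/45 cells in the table above, with micro precision 18/27.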
For the above confusion matrix and derived matrices, the no-argument and category-indexed methods will return the values in the following tables. The hot-linked method documentation defines each statistic in detail.
| Method | Value |
|---|---|
| categories() | { "Cabernet", "Syrah", "Pinot" } |
| totalCount() | 27 |
| totalCorrect() | 18 |
| totalAccuracy() | 0.6667 |
| confidence95() | 0.1778 |
| confidence99() | 0.2341 |
| macroAvgPrecision() | 0.6826 |
| macroAvgRecall() | 0.6574 |
| macroAvgFMeasure() | 0.6676 |
| randomAccuracy() | 0.3663 |
| randomAccuracyUnbiased() | 0.3663 |
| kappa() | 0.4740 |
| kappaUnbiased() | 0.4735 |
| kappaNoPrevalence() | 0.3333 |
| referenceEntropy() | 1.5305 |
| responseEntropy() | 1.4865 |
| crossEntropy() | 1.5376 |
| jointEntropy() | 2.6197 |
| conditionalEntropy() | 1.0892 |
| mutualInformation() | 0.3973 |
| klDivergence() | 0.007129 |
| chiSquaredDegreesOfFreedom() | 4 |
| chiSquared() | 15.5256 |
| phiSquared() | 0.5750 |
| cramersV() | 0.5362 |
| lambdaA() | 0.4000 |
| lambdaB() | 0.3571 |

| Method | 0 (Cabernet) | 1 (Syrah) | 2 (Pinot) |
|---|---|---|---|
| conditionalEntropy(int) | 0.8113 | 1.3516 | 1.2516 |
| Constructor and Description |
|---|
ConfusionMatrix(String[] categories)
Construct a confusion matrix with all zero values from the
specified array of categories.
|
ConfusionMatrix(String[] categories,
int[][] matrix)
Construct a confusion matrix with the specified set of
categories and values.
|
| Modifier and Type | Method and Description |
|---|---|
String[] |
categories()
Return a copy of the array of categories for this confusion
matrix.
|
double |
chiSquared()
Returns Pearson's χ2 independence test
statistic for this matrix.
|
int |
chiSquaredDegreesOfFreedom()
Return the number of degrees of freedom of this confusion
matrix for the χ2 statistic.
|
double |
conditionalEntropy()
Returns the conditional entropy of the response distribution
against the reference distribution.
|
double |
conditionalEntropy(int refCategoryIndex)
Returns the entropy of the distribution of categories
in the response given that the reference category was
as specified.
|
double |
confidence(double z)
Returns the normal approximation of half of the binomial
confidence interval for this confusion matrix for the specified
z-score.
|
double |
confidence95()
Returns half the width of the 95% confidence interval for this
confusion matrix.
|
double |
confidence99()
Returns half the width of the 99% confidence interval for this
confusion matrix.
|
int |
count(int referenceCategoryIndex,
int responseCategoryIndex)
Returns the value of the cell in the matrix for the specified
reference and response category indices.
|
double |
cramersV()
Returns the value of Cramér's V statistic for this matrix.
|
double |
crossEntropy()
The cross-entropy of the response distribution against the
reference distribution.
|
int |
getIndex(String category)
Return the index of the specified category in the list of
categories, or
-1 if it is not a category for this
confusion matrix. |
void |
increment(int referenceCategoryIndex,
int responseCategoryIndex)
Add one to the cell in the matrix for the specified reference
and response category indices.
|
void |
increment(String referenceCategory,
String responseCategory)
Add one to the cell in the matrix for the specified reference
and response categories.
|
void |
incrementByN(int referenceCategoryIndex,
int responseCategoryIndex,
int num)
Add n to the cell in the matrix for the specified reference
and response category indices.
|
double |
jointEntropy()
Returns the entropy of the joint reference and response
distribution as defined by the underlying matrix.
|
double |
kappa()
Returns the value of the kappa statistic with chance agreement
determined by the reference distribution.
|
double |
kappaNoPrevalence()
Returns the value of the kappa statistic adjusted for
prevalence.
|
double |
kappaUnbiased()
Returns the value of the kappa statistic adjusted for bias.
|
double |
klDivergence()
Returns the Kullback-Leibler (KL) divergence between the
reference and response distributions.
|
double |
lambdaA()
Returns Goodman and Kruskal's λA index
of predictive association.
|
double |
lambdaB()
Returns Goodman and Kruskal's λB index
of predictive association.
|
double |
macroAvgFMeasure()
Returns the average F measure per category.
|
double |
macroAvgPrecision()
Returns the average precision per category.
|
double |
macroAvgRecall()
Returns the average recall per category.
|
int[][] |
matrix()
Return a copy of the matrix values.
|
PrecisionRecallEvaluation |
microAverage()
Returns the micro-averaged precision-recall evaluation.
|
double |
mutualInformation()
Returns the mutual information between the reference and
response distributions.
|
int |
numCategories()
Returns the number of categories for this confusion matrix.
|
PrecisionRecallEvaluation |
oneVsAll(int categoryIndex)
Returns the one-versus-all precision-recall evaluation for the
specified category index.
|
double |
phiSquared()
Returns the value of Pearson's φ2 index of mean
square contingency for this matrix.
|
double |
randomAccuracy()
The expected accuracy from a strategy of randomly guessing
categories according to reference and response distributions.
|
double |
randomAccuracyUnbiased()
The expected accuracy from a strategy of randomly guessing
categories according to the average of the reference and
response distributions.
|
double |
referenceEntropy()
The entropy of the decision problem itself as defined by the
counts for the reference.
|
double |
responseEntropy()
The entropy of the response distribution.
|
String |
toString()
Return a string-based representation of this confusion matrix.
|
double |
totalAccuracy()
Returns the proportion of responses that are correct.
|
int |
totalCorrect()
Returns the total number of responses that matched the
reference.
|
int |
totalCount()
Returns the total number of classifications.
|
public ConfusionMatrix(String[] categories)
The categories are copied so that subsequent changes to the array passed in will not affect the confusion matrix.
categories - Array of categories for classification.

public ConfusionMatrix(String[] categories, int[][] matrix)
For example, the many-way confusion matrix shown in the class documentation above would be initialized as:
String[] categories = new String[]
{ "Cabernet", "Syrah", "Pinot" };
int[][] wineTastingScores = new int[][]
{ { 9, 3, 0 },
{ 3, 5, 1 },
{ 1, 1, 4 } };
ConfusionMatrix matrix
= new ConfusionMatrix(categories,wineTastingScores);
categories - Array of categories for classification.
matrix - Matrix of initial values.
IllegalArgumentException - If the categories and matrix do not agree in dimension or the matrix contains a negative value.

public String[] categories()
Return a copy of the array of categories for this confusion matrix, with indices matching getIndex(). For a category c in the
set of categories:
categories()[getIndex(c)].equals(c)
and for an index i in range:
getIndex(categories()[i]) = i
getIndex(String)

public int numCategories()
numCategories() is
guaranteed to be the same as categories().length
and thus may be used to compute iteration bounds.

public int getIndex(String category)

Returns the index of the specified category in the list of categories, or
-1 if it is not a category for this
confusion matrix. The index is the index in the array
returned by categories().

category - Category whose index is returned.

categories()

public int[][] matrix()
public void increment(int referenceCategoryIndex,
int responseCategoryIndex)
referenceCategoryIndex - Index of reference category.
responseCategoryIndex - Index of response category.
IllegalArgumentException - If either index is out of range.

public void incrementByN(int referenceCategoryIndex,
int responseCategoryIndex,
int num)
referenceCategoryIndex - Index of reference category.
responseCategoryIndex - Index of response category.
num - Number of instances to increment by.
IllegalArgumentException - If either index is out of range, or if the result of the increment results in a negative value in a cell.

public void increment(String referenceCategory, String responseCategory)
referenceCategory - Name of reference category.
responseCategory - Name of response category.
IllegalArgumentException - If either category is not a category for this confusion matrix.

public int count(int referenceCategoryIndex,
int responseCategoryIndex)
referenceCategoryIndex - Index of reference category.
responseCategoryIndex - Index of response category.
IllegalArgumentException - If either index is out of range.

public int totalCount()
totalCount() = Σi Σj count(i,j)
public int totalCorrect()
totalCorrect() = Σi count(i,i)

The value is the same as that of microAverage().correctResponse().

public double totalAccuracy()
totalAccuracy() = totalCorrect() / totalCount()
Note that the classification error is just one minus the
accuracy, because each answer is either true or false.

public double confidence95()
Confidence is determined as described in confidence(double)
with parameter z=1.96.
public double confidence99()
Confidence is determined as described in confidence(double)
with parameter z=2.58.
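Both of the preceding values can be reproduced for the running example, where totalAccuracy() = 18/27 and totalCount() = 27, using the half-width formula z * sqrt(p * (1-p) / n) derived under confidence(double) below. A stand-alone sketch (illustrative, not the library's code):

```java
public class ConfidenceSketch {
    // Normal approximation to half the binomial confidence interval:
    // z * sqrt(p * (1 - p) / n).
    static double confidence(double z, double p, int n) {
        return z * Math.sqrt(p * (1.0 - p) / n);
    }

    public static void main(String[] args) {
        double p = 18.0 / 27.0;                                     // totalAccuracy()
        System.out.printf("95%%: %.4f%n", confidence(1.96, p, 27)); // 0.1778
        System.out.printf("99%%: %.4f%n", confidence(2.58, p, 27)); // 0.2341
    }
}
```

The two results match confidence95() and confidence99() in the summary table above.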
public double confidence(double z)
A z score represents the number of standard deviations from the mean, with the following correspondence of z scores and confidence intervals:

| Z | Confidence |
|---|---|
| 1.65 | 90% |
| 1.96 | 95% |
| 2.58 | 99% |
| 3.30 | 99.9% |

Thus the z-score for a 95% confidence interval is 1.96 standard deviations. The confidence interval is just the accuracy plus or minus the z score times the standard deviation. To compute the normal approximation to the deviation of the binomial distribution, assume p=totalAccuracy() and n=totalCount().
Then the confidence interval is defined in terms of the deviation of
binomial(p,n), which is defined by first taking
the variance of the Bernoulli (one trial) distribution with
success rate p:

variance(bernoulli(p)) = p * (1-p)

and then dividing by the number n of trials in the
binomial distribution to get the variance of the binomial
distribution:

variance(binomial(p,n)) = p * (1-p) / n

and then taking the square root to get the deviation:

dev(binomial(p,n)) = sqrt(p * (1-p) / n)

For instance, with p=totalAccuracy()=.90 and
n=totalCount()=10000:

dev(binomial(.9,10000)) = sqrt(0.9 * (1.0 - 0.9) / 10000) = 0.003
Thus to determine the 95% confidence interval, we take
z = 1.96 for a half-interval width of
1.96 * 0.003 = 0.00588. The
resulting interval is just 0.90 +/- 0.00588
or roughly (.894,.906).

z - The z score, or number of standard deviations.

public double referenceEntropy()
referenceEntropy()
=
- Σi
referenceLikelihood(i)
* log2 referenceLikelihood(i)
referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
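Plugging in the example's reference counts (12, 9, 6) and response counts (13, 9, 5) reproduces the 1.5305 and 1.4865 from the summary table. A stand-alone sketch of the entropy calculation (illustrative, not part of the ConfusionMatrix API):

```java
public class EntropySketch {
    // Entropy in bits of a distribution given by raw counts.
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;             // 0 * log2(0) is taken to be 0
            double p = c / (double) total;
            h -= p * Math.log(p) / Math.log(2.0);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.printf("reference entropy = %.4f%n", entropy(12, 9, 6)); // 1.5305
        System.out.printf("response entropy  = %.4f%n", entropy(13, 9, 5)); // 1.4865
    }
}
```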
public double responseEntropy()
responseEntropy()
=
- Σi
responseLikelihood(i)
* log2 responseLikelihood(i)
responseLikelihood(i) = oneVsAll(i).responseLikelihood()
public double crossEntropy()
crossEntropy()
=
- Σi
referenceLikelihood(i)
* log2 responseLikelihood(i)
responseLikelihood(i) = oneVsAll(i).responseLikelihood()
referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
Note that crossEntropy() >= referenceEntropy().
The entropy of a distribution is simply the cross-entropy of
the distribution with itself.
Low cross-entropy does not entail good classification, though good classification entails low cross-entropy.
public double jointEntropy()
jointEntropy()
= - Σi
Σj
P'(i,j) * log2 P'(i,j)
P'(i,j) = count(i,j) / totalCount()
and where by convention:
0 log2 0 =def 0
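Flattening the nine cells of the example matrix and applying the same formula gives the 2.6197 reported in the summary table; the zero cell contributes nothing by the convention just stated. A stand-alone sketch:

```java
public class JointEntropySketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    static double jointEntropy() {
        int total = 0;
        for (int[] row : M) for (int v : row) total += v;
        double h = 0.0;
        for (int[] row : M) {
            for (int v : row) {
                if (v == 0) continue;          // 0 log2 0 = 0 by convention
                double p = v / (double) total; // P'(i,j) = count(i,j)/totalCount()
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.printf("joint entropy = %.4f%n", jointEntropy()); // 2.6197
    }
}
```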
public double conditionalEntropy(int refCategoryIndex)
conditionalEntropy(i)
= - Σj
P'(j|i) * log2 P'(j|i)
where

P'(j|i) = count(i,j) / referenceCount(i)

refCategoryIndex - Index of the reference category.

public double conditionalEntropy()
conditionalEntropy()
= Σi
referenceLikelihood(i) * conditionalEntropy(i)
referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
Note that this statistic is not symmetric in that if the roles of reference and response are reversed, the answer may be different.
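Per-row entropies weighted by reference likelihoods reproduce both the per-category values in the table above (0.8113, 1.3516, 1.2516) and the overall 1.0892. A stand-alone sketch (not the library's implementation):

```java
public class ConditionalEntropySketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    // Entropy of the response distribution within reference row i.
    static double conditionalEntropy(int i) {
        int rowSum = 0;
        for (int v : M[i]) rowSum += v;
        double h = 0.0;
        for (int v : M[i]) {
            if (v == 0) continue;              // 0 log2 0 = 0 by convention
            double p = v / (double) rowSum;
            h -= p * Math.log(p) / Math.log(2.0);
        }
        return h;
    }

    // Row entropies weighted by the reference likelihood of each row.
    static double conditionalEntropy() {
        int total = 0;
        int[] rowSums = new int[M.length];
        for (int i = 0; i < M.length; ++i)
            for (int v : M[i]) { rowSums[i] += v; total += v; }
        double h = 0.0;
        for (int i = 0; i < M.length; ++i)
            h += rowSums[i] / (double) total * conditionalEntropy(i);
        return h;
    }

    public static void main(String[] args) {
        System.out.printf("H(resp|cab) = %.4f%n", conditionalEntropy(0)); // 0.8113
        System.out.printf("H(resp|ref) = %.4f%n", conditionalEntropy());  // 1.0892
    }
}
```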
public double kappa()
kappa() = (totalAccuracy() - randomAccuracy())
/ (1 - randomAccuracy())
The kappa statistic was introduced in:
Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational And Psychological Measurement 20:37-46.
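For the running example, chance agreement is (12*13 + 9*9 + 6*5)/27² = 0.3663, giving kappa = (0.6667 - 0.3663)/(1 - 0.3663) = 0.4740 as in the summary table. A stand-alone sketch (not the library's implementation):

```java
public class KappaSketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    // Chance agreement from guessing according to the reference and
    // response distributions.
    static double randomAccuracy() {
        int n = M.length, total = 0;
        int[] rowSums = new int[n], colSums = new int[n];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                rowSums[i] += M[i][j];
                colSums[j] += M[i][j];
                total += M[i][j];
            }
        double acc = 0.0;
        for (int i = 0; i < n; ++i)
            acc += rowSums[i] / (double) total * (colSums[i] / (double) total);
        return acc;
    }

    static double kappa() {
        double accuracy = 18.0 / 27.0;   // totalAccuracy() for the example
        double chance = randomAccuracy();
        return (accuracy - chance) / (1.0 - chance);
    }

    public static void main(String[] args) {
        System.out.printf("randomAccuracy = %.4f, kappa = %.4f%n",
                          randomAccuracy(), kappa());
    }
}
```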
public double kappaUnbiased()
kappaUnbiased() = (totalAccuracy() - randomAccuracyUnbiased()) / (1 - randomAccuracyUnbiased())

The unbiased version of kappa was introduced in:
Siegel, Sidney and N. John Castellan, Jr. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw Hill.
public double kappaNoPrevalence()
kappaNoPrevalence() = 2 * totalAccuracy() - 1
The no prevalence version of kappa was introduced in:
Byrt, Ted, Janet Bishop and John B. Carlin. 1993. Bias, prevalence, and kappa. Journal of Clinical Epidemiology 46(5):423-429.

These authors suggest reporting the three kappa statistics defined in this class: kappa, kappa adjusted for prevalence, and kappa adjusted for bias.
public double randomAccuracy()
randomAccuracy()
= Σi
referenceLikelihood(i) * responseLikelihood(i)
referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
responseLikelihood(i) = oneVsAll(i).responseLikelihood()
public double randomAccuracyUnbiased()
randomAccuracyUnbiased()
= Σi
((referenceLikelihood(i) + responseLikelihood(i))/2)²
referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
responseLikelihood(i) = oneVsAll(i).responseLikelihood()
public int chiSquaredDegreesOfFreedom()
For an n×m matrix, the number of degrees of
freedom is equal to (n-1)*(m-1). Because a confusion
matrix is square with dimension equal to the number of
categories, the result is defined to be:

chiSquaredDegreesOfFreedom() = (numCategories() - 1)²
public double chiSquared()
The degrees of freedom for the statistic are given by chiSquaredDegreesOfFreedom().
See Statistics.chiSquaredIndependence(double[][])
for definitions of the statistic over matrices.
public double phiSquared()
phiSquared() = chiSquared() / totalCount()
As with our other statistics, this is the sample value; it estimates the true contingency defined by the random variables underlying the reference and response.
public double cramersV()
cramersV() = sqrt(phiSquared() / (numCategories()-1))
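For the example matrix, the expected count for each cell is rowSum*colSum/n; summing (observed - expected)²/expected over the nine cells gives χ² = 15.5256, hence φ² = 0.5750 and V = 0.5362 as in the summary table. A stand-alone sketch (not the Statistics class referenced above):

```java
public class ChiSquaredSketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    static double chiSquared() {
        int n = M.length, total = 0;
        int[] rowSums = new int[n], colSums = new int[n];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                rowSums[i] += M[i][j];
                colSums[j] += M[i][j];
                total += M[i][j];
            }
        double chi2 = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                // Expected count under independence of reference and response.
                double expected = rowSums[i] * (double) colSums[j] / total;
                double d = M[i][j] - expected;
                chi2 += d * d / expected;
            }
        return chi2;
    }

    public static void main(String[] args) {
        double chi2 = chiSquared();
        double phi2 = chi2 / 27.0;                   // chiSquared()/totalCount()
        double v = Math.sqrt(phi2 / (M.length - 1)); // Cramér's V
        System.out.printf("chi2=%.4f phi2=%.4f V=%.4f%n", chi2, phi2, v);
    }
}
```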
public PrecisionRecallEvaluation oneVsAll(int categoryIndex)
categoryIndex - Index of category.

public PrecisionRecallEvaluation microAverage()

Returns the micro-averaged precision-recall evaluation, derived by summing the results of
oneVsAll(int) over all category indices. See the
class definition above for an example.

public double macroAvgPrecision()
macroAvgPrecision()
= Σi
precision(i) / numCategories()
precision(i) = oneVsAll(i).precision()
public double macroAvgRecall()
macroAvgRecall()
= Σi
recall(i) / numCategories()
recall(i) = oneVsAll(i).recall()
public double macroAvgFMeasure()
macroAvgFMeasure()
= Σi
fMeasure(i) / numCategories()
fMeasure(i) = oneVsAll(i).fMeasure()

Note that this is not necessarily the same value as results from computing the F-measure from the macro-averaged precision and macro-averaged recall.
public double lambdaA()
lambdaA()
= ((Σj maxReferenceCount(j)) - maxReferenceCount())
/ (totalCount() - maxReferenceCount())
where maxReferenceCount(j) is the maximum count
in column j of the matrix:
maxReferenceCount(j) = MAXi count(i,j)
and where maxReferenceCount() is the maximum
reference count:
maxReferenceCount() = MAXi referenceCount(i)
Note that like conditional probability and conditional entropy, the λA statistic is antisymmetric; the measure λB simply reverses the rows and columns. The probabilistic interpretation of λA is like that of λB, only reversing the role of the reference and response.
public double lambdaB()
lambdaB()
= ((Σi maxResponseCount(i)) - maxResponseCount())
/ (totalCount() - maxResponseCount())
where maxResponseCount(i) is the maximum count
in row i of the matrix:
maxResponseCount(i) = MAXj count(i,j)
and where maxResponseCount() is the maximum
response count:
maxResponseCount() = MAXj responseCount(j)
The probabilistic interpretation of λB is the reduction in error likelihood from knowing the specified reference category in predicting the response category. It will thus take on a value between 0.0 and 1.0, with higher values being better. Perfect association yields a value of 1.0 and perfect independence a value of 0.0.
Note that the λB statistic is antisymmetric; the measure λA simply reverses the rows and columns.
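Both lambda statistics can be checked against the example: the column maxima (9, 5, 4) sum to 18 and the largest row sum is 12, so λA = (18-12)/(27-12) = 0.4000; the row maxima (9, 5, 4) also sum to 18 and the largest column sum is 13, so λB = (18-13)/(27-13) = 0.3571. A stand-alone sketch (not the library's implementation):

```java
public class LambdaSketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    static double lambdaA() {
        int n = M.length, total = 0, sumColMax = 0, maxRowSum = 0;
        for (int j = 0; j < n; ++j) {     // max count within each column
            int colMax = 0;
            for (int i = 0; i < n; ++i) colMax = Math.max(colMax, M[i][j]);
            sumColMax += colMax;
        }
        for (int i = 0; i < n; ++i) {     // largest reference (row) sum
            int rowSum = 0;
            for (int v : M[i]) { rowSum += v; total += v; }
            maxRowSum = Math.max(maxRowSum, rowSum);
        }
        return (sumColMax - maxRowSum) / (double) (total - maxRowSum);
    }

    static double lambdaB() {
        int n = M.length, total = 0, sumRowMax = 0, maxColSum = 0;
        for (int i = 0; i < n; ++i) {     // max count within each row
            int rowMax = 0;
            for (int v : M[i]) { rowMax = Math.max(rowMax, v); total += v; }
            sumRowMax += rowMax;
        }
        for (int j = 0; j < n; ++j) {     // largest response (column) sum
            int colSum = 0;
            for (int i = 0; i < n; ++i) colSum += M[i][j];
            maxColSum = Math.max(maxColSum, colSum);
        }
        return (sumRowMax - maxColSum) / (double) (total - maxColSum);
    }

    public static void main(String[] args) {
        System.out.printf("lambdaA=%.4f lambdaB=%.4f%n", lambdaA(), lambdaB());
    }
}
```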
public double mutualInformation()
mutualInformation()
= Σi
Σj
P(i,j)
* log2
( P(i,j) / (Preference(i)
* Presponse(j)) )
P(i,j) = count(i,j) / totalCount()
Preference(i) = oneVsAll(i).referenceLikelihood()
Presponse(i) = oneVsAll(i).responseLikelihood()
A bit of algebra shows that mutual information is the reduction
in entropy of the response distribution from knowing the
reference distribution:
mutualInformation() = responseEntropy() - conditionalEntropy()
In this way it is similar to the
λB measure.
Mutual information is symmetric. We could also subtract the conditional entropy of the reference given the response from the reference entropy to get the same result.
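For the example, mutual information can be computed either directly from the joint distribution or as responseEntropy() - conditionalEntropy() = 1.4865 - 1.0892 ≈ 0.3973; both routes agree. A stand-alone sketch of the direct formula (not the library's implementation):

```java
public class MutualInformationSketch {
    static final int[][] M = { { 9, 3, 0 }, { 3, 5, 1 }, { 1, 1, 4 } };

    static double mutualInformation() {
        int n = M.length, total = 0;
        int[] rowSums = new int[n], colSums = new int[n];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                rowSums[i] += M[i][j];
                colSums[j] += M[i][j];
                total += M[i][j];
            }
        double mi = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                if (M[i][j] == 0) continue;               // 0 log 0 = 0
                double joint = M[i][j] / (double) total;  // P(i,j)
                double indep = rowSums[i] / (double) total
                             * (colSums[j] / (double) total);
                mi += joint * Math.log(joint / indep) / Math.log(2.0);
            }
        return mi;
    }

    public static void main(String[] args) {
        System.out.printf("mutual information = %.4f%n", mutualInformation());
    }
}
```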
public double klDivergence()
klDivergence()
= Σk
Preference(k)
* log2 (Preference(k)
/ Presponse(k))
Preference(k) = oneVsAll(k).referenceLikelihood()
Presponse(k) = oneVsAll(k).responseLikelihood()
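Plugging the reference likelihoods (12, 9, 6)/27 and response likelihoods (13, 9, 5)/27 from the running example into this formula yields the 0.007129 reported in the summary table. A stand-alone sketch (not the library's implementation):

```java
public class KlDivergenceSketch {
    // KL divergence in bits between two distributions given as counts;
    // assumes all response counts are positive.
    static double klDivergence(int[] refCounts, int[] respCounts) {
        int refTotal = 0, respTotal = 0;
        for (int c : refCounts) refTotal += c;
        for (int c : respCounts) respTotal += c;
        double kl = 0.0;
        for (int k = 0; k < refCounts.length; ++k) {
            double p = refCounts[k] / (double) refTotal;
            double q = respCounts[k] / (double) respTotal;
            kl += p * Math.log(p / q) / Math.log(2.0);
        }
        return kl;
    }

    public static void main(String[] args) {
        int[] ref = { 12, 9, 6 };   // reference counts (row sums)
        int[] resp = { 13, 9, 5 };  // response counts (column sums)
        System.out.printf("KL = %.6f%n", klDivergence(ref, resp)); // 0.007129
    }
}
```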
Note that KL divergence is not symmetric in the reference and response distributions.

Copyright © 2019 Alias-i, Inc. All rights reserved.