public class PrecisionRecallEvaluation extends Object

A PrecisionRecallEvaluation collects and reports a suite of descriptive statistics for binary classification tasks. The basis of a precision-recall evaluation is a 2×2 matrix of counts of reference and response classifications. Each cell in the matrix corresponds to a method returning a long integer count.

The most basic statistic is accuracy, which is the number of correct responses divided by the total number of cases.

|  | Response: true | Response: false | Reference Totals |
|---|---|---|---|
| Reference: true | truePositive() (TP) | falseNegative() (FN) | positiveReference() (TP+FN) |
| Reference: false | falsePositive() (FP) | trueNegative() (TN) | negativeReference() (FP+TN) |
| Response Totals | positiveResponse() (TP+FP) | negativeResponse() (FN+TN) | total() (TP+FN+FP+TN) |
accuracy()
= correctResponse() / total()
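The arithmetic can be sketched in plain Java without the class itself. This is a minimal standalone sketch; the AccuracyDemo name is hypothetical, and the counts are taken from the Cabernet example later in this documentation.

```java
// Accuracy computed directly from the four confusion-matrix cell
// counts (TP=9, FN=3, FP=4, TN=11, the Cabernet example below).
public class AccuracyDemo {
    public static double accuracy(long tp, long fn, long fp, long tn) {
        // correct responses = TP + TN; total = TP + FN + FP + TN
        return (double) (tp + tn) / (tp + fn + fp + tn);
    }

    public static void main(String[] args) {
        // 20 correct responses out of 27 cases, i.e. 20/27
        System.out.printf("accuracy = %.4f%n", accuracy(9, 3, 4, 11));
    }
}
```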
This class derives its name from the following four statistics,
which are illustrated in the four tables.
recall()
= truePositive() / positiveReference()
precision()
= truePositive() / positiveResponse()
rejectionRecall()
= trueNegative() / negativeReference()
rejectionPrecision()
= trueNegative() / negativeResponse()
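The four defining statistics above can be sketched directly from the cell counts. This is a hypothetical standalone helper (the PrDemo name is not part of the API), again using the Cabernet counts from the worked example below.

```java
// Precision, recall, and their rejection duals from raw counts.
public class PrDemo {
    public static double recall(long tp, long fn)             { return (double) tp / (tp + fn); }
    public static double precision(long tp, long fp)          { return (double) tp / (tp + fp); }
    public static double rejectionRecall(long tn, long fp)    { return (double) tn / (fp + tn); }
    public static double rejectionPrecision(long tn, long fn) { return (double) tn / (fn + tn); }

    public static void main(String[] args) {
        long tp = 9, fn = 3, fp = 4, tn = 11; // Cabernet example counts
        System.out.printf("recall=%.4f precision=%.4f%n",
                recall(tp, fn), precision(tp, fp));
        System.out.printf("rejectionRecall=%.4f rejectionPrecision=%.4f%n",
                rejectionRecall(tn, fp), rejectionPrecision(tn, fn));
    }
}
```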
Each measure is defined to be the count marked with a plus sign (+) divided by the sum of the counts marked plus (+) and minus (-) in the corresponding table below.

The tables illustrate the relevant dualities. Precision is dual to recall if the reference and response are switched (the matrix is transposed). Similarly, rejection recall is dual to recall with the true and false labels switched (reflection around each axis in turn); rejection precision is similarly dual to precision.
Recall

|  | Response: True | Response: False |
|---|---|---|
| Reference: True | + | - |
| Reference: False |  |  |

Precision

|  | Response: True | Response: False |
|---|---|---|
| Reference: True | + |  |
| Reference: False | - |  |

Rejection Recall

|  | Response: True | Response: False |
|---|---|---|
| Reference: True |  |  |
| Reference: False | - | + |

Rejection Precision

|  | Response: True | Response: False |
|---|---|---|
| Reference: True |  | - |
| Reference: False |  | + |
Precision and recall may be combined by weighted harmonic averaging using the f-measure statistic, with β between 0 and infinity controlling the relative weighting of precision and recall, and 1 being the neutral value.
fMeasure() = fMeasure(1)
fMeasure(β)
= (1 + β²) * precision() * recall()
/ (recall() + β² * precision())
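The weighted f-measure formula can be sketched as a one-line helper; the FMeasureDemo name is hypothetical, and the precision and recall values in main come from the Cabernet example below.

```java
// Weighted F measure: (1 + b^2) * p * r / (r + b^2 * p).
public class FMeasureDemo {
    public static double fMeasure(double beta, double recall, double precision) {
        double b2 = beta * beta;
        return (1.0 + b2) * precision * recall / (recall + b2 * precision);
    }

    public static void main(String[] args) {
        double p = 9.0 / 13, r = 9.0 / 12; // Cabernet precision and recall
        // With beta = 1 this reduces to the harmonic mean 2pr/(p + r).
        System.out.printf("F1 = %.4f%n", fMeasure(1.0, r, p));
    }
}
```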
There are four traditional measures of binary classification, which are as follows.
fowlkesMallows()
= truePositive() / (precision() * recall())^(1/2)
jaccardCoefficient()
= truePositive() / (total() - trueNegative())
yulesQ()
= (truePositive() * trueNegative() - falsePositive() * falseNegative())
/ (truePositive() * trueNegative() + falsePositive() * falseNegative())
yulesY()
= ((truePositive() * trueNegative())^(1/2) - (falsePositive() * falseNegative())^(1/2))
/ ((truePositive() * trueNegative())^(1/2) + (falsePositive() * falseNegative())^(1/2))
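Yule's Q and Y can be computed straight from the four counts. A standalone sketch (the YuleDemo name is hypothetical), using the Cabernet example counts:

```java
// Yule's Q and Y association statistics from raw counts.
public class YuleDemo {
    public static double yulesQ(long tp, long fn, long fp, long tn) {
        return (double) (tp * tn - fp * fn) / (tp * tn + fp * fn);
    }

    public static double yulesY(long tp, long fn, long fp, long tn) {
        double a = Math.sqrt((double) tp * tn); // sqrt(TP*TN)
        double b = Math.sqrt((double) fp * fn); // sqrt(FP*FN)
        return (a - b) / (a + b);
    }

    public static void main(String[] args) {
        System.out.printf("Q=%.4f Y=%.4f%n",
                yulesQ(9, 3, 4, 11), yulesY(9, 3, 4, 11));
    }
}
```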
Replacing precision and recall with their definitions,
TP/(TP+FP) and TP/(TP+FN):
F1
= 2 * (TP/(TP+FP)) * (TP/(TP+FN))
/ (TP/(TP+FP) + TP/(TP+FN))
= 2 * (TP*TP / ((TP+FP)(TP+FN)))
/ ((TP*(TP+FN) + TP*(TP+FP)) / ((TP+FP)(TP+FN)))
= 2 * TP*TP
/ (TP*(TP+FN) + TP*(TP+FP))
= 2 * TP
/ ((TP+FN) + (TP+FP))
= 2*TP / (2*TP + FP + FN)
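The identity derived above is easy to check numerically: the harmonic-mean form and the count form agree. A hypothetical standalone sketch:

```java
// Verifies F1 computed two ways: from precision/recall rates and
// from the closed-form 2*TP / (2*TP + FP + FN).
public class F1IdentityDemo {
    public static double f1FromRates(long tp, long fn, long fp) {
        double r = (double) tp / (tp + fn);
        double p = (double) tp / (tp + fp);
        return 2.0 * p * r / (p + r);
    }

    public static double f1FromCounts(long tp, long fn, long fp) {
        return 2.0 * tp / (2.0 * tp + fp + fn);
    }

    public static void main(String[] args) {
        // Both forms agree on the Cabernet counts; TN never appears.
        System.out.printf("%.6f == %.6f%n",
                f1FromRates(9, 3, 4), f1FromCounts(9, 3, 4));
    }
}
```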
Thus the F1-measure is very closely related to the Jaccard
coefficient, TP/(TP+FP+FN). Like the Jaccard
coefficient, the F measure does not vary with varying true
negative counts. Rejection precision and recall do vary with
changes in true negative count.
Basic reference and response likelihoods are computed by frequency.
referenceLikelihood() = positiveReference() / total()
responseLikelihood() = positiveResponse() / total()
An algorithm that chose responses at random according to the
response likelihood would have the following accuracy against
test cases chosen at random according to the reference likelihood:
randomAccuracy()
= referenceLikelihood() * responseLikelihood()
+ (1 - referenceLikelihood()) * (1 - responseLikelihood())
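This baseline can be sketched directly from the two likelihoods. The RandomAccuracyDemo name is hypothetical; the likelihoods in main are the Cabernet reference and response likelihoods from the example below.

```java
// Chance agreement: P(both positive) + P(both negative) under
// independent reference and response draws.
public class RandomAccuracyDemo {
    public static double randomAccuracy(double refLikelihood, double respLikelihood) {
        return refLikelihood * respLikelihood
             + (1.0 - refLikelihood) * (1.0 - respLikelihood);
    }

    public static void main(String[] args) {
        double refL = 12.0 / 27, respL = 13.0 / 27; // Cabernet example
        System.out.printf("randomAccuracy = %.4f%n", randomAccuracy(refL, respL));
    }
}
```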
The two summands arise from the likelihood of a true positive and the likelihood of a true negative. From random accuracy, the κ-statistic is defined by dividing out the random accuracy from the accuracy, giving a measure of performance above baseline expectation:
kappa()
= kappa(accuracy(),randomAccuracy())
kappa(p,e)
= (p - e) / (1 - e)
There are two alternative forms of the κ-statistic, both of which attempt to correct for putative bias in the estimation of random accuracy. The first computes random accuracy by taking the average of the reference and response likelihoods as the baseline likelihood for both; squaring and summing yields the so-called unbiased random accuracy and the unbiased κ-statistic:
randomAccuracyUnbiased()
= avgLikelihood()²
+ (1 - avgLikelihood())²
avgLikelihood() = (referenceLikelihood() + responseLikelihood()) / 2
kappaUnbiased()
= kappa(accuracy(),randomAccuracyUnbiased())
Kappa can also be adjusted for the prevalence of positive reference cases, which leads to the following simple definition:
kappaNoPrevalence()
= (2 * accuracy()) - 1
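The three κ variants share the same (p - e) / (1 - e) core, differing only in the chance-accuracy term e. A hypothetical standalone sketch using the Cabernet example values:

```java
// The kappa core and its three variants from the class documentation.
public class KappaDemo {
    public static double kappa(double p, double e) {
        return (p - e) / (1.0 - e);
    }

    public static void main(String[] args) {
        double acc = 20.0 / 27;                       // Cabernet accuracy
        double refL = 12.0 / 27, respL = 13.0 / 27;   // likelihoods
        double rand = refL * respL + (1 - refL) * (1 - respL);
        double avg = (refL + respL) / 2;
        double randUnbiased = avg * avg + (1 - avg) * (1 - avg);
        System.out.printf("kappa             = %.4f%n", kappa(acc, rand));
        System.out.printf("kappaUnbiased     = %.4f%n", kappa(acc, randUnbiased));
        System.out.printf("kappaNoPrevalence = %.4f%n", 2 * acc - 1);
    }
}
```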
Pearson's χ² statistic is provided by the following method:
chiSquared()
= total() * phiSquared()
phiSquared()
= (truePositive()*trueNegative() - falsePositive()*falseNegative())²
/ ((truePositive()+falseNegative()) * (falsePositive()+trueNegative()) * (truePositive()+falsePositive()) * (falseNegative()+trueNegative()))
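Both statistics follow mechanically from the counts. A hypothetical standalone sketch, checked against the Cabernet column of the example table below:

```java
// Pearson's chi-squared via the phi-squared association measure.
public class ChiSquaredDemo {
    public static double phiSquared(long tp, long fn, long fp, long tn) {
        double num = (double) tp * tn - (double) fp * fn;
        // product of the four marginal totals
        double den = (double) (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn);
        return num * num / den;
    }

    public static double chiSquared(long tp, long fn, long fp, long tn) {
        return (tp + fn + fp + tn) * phiSquared(tp, fn, fp, tn);
    }

    public static void main(String[] args) {
        System.out.printf("phi2=%.4f chi2=%.4f%n",
                phiSquared(9, 3, 4, 11), chiSquared(9, 3, 4, 11));
    }
}
```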
The accuracy deviation is the deviation of the average number of positive cases in a binomial distribution with accuracy equal to the classification accuracy and number of trials equal to the total number of cases.
accuracyDeviation()
= (accuracy() * (1 - accuracy()) / total())^(1/2)
This number can be used to provide error intervals around the
accuracy results.
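For instance, a rough two-standard-deviation interval can be formed around the sample accuracy. A hypothetical standalone sketch:

```java
// Binomial standard deviation of the accuracy estimate, and a
// rough two-standard-deviation interval around it.
public class AccuracyDeviationDemo {
    public static double accuracyDeviation(double accuracy, long total) {
        return Math.sqrt(accuracy * (1.0 - accuracy) / total);
    }

    public static void main(String[] args) {
        double acc = 20.0 / 27; // Cabernet example accuracy
        double dev = accuracyDeviation(acc, 27);
        System.out.printf("accuracy = %.4f +/- %.4f%n", acc, 2 * dev);
    }
}
```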
Using the following three tables as examples:

Cab-vs-All

|  | Response: Cab | Response: Other |
|---|---|---|
| Reference: Cab | 9 | 3 |
| Reference: Other | 4 | 11 |

Syrah-vs-All

|  | Response: Syrah | Response: Other |
|---|---|---|
| Reference: Syrah | 5 | 4 |
| Reference: Other | 4 | 14 |

Pinot-vs-All

|  | Response: Pinot | Response: Other |
|---|---|---|
| Reference: Pinot | 4 | 2 |
| Reference: Other | 1 | 20 |

The various statistics evaluate to the following values:

| Method | Cabernet | Syrah | Pinot |
|---|---|---|---|
| positiveReference() | 12 | 9 | 6 |
| negativeReference() | 15 | 18 | 21 |
| positiveResponse() | 13 | 9 | 5 |
| negativeResponse() | 14 | 18 | 22 |
| correctResponse() | 20 | 19 | 24 |
| total() | 27 | 27 | 27 |
| accuracy() | 0.7407 | 0.7037 | 0.8889 |
| recall() | 0.7500 | 0.5555 | 0.6666 |
| precision() | 0.6923 | 0.5555 | 0.8000 |
| rejectionRecall() | 0.7333 | 0.7778 | 0.9524 |
| rejectionPrecision() | 0.7858 | 0.7778 | 0.9091 |
| fMeasure() | 0.7200 | 0.5555 | 0.7272 |
| fowlkesMallows() | 12.49 | 9.00 | 5.48 |
| jaccardCoefficient() | 0.5625 | 0.3846 | 0.5714 |
| yulesQ() | 0.7838 | 0.6279 | 0.9512 |
| yulesY() | 0.4835 | 0.3531 | 0.7269 |
| referenceLikelihood() | 0.4444 | 0.3333 | 0.2222 |
| responseLikelihood() | 0.4815 | 0.3333 | 0.1852 |
| randomAccuracy() | 0.5021 | 0.5556 | 0.6749 |
| kappa() | 0.4792 | 0.3333 | 0.6583 |
| randomAccuracyUnbiased() | 0.5027 | 0.5556 | 0.6756 |
| kappaUnbiased() | 0.4789 | 0.3333 | 0.6575 |
| kappaNoPrevalence() | 0.4814 | 0.4074 | 0.7778 |
| chiSquared() | 6.2382 | 3.0000 | 11.8519 |
| phiSquared() | 0.2310 | 0.1111 | 0.4390 |
| accuracyDeviation() | 0.0843 | 0.0879 | 0.0605 |
| Constructor and Description |
|---|
| PrecisionRecallEvaluation() Constructs a precision-recall evaluation with all counts set to zero. |
| PrecisionRecallEvaluation(long tp, long fn, long fp, long tn) Constructs a precision-recall evaluation initialized with the specified counts. |
| Modifier and Type | Method and Description |
|---|---|
| double | accuracy() Returns the sample accuracy of the responses. |
| double | accuracyDeviation() Returns the standard deviation of the accuracy. |
| void | addCase(boolean reference, boolean response) Adds a case with the specified reference and response classifications. |
| double | chiSquared() Returns the χ² value. |
| long | correctResponse() Returns the number of cases where the response is correct. |
| long | falseNegative() Returns the number of false negative cases. |
| long | falsePositive() Returns the number of false positive cases. |
| double | fMeasure() Returns the F1 measure. |
| double | fMeasure(double beta) Returns the Fβ value for the specified β. |
| static double | fMeasure(double beta, double recall, double precision) Returns the Fβ measure for the specified β, recall and precision values. |
| double | fowlkesMallows() Returns the Fowlkes-Mallows score. |
| long | incorrectResponse() Returns the number of cases where the response is incorrect. |
| double | jaccardCoefficient() Returns the Jaccard coefficient. |
| double | kappa() Returns the value of the kappa statistic. |
| double | kappaNoPrevalence() Returns the value of the kappa statistic adjusted for prevalence. |
| double | kappaUnbiased() Returns the value of the unbiased kappa statistic. |
| long | negativeReference() Returns the number of negative reference cases. |
| long | negativeResponse() Returns the number of negative response cases. |
| double | phiSquared() Returns the φ² value. |
| long | positiveReference() Returns the number of positive reference cases. |
| long | positiveResponse() Returns the number of positive response cases. |
| double | precision() Returns the precision. |
| double | randomAccuracy() The probability that the reference and response are the same if they are generated randomly according to the reference and response likelihoods. |
| double | randomAccuracyUnbiased() The probability that the reference and the response are the same if the reference and response likelihoods are both the average of the sample reference and response likelihoods. |
| double | recall() Returns the recall. |
| double | referenceLikelihood() Returns the sample reference likelihood, or prevalence, which is the number of positive references divided by the total number of cases. |
| double | rejectionPrecision() Returns the rejection precision, or selectivity, value. |
| double | rejectionRecall() Returns the rejection recall, or specificity, value. |
| double | responseLikelihood() Returns the sample response likelihood, which is the number of positive responses divided by the total number of cases. |
| String | toString() Returns a string-based representation of this evaluation. |
| long | total() Returns the total number of cases. |
| long | trueNegative() Returns the number of true negative cases. |
| long | truePositive() Returns the number of true positive cases. |
| double | yulesQ() Returns the value of Yule's Q statistic. |
| double | yulesY() Returns the value of Yule's Y statistic. |
public PrecisionRecallEvaluation()
public PrecisionRecallEvaluation(long tp,
long fn,
long fp,
long tn)
Parameters:
tp - True positive count.
fn - False negative count.
fp - False positive count.
tn - True negative count.
Throws:
IllegalArgumentException - If any of the counts are negative.

public void addCase(boolean reference, boolean response)
Parameters:
reference - Reference classification.
response - Response classification.

public long truePositive()
public long falsePositive()
public long trueNegative()
public long falseNegative()
public long positiveReference()
public long negativeReference()
public double referenceLikelihood()
public long positiveResponse()
public long negativeResponse()
public double responseLikelihood()
public long correctResponse()
public long incorrectResponse()
public long total()
public double accuracy()
public double recall()
public double precision()
public double rejectionRecall()
public double rejectionPrecision()
public double fMeasure()
Returns the F1 measure, equivalent to calling fMeasure(double) with β equal to 1.

public double fMeasure(double beta)
Returns the Fβ value for the specified β.
Parameters:
beta - The β parameter.
Returns:
The Fβ value.

public double jaccardCoefficient()
public double chiSquared()
public double phiSquared()
public double yulesQ()
public double yulesY()
public double fowlkesMallows()
public double accuracyDeviation()
public double randomAccuracy()
public double randomAccuracyUnbiased()
public double kappa()
public double kappaUnbiased()
public double kappaNoPrevalence()
public String toString()
public static double fMeasure(double beta, double recall, double precision)
Parameters:
beta - Relative weighting of precision.
recall - Recall value.
precision - Precision value.

Copyright © 2016 Alias-i, Inc. All rights reserved.