public class ScoredPrecisionRecallEvaluation extends Object
ScoredPrecisionRecallEvaluation provides an
evaluation based on precision-recall operating points and
sensitivity-specificity operating points. The unscored
precision-recall evaluation class is PrecisionRecallEvaluation.
There is a single no-arg constructor, ScoredPrecisionRecallEvaluation().
The method addCase(boolean,double) is used to populate
the evaluation, with the first argument representing whether the
response was correct and the second the score that was assigned.
If there are positive reference cases that are not added through
addCase(), the total number of such cases should be added
using the method addMisses(int). This method effectively
increments the number of reference positive cases used to compute
recall values.
If there are negative reference cases that are not dealt with
through addCase(), the method addNegativeMisses(int) should be
called with the total number of such cases as an argument. This
method increments the number of reference negative cases used to
compute specificity values.
By way of example, consider the following table of cases, all of which involve positive responses. The cases are in rank order, but may be added in any order.
The first line, which is separated, indicates the values before any results have been returned. There's no score corresponding to this operating point, and given that it doesn't correspond to a result, correctness is not applicable. It has zero recall, one specificity, and one precision (letting zero divided by zero be one here).
| Rank | Score | Correct | TP | TN | FP | FN | Rec | Prec | Spec | F Meas |
|---|---|---|---|---|---|---|---|---|---|---|
| (-1) | n/a | n/a | 0 | 6 | 0 | 5 | 0.00 | 1.00 | 1.00 | 0.00 |
| 0 | -1.21 | no | 0 | 5 | 1 | 5 | 0.00 | 0.00 | 0.83 | 0.00 |
| 1 | -1.27 | yes | 1 | 5 | 1 | 4 | 0.20 | 0.50 | 0.83 | 0.29 |
| 2 | -1.39 | no | 1 | 4 | 2 | 4 | 0.20 | 0.33 | 0.67 | 0.25 |
| 3 | -1.47 | yes | 2 | 4 | 2 | 3 | 0.40 | 0.50 | 0.67 | 0.44 |
| 4 | -1.60 | yes | 3 | 4 | 2 | 2 | 0.60 | 0.60 | 0.67 | 0.60 |
| 5 | -1.65 | no | 3 | 3 | 3 | 2 | 0.60 | 0.50 | 0.50 | 0.55 |
| 6 | -1.79 | no | 3 | 2 | 4 | 2 | 0.60 | 0.43 | 0.33 | 0.50 |
| 7 | -1.80 | no | 3 | 1 | 5 | 2 | 0.60 | 0.38 | 0.17 | 0.47 |
| 8 | -2.01 | yes | 4 | 1 | 5 | 1 | 0.80 | 0.44 | 0.17 | 0.53 |
| 9 | -3.70 | no | 4 | 0 | 6 | 1 | 0.80 | 0.40 | 0.00 | 0.53 |
| ? | n/a | yes | 5 | 0 | 6 | 0 | 1.00 | 0.00 | 0.00 | 0.00 |
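The counts and ratios in the table can be reproduced mechanically from the ranked correctness flags. The following standalone sketch (TableCheck is a made-up name, not part of the LingPipe API) recomputes TP, recall, precision, and specificity at each rank, assuming the five positive and six negative reference cases above.

```java
// Recomputes the running-example table columns from the ranked
// correctness flags. Standalone sketch; not the LingPipe API.
public class TableCheck {
    // Correctness of the ten returned results, in rank order 0..9.
    static final boolean[] CORRECT =
        { false, true, false, true, true, false, false, false, true, false };
    static final int POS_REF = 5; // positive reference cases
    static final int NEG_REF = 6; // negative reference cases

    // True positives among the top (rank+1) results.
    static int tp(int rank) {
        int tp = 0;
        for (int i = 0; i <= rank; ++i)
            if (CORRECT[i]) ++tp;
        return tp;
    }

    static double recall(int rank)    { return tp(rank) / (double) POS_REF; }
    static double precision(int rank) { return tp(rank) / (double) (rank + 1); }

    // Specificity = TN / (TN + FP); FP = incorrect results so far.
    static double specificity(int rank) {
        int fp = (rank + 1) - tp(rank);
        return (NEG_REF - fp) / (double) NEG_REF;
    }

    public static void main(String[] args) {
        for (int rank = 0; rank < CORRECT.length; ++rank)
            System.out.printf("%d %.2f %.2f %.2f%n",
                rank, recall(rank), precision(rank), specificity(rank));
    }
}
```

For example, at rank 4 this yields recall 0.60, precision 0.60, and specificity 0.67, matching the table.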
The next lines, listed as ranks 0 to 9, correspond to calls to
addCase() with the specified score and correctness. For
each of these lines, we list the corresponding number of true
positives (TP), true negatives (TN), false positives (FP), and
false negatives (FN). These are followed by recall, precision and
specificity (aka rejection recall). See the class documentation
for PrecisionRecallEvaluation for definitions of these
values in terms of the TP, TN, FP, and FN counts.
There are five positive reference cases and six negative reference cases in this table. (In the original color-coded diagram, the highlighted precision and specificity values are those used for the interpolated curves.)
The pairs of precision/recall values form the basis for the
precision-recall curve returned by prCurve(boolean), with
the argument indicating whether to perform precision interpolation.
For the above graph, the uninterpolated precision-recall curve is:
prCurve(false) = {
{ 0.00, 1.00 },
{ 0.20, 0.50 },
{ 0.20, 0.33 },
{ 0.40, 0.50 },
{ 0.60, 0.60 },
{ 0.60, 0.50 },
{ 0.60, 0.43 },
{ 0.60, 0.38 },
{ 0.60, 0.38 },
{ 0.80, 0.44 },
{ 0.80, 0.40 },
{ 1.00, 0.00 }
}
Typically, a form of interpolation is performed that sets the
precision for a given recall value to the maximum of the precision
at the current or any greater recall value. This pushes the
interpolated precision values up the graph. At the same time, only
values corresponding to jumps in recall are returned, that is,
values at ranks where true positives were returned. For the example
above, the result is
prCurve(true) = {
{ 0.00, 1.00 },
{ 0.20, 0.60 },
{ 0.40, 0.60 },
{ 0.60, 0.60 },
{ 0.80, 0.44 },
{ 1.00, 0.00 }
}
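The interpolation rule just described, taking the maximum precision at the current or any greater recall and keeping one point per recall jump, can be sketched independently of the class (PrInterpolation and interpolate are made-up names; this is not the LingPipe implementation, though it reproduces prCurve(true) on this example):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of precision interpolation over {recall, precision} points:
// each precision is raised to the maximum precision at the same or
// any greater recall, and only the first point at each distinct
// recall value (the recall jumps) is kept.
public class PrInterpolation {
    static double[][] interpolate(double[][] curve) {
        int n = curve.length;
        double[] maxPrec = new double[n];
        // Sweep right to left, carrying the running maximum precision.
        double running = 0.0;
        for (int i = n - 1; i >= 0; --i) {
            running = Math.max(running, curve[i][1]);
            maxPrec[i] = running;
        }
        // Keep the first point of each group sharing a recall value.
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < n; ++i) {
            if (i > 0 && curve[i - 1][0] == curve[i][0])
                continue; // an earlier point has the same recall
            out.add(new double[] { curve[i][0], maxPrec[i] });
        }
        return out.toArray(new double[0][]);
    }
}
```

Applied to the twelve uninterpolated points above, this yields the six interpolated points listed, with precisions 1.00, 0.60, 0.60, 0.60, 0.44, 0.00.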
For convenience, the evaluation always adds the two limit points,
one with precision 0 and recall 1, and one with precision 1 and
recall 0. These operating points are always achievable, the first
by returning every possible answer, and the second by returning no
answers.
The ROC curve is returned by the method rocCurve(boolean), with
the boolean parameter again indicating whether to perform
interpolation. For the above graph, the result is:
rocCurve(false) = {
{ 1 - 1.00, 0.00 },
{ 1 - 0.83, 0.00 },
{ 1 - 0.83, 0.20 },
{ 1 - 0.67, 0.20 },
{ 1 - 0.67, 0.40 },
{ 1 - 0.67, 0.60 },
{ 1 - 0.50, 0.60 },
{ 1 - 0.33, 0.60 },
{ 1 - 0.17, 0.60 },
{ 1 - 0.17, 0.80 },
{ 1 - 0.00, 0.80 },
{ 1 - 0.00, 1.00 }
}
Interpolation works exactly the same way as for the precision-recall
curves, but based on specificity rather than precision.
rocCurve(true) = {
{ 1 - 1.00, 0.00 },
{ 1 - 0.83, 0.20 },
{ 1 - 0.67, 0.60 },
{ 1 - 0.50, 0.60 },
{ 1 - 0.33, 0.60 },
{ 1 - 0.17, 0.80 },
{ 1 - 0.00, 1.00 }
}
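Given such a step-shaped curve, the area under it can be accumulated segment by segment. The sketch below (AucSketch is a made-up name, not the LingPipe implementation) applies the trapezoidal rule to the uninterpolated ROC points above, written as exact fractions of the six negatives and five positives.

```java
// Trapezoidal area under a curve given as {x, y} points in
// non-decreasing x order. Standalone sketch, not the LingPipe code.
public class AucSketch {
    static double area(double[][] curve) {
        double area = 0.0;
        for (int i = 1; i < curve.length; ++i) {
            double dx = curve[i][0] - curve[i - 1][0];
            double avgY = (curve[i][1] + curve[i - 1][1]) / 2.0;
            area += dx * avgY;
        }
        return area;
    }

    public static void main(String[] args) {
        // Uninterpolated ROC points (1 - specificity, recall) from above.
        double[][] roc = {
            { 0.0, 0.0 }, { 1/6.0, 0.0 }, { 1/6.0, 0.2 },
            { 2/6.0, 0.2 }, { 2/6.0, 0.4 }, { 2/6.0, 0.6 },
            { 3/6.0, 0.6 }, { 4/6.0, 0.6 }, { 5/6.0, 0.6 },
            { 5/6.0, 0.8 }, { 1.0, 0.8 }, { 1.0, 1.0 }
        };
        System.out.println(area(roc)); // about 0.467 for this example
    }
}
```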
In some information extraction or retrieval tasks, a system might only return a fixed number of results to a user. To evaluate such truncated result sets, it is common to report the precision after N results have been returned. Counting starts from one rather than zero for returned results, but a limiting value of 1.0 is filled in for precision at 0. In our running example, we have
precisionAt(0) = 1.0
precisionAt(1) = 0.0
precisionAt(5) = 0.6
precisionAt(10) = 0.4
precisionAt(20) = 0.2
precisionAt(100) = 0.04
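Under that convention, precision at N can be computed directly from the ranked correctness flags, counting everything past the returned results as an error. The following sketch (PrecisionAtDemo is a made-up name) reproduces the values above.

```java
// Precision at N from ranked correctness flags; results beyond the
// ten added cases count as errors. Sketch, not the LingPipe code.
public class PrecisionAtDemo {
    static final boolean[] CORRECT =
        { false, true, false, true, true, false, false, false, true, false };

    static double precisionAt(int n) {
        if (n == 0) return 1.0; // limiting value at rank 0
        int tp = 0;
        for (int i = 0; i < Math.min(n, CORRECT.length); ++i)
            if (CORRECT[i]) ++tp;
        return tp / (double) n;
    }
}
```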
The return value for a rank greater than the number of cases added
will be calculated assuming all other results are errors.
The method reciprocalRank() returns the reciprocal rank, 1/M, where
M is the rank (counting from 1) of the first true positive result.
In our running example, the first result is a false positive and the
second a true positive, so the reciprocal rank is
reciprocalRank() = 0.5
Note that this measure emphasizes differences in early ranks
much more than later ones. For instance, the reciprocal rank
for a system returning a correct result first is 1/1, but
for one returning it second, it's 1/2, and for one returning
the first true positive at rank 10, it's 1/10. The difference
between rank 1 and 2 is greater than that between 2 and 10.
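The computation itself is a short scan for the first true positive; a standalone sketch (ReciprocalRankDemo is a made-up name):

```java
// Reciprocal rank: 1/M for the 1-based rank M of the first true
// positive; 0.0 if no result is correct. Sketch, not LingPipe code.
public class ReciprocalRankDemo {
    static double reciprocalRank(boolean[] correct) {
        for (int i = 0; i < correct.length; ++i)
            if (correct[i]) return 1.0 / (i + 1);
        return 0.0;
    }
}
```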
R precision is the precision at rank R, where R is the number of
positive reference cases. It will always be at a point where
precision equals recall, and is also known as the precision-recall
break-even point (BEP); for convenience, there is a method of that
name. For the running example,
rPrecision() = 0.6
prBreakevenPoint() = rPrecision() = 0.6
The method maximumFMeasure() returns the maximum F measure achieved
at any operating point on the curve (see
PrecisionRecallEvaluation.fMeasure(double,double,double) for a
definition of F measure). For our example, this arises at
maximumFMeasure() = 0.6
In general, the maximum F measure may occur at a point other than the precision-recall break-even point.
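Scanning the operating points for the largest balanced F measure can be sketched as follows (MaxFDemo is a made-up name); F1 is the harmonic mean of precision and recall:

```java
// Maximum F1 over {recall, precision} operating points, with
// F1 = 2*P*R/(P+R), taken as 0 when P+R is 0. Sketch only.
public class MaxFDemo {
    static double f1(double recall, double precision) {
        double denom = recall + precision;
        return denom == 0.0 ? 0.0 : 2.0 * precision * recall / denom;
    }

    static double maximumFMeasure(double[][] prPoints) {
        double max = 0.0;
        for (double[] p : prPoints)
            max = Math.max(max, f1(p[0], p[1]));
        return max;
    }
}
```

On the interpolated points of the running example, the maximum is 0.6, reached where precision and recall are both 0.6.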
The method averagePrecision() returns the average of the precision
values at the ranks of the true positive results, with missed
positives counting as precision zero. The average across multiple
evaluations of average precision is somewhat misleadingly called
mean average precision (MAP); strictly speaking it should be called
average average precision, because averages are over finite samples
while means are properties of distributions.
The eleven-point interpolated precision-recall curves, reciprocal
rank, and R precision are also popular targets for reporting
averaged results.
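Average precision for the running example can be checked by hand: the true positives fall at zero-based ranks 1, 3, 4, and 8, giving precisions 1/2, 1/2, 3/5, and 4/9, and the fifth, never-returned positive contributes zero. A standalone sketch (AvgPrecDemo is a made-up name):

```java
// Average precision: mean over positive reference cases of the
// precision at each true positive's rank, with missed positives
// contributing zero. Sketch, not the LingPipe implementation.
public class AvgPrecDemo {
    static double averagePrecision(boolean[] correct, int numPositiveRef) {
        double sum = 0.0;
        int tp = 0;
        for (int i = 0; i < correct.length; ++i) {
            if (correct[i]) {
                ++tp;
                sum += tp / (double) (i + 1); // precision at this rank
            }
        }
        return sum / numPositiveRef; // misses contribute 0 to the sum
    }
}
```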
| Constructor and Description |
|---|
| ScoredPrecisionRecallEvaluation() Construct a scored precision-recall evaluation. |

| Modifier and Type | Method and Description |
|---|---|
| void | addCase(boolean correct, double score) Add a case with the specified correctness and response score. |
| void | addMisses(int count) Increments the positive reference count without adding a case from the classifier. |
| void | addNegativeMisses(int count) Increments the negative reference count without adding a case from the classifier. |
| double | areaUnderPrCurve(boolean interpolate) Returns the area under the curve (AUC) for the recall-precision curve with interpolation as specified. |
| double | areaUnderRocCurve(boolean interpolate) Returns the area under the receiver operating characteristic (ROC) curve. |
| double | averagePrecision() Returns the average of precisions at the true positive results. |
| double[] | elevenPtInterpPrecision() Returns the interpolated precision at eleven recall points evenly spaced between 0 and 1. |
| double | maximumFMeasure() Returns the maximum F1-measure for an operating point on the PR curve. |
| double | maximumFMeasure(double beta) Returns the maximum Fβ-measure for an operating point on the precision-recall curve for a specified precision weight β > 0. |
| int | numCases() Returns the total number of positive and negative reference cases for this evaluation. |
| int | numNegativeRef() Returns the number of negative reference cases. |
| int | numPositiveRef() Returns the number of positive reference cases. |
| double | prBreakevenPoint() Returns the precision-recall break-even point. |
| double[][] | prCurve(boolean interpolate) Returns the precision-recall curve, interpolating if the specified flag is true. |
| double | precisionAt(int rank) Returns the precision score achieved by returning the top scoring documents up to (but not including) the specified rank. |
| static void | printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw) Prints a precision-recall curve with F-measures. |
| static void | printScorePrecisionRecallCurve(double[][] prScoreCurve, PrintWriter pw) Prints a precision-recall curve with scores. |
| double[][] | prScoreCurve(boolean interpolate) Returns the array of recall/precision/score operating points according to the scores of the cases. |
| double | reciprocalRank() Returns the reciprocal rank for this evaluation. |
| double[][] | rocCurve(boolean interpolate) Returns the receiver operating characteristic (ROC) curve for the cases ordered by score, interpolating if the specified flag is true. |
| double | rPrecision() Returns the R precision. |
| String | toString() Returns a string-based representation of this scored precision-recall evaluation. |
public ScoredPrecisionRecallEvaluation()
Construct a scored precision-recall evaluation.

public void addCase(boolean correct, double score)
Add a case with the specified correctness and response score. The correctness flag should be true if the reference was also positive. The score is just the response score.
Warning: The scores should be sensibly comparable across cases.
Parameters:
correct - true if this case was correct.
score - Score of the response.

public void addMisses(int count)
Increments the positive reference count without adding a case from the classifier.
Parameters:
count - Number of outright misses to add to this evaluation.
Throws:
IllegalArgumentException - if the count is not positive.

public void addNegativeMisses(int count)
Increments the negative reference count without adding a case from the classifier.
Parameters:
count - Number of outright negative misses to add to this evaluation.
Throws:
IllegalArgumentException - if the count is not positive.

public int numCases()
Returns the total number of positive and negative reference cases for this evaluation, the sum of numPositiveRef() and numNegativeRef().

public int numPositiveRef()
Returns the number of positive reference cases: the number of cases added with correctness true plus the number of misses added.

public int numNegativeRef()
Returns the number of negative reference cases: the number of cases added with correctness false plus the number of negative misses added.

public double rPrecision()
Returns the R precision.

public double[] elevenPtInterpPrecision()
Returns the interpolated precision at eleven recall points evenly spaced between 0 and 1.

public double averagePrecision()
Returns the average of precisions at the true positive results. For misses added with addMisses(int), the precision is considered to be zero. (See the class documentation for more information.)

public double[][] prCurve(boolean interpolate)
Returns the precision-recall curve, interpolating if the specified flag is true.
Warning: Despite the name, the values returned are in arrays with recall at index 0 and precision at index 1.
Parameters:
interpolate - Set to true for precision interpolation.

public double[][] prScoreCurve(boolean interpolate)
Returns the array of recall/precision/score operating points according to the scores of the cases, as for prCurve(boolean). Index 0 is recall, index 1 is precision, and index 2 is the score.
Parameters:
interpolate - Set to true if the precisions are interpolated through pruning dominated points.

public double[][] rocCurve(boolean interpolate)
Returns the receiver operating characteristic (ROC) curve for the cases ordered by score, interpolating if the specified flag is true. See the class documentation above for a definition and example of the returned curve.
Parameters:
interpolate - Interpolate specificity values.

public double maximumFMeasure()
Returns the maximum F1-measure for an operating point on the PR curve.

public double maximumFMeasure(double beta)
Returns the maximum Fβ-measure for an operating point on the precision-recall curve for a specified precision weight β > 0.

public double precisionAt(int rank)
Returns the precision score achieved by returning the top scoring documents up to (but not including) the specified rank. If the precision at the specified rank is undefined, Double.NaN is returned.

public double prBreakevenPoint()
Returns the precision-recall break-even point, which is equal to rPrecision().

public double reciprocalRank()
Returns the reciprocal rank for this evaluation: 1/N for the rank N at which the first true positive is found. This method counts ranks from 1 rather than 0. The return result will be between 1.0, for the first-best result being correct, and 0.0, for none of the results being correct.

public double areaUnderPrCurve(boolean interpolate)
Returns the area under the curve (AUC) for the recall-precision curve with interpolation as specified.
Warning: This method uses the parallelogram method for interpolation rather than the usual interpolation method used to calculate AUC for precision-recall curves in information retrieval evaluations, which corresponds to the average precision computed by averagePrecision().
Parameters:
interpolate - Set to true to interpolate the precision values.

public double areaUnderRocCurve(boolean interpolate)
Returns the area under the receiver operating characteristic (ROC) curve.
Parameters:
interpolate - Set to true to interpolate the rejection recall values.

public String toString()
Returns a string-based representation of this scored precision-recall evaluation.

public static void printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw)
Prints a precision-recall curve with F-measures. The input should be of the form returned by prCurve(boolean): an array of length-2 arrays of doubles. In each length-2 array, the recall value is at index 0 and the precision is at index 1. The printed curve has three columns in the following order: precision, recall, F-measure.
Parameters:
prCurve - A precision-recall curve.
pw - The output PrintWriter.

public static void printScorePrecisionRecallCurve(double[][] prScoreCurve, PrintWriter pw)
Prints a precision-recall curve with scores. The input should be of the form returned by prScoreCurve(boolean): an array of length-3 arrays of doubles. In each length-3 array, the recall value is at index 0, the precision at index 1, and the score at index 2. The printed curve has three columns in the following order: precision, recall, score.
Parameters:
prScoreCurve - A precision-recall score curve.
pw - The output PrintWriter.

Copyright © 2016 Alias-i, Inc. All rights reserved.