Friday, March 4, 2016

Calculating AUC

Note: This is a copy of a Microsoft-internal blog post I made stemming from some work on my team. It differs only in that I am not publishing the actual code we use internally for AUC calculations. This is a rather math-heavy, data science-related post. You have been warned.
AUC refers to the area under the ROC (receiver operating characteristic) curve. According to Wikipedia, the ROC:
is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity … or recall in machine learning. The false-positive rate is also known as the fall-out… The ROC curve is thus the sensitivity as a function of fall-out.
The AUC, therefore, is a metric of predictive classification performance without the use of a threshold. (Or, perhaps more accurately, across all possible thresholds.)
This document discusses how the AUC is calculated but not any of the derivation, proofs, or description of why. For that, see a data scientist.

Input Data

The input to the AUC calculation is a rowset in which each row has an actual value representing the ground-truth conversion (0 = did not convert, 1 = converted) and a predicted value (a real number indicating our predicted probability that the subscription would convert). For example:
Actual  Predicted
1       0.32
0       0.52
1       0.26
1       0.86
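
In the sketches later in this post, I'll represent a rowset like this as plain Python lists. The names actuals and predictions are illustrative only, not the actual column names:

    # The example rowset above, split into parallel lists.
    actuals     = [1, 0, 1, 1]              # ground-truth conversion labels
    predictions = [0.32, 0.52, 0.26, 0.86]  # predicted conversion probabilities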

Tied Rank

The tied rank provides an ordered ranking across all values. When a tie exists, all records in the group get the average of the row numbers in the group. Consider this input:
Actual  Predicted
1       0.9
0       0.1
1       0.8
0       0.1
1       0.7

When ranking according to the predicted value, we:
  1. order by that column (ascending),
  2. assign row numbers (a simple 1-indexed indication of where each value falls in the dataset),
  3. assign a rank equal to the minimum row number among rows that share the same predicted value, and
  4. assign a tied rank equal to the average of the row numbers among rows that share the same predicted value (a code sketch following the table and notes below walks through these steps).

Actual  Predicted  Row Number  Rank      Tied Rank
0       0.1        1           1         1.5
0       0.1        2           1 [AS1]   1.5 [AS2]
1       0.7        3           3 [AS3]   3
1       0.8        4           4         4
1       0.9        5           5         5


[AS1] Because there are two equal values of 0.1, they get the same rank.
[AS2] The tied rank is the average of the row numbers in this group; i.e., (1 + 2) / 2 = 1.5.
[AS3] The rank of 0.7 stays at 3 even though there was a tie among smaller values. The dense rank for this row would be 2 instead of 3; however, we don't use dense rank.
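
To make these steps concrete, here is a minimal Python sketch of the tied-rank calculation. This is illustrative only (not our internal code), and the function name tied_ranks is mine. The plain rank from step 3 is shown in the table for comparison but is not needed later, so the sketch computes only row numbers and tied ranks:

    from collections import defaultdict

    def tied_ranks(values):
        """Tied (average) ranks, 1-indexed, returned in the original row order."""
        # Steps 1 and 2: order by value and assign 1-indexed row numbers.
        order = sorted(range(len(values)), key=lambda i: values[i])
        row_number = {idx: row for row, idx in enumerate(order, start=1)}
        # Step 4: group the row numbers by value and average within each group.
        groups = defaultdict(list)
        for idx, value in enumerate(values):
            groups[value].append(row_number[idx])
        return [sum(groups[v]) / len(groups[v]) for v in values]

    print(tied_ranks([0.9, 0.1, 0.8, 0.1, 0.7]))   # [5.0, 1.5, 4.0, 1.5, 3.0]

For what it's worth, scipy.stats.rankdata uses this same average-of-row-numbers convention for ties by default, so it can stand in for the sketch above.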

Calculating AUC

The value of the AUC is:

    AUC = (SumOfPositiveRanks - NthPartialSum) / (NumPositives * NumNegatives)

where each of these components is described below.

#Positive and #Negative Cases

The simplest components are the counts of positive and negative cases in the entire dataset. From the example above, there are three positive cases (i.e., where actual==1) and two negative cases (i.e., where actual==0).
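
Sketched in Python, using the labels from the tied-rank example above:

    actuals = [1, 0, 1, 0, 1]                           # labels from the tied-rank example
    num_positives = sum(1 for a in actuals if a == 1)   # 3
    num_negatives = sum(1 for a in actuals if a == 0)   # 2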

Sum of Positive Ranks

This is the sum of the tied ranks across all positive cases. From this example, keeping only the positive cases, we see the following subset of data with tied ranks:
Actual  Predicted  Tied Rank
1       0.7        3
1       0.8        4
1       0.9        5

The sum of these tied ranks is therefore 3 + 4 + 5 = 12.
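
Sketched in Python, pairing each actual value with its tied rank from the table above:

    # (actual, tied rank) pairs from the tied-rank example
    ranked = [(0, 1.5), (0, 1.5), (1, 3.0), (1, 4.0), (1, 5.0)]
    sum_of_positive_ranks = sum(rank for actual, rank in ranked if actual == 1)   # 3 + 4 + 5 = 12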

Nth Partial Sum

The remaining component is the Nth partial sum of the infinite series 1 + 2 + 3 + ..., where N is the number of positive cases:

    NthPartialSum = NumPositives * (NumPositives + 1) / 2

In this example, with NumPositives=3, we calculate the Nth partial sum as 3*(3+1)/2 = 3*4/2 = 6.
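
Or, as a one-line sketch:

    num_positives = 3
    nth_partial_sum = num_positives * (num_positives + 1) // 2   # 3 * 4 / 2 = 6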

Final Calculation

Substituting values, the final equation becomes:

    AUC = (12 - 6) / (3 * 2) = 6 / 6 = 1.0

This AUC value of 1.0 is the maximum possible AUC, indicating that every positive case was ranked above every negative case. The AUC does not tell us what threshold we should use when trying to make a prediction from a score.
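
Putting the pieces together, here is a minimal end-to-end sketch of the calculation (again, illustrative only and not our internal implementation; it leans on scipy.stats.rankdata for the tied ranks):

    from scipy.stats import rankdata   # rankdata's default 'average' method yields tied ranks

    def auc(actuals, predictions):
        # Tied ranks of the predicted scores, in the original row order.
        ranks = rankdata(predictions)
        num_positives = sum(1 for a in actuals if a == 1)
        num_negatives = sum(1 for a in actuals if a == 0)
        sum_of_positive_ranks = sum(r for a, r in zip(actuals, ranks) if a == 1)
        nth_partial_sum = num_positives * (num_positives + 1) / 2
        return (sum_of_positive_ranks - nth_partial_sum) / (num_positives * num_negatives)

    print(auc([1, 0, 1, 0, 1], [0.9, 0.1, 0.8, 0.1, 0.7]))   # 1.0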

Another Example

Using the input data example, we first calculate the tied rank:
Actual  Predicted  Tied Rank
1       0.26       1
1       0.32       2
0       0.52       3
1       0.86       4 [AS1]

[AS1] As there are no ties, the tied rank is equal to the rank.
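
As a quick sanity check, the tied-rank sketch from earlier (or scipy's rankdata) reproduces this table:

    from scipy.stats import rankdata

    predictions = [0.32, 0.52, 0.26, 0.86]   # predicted values from the input data example
    print(rankdata(predictions))             # [2. 3. 1. 4.] -- no ties, so tied rank == rank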