I recently reformatted my phone (a statement I wouldn't have expected to make back when I was regularly reformatting my Windows 98 computer), and when I added my second email account to the phone, Picasa Web Albums asked if I wanted to grant access to Gallery. Since I had already connected my personal Gmail account and had no need to connect the secondary one, I denied the request.
However, very shortly after this request, I was asked the same question again.
I got the same request over and over again until I finally allowed it. (In fact, even after I allowed it, I was asked two more times.)
I by no means consider myself a "designer," but having at least done some reading on the subject, I feel like I see terrible design everywhere now. If you're an engineer, you'd do yourself a favor by reading Don't Make Me Think and, if you've got plenty of time, The Design of Everyday Things.
Wednesday, March 30, 2016
Thursday, March 17, 2016
Tiny living for a weekend
This last weekend, as my birthday gift from Andrea, she, the boys, and I went to Olympia to stay in the Bayside Bungalow, a tiny house on Airbnb.
We got to the Bungalow late-ish Friday night and settled in.
The tiny house in the daylight
Jake and Reed were giddy at the prospect of sleeping in the loft
After that, it was back to the house for some R&R. There's a path down to the Sound, so the four of us walked down there to hang out for a while.
It was a somewhat chilly, quite rainy Saturday, but that didn't stop the boys from trenching a more optimal path for the rainwater to get to the Sound.
Later in the day, we returned to Olympia, played laser tag for the first time (the boys loved it, wanted to do it again, and asked us to buy them laser tag gear so they could play at home), hit up the arcade, then returned to the Bungalow.
Friday, March 4, 2016
Calculating AUC
Note: This is a copy of a Microsoft-internal blog post I made stemming from some work on my team. It differs only in that I am not publishing the actual code we use internally for AUC calculations. This is a rather math-heavy, data science-related post. You have been warned.
AUC refers to the area under the ROC (receiver operating characteristic) curve. According to Wikipedia, the ROC:
is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity … or recall in machine learning. The false-positive rate is also known as the fall-out… The ROC curve is thus the sensitivity as a function of fall-out.
The AUC, therefore, is a metric of predictive classification performance without the use of a threshold. (Or, perhaps more accurately, across all possible thresholds.)
This document discusses how the AUC is calculated but not any of the derivation, proofs, or description of why. For that, see a data scientist.
Input Data
The input to the AUC calculation is a rowset consisting of an actual value representing the ground truth conversion (0 = did not convert, 1 = converted) and a predicted value (a real value indicating our prediction of the probability that the subscription would convert). For example:
Actual | Predicted |
1 | 0.32 |
0 | 0.52 |
1 | 0.26 |
1 | 0.86 |
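As a concrete illustration (Python here is my own choice, not the format of the internal rowset), this input could be represented as two parallel lists:

```python
# Hypothetical in-memory representation of the example rowset above.
actuals = [1, 0, 1, 1]                   # ground truth: 1 = converted, 0 = did not convert
predictions = [0.32, 0.52, 0.26, 0.86]   # predicted probability of conversion
```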
Tied Rank
The tied rank provides an ordered ranking across all values. When a tie exists, all records in the group get the average of the row numbers in the group. Consider this input:
Actual | Predicted |
1 | 0.9 |
0 | 0.1 |
1 | 0.8 |
0 | 0.1 |
1 | 0.7 |
When ranking according to the predicted value, we:
- order by that column,
- assign row numbers (a simple 1-indexed indication of where each value falls in the dataset),
- assign a rank equal to the minimum row number among rows with equal predicted values, and
- assign a tied rank equal to the average of the row numbers among rows with equal predicted values.
Actual | Predicted | Row Number | Rank | Tied Rank |
0 | 0.1 | 1 | 1 | 1.5 |
0 | 0.1 | 2 | 1 [1] | 1.5 [2] |
1 | 0.7 | 3 | 3 [3] | 3 |
1 | 0.8 | 4 | 4 | 4 |
1 | 0.9 | 5 | 5 | 5 |
[1] Because there are two equal values of 0.1, both rows get the same rank.
[2] The tied rank is the average of the row numbers in this group; i.e., (1 + 2) / 2 = 1.5.
[3] The rank of 0.7 is still 3 even though there was a tie among the smaller values. The dense rank for this row would be 2 instead of 3; however, we don't use dense rank.
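To make the steps concrete, here's a minimal Python sketch of the tied-rank step. This is my own illustration, not the internal code referenced at the top of the post, and the function name tied_ranks is made up for the example:

```python
def tied_ranks(values):
    """Return the tied rank of each value, in the original input order."""
    # Order the original indices by their predicted value.
    ordered = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(ordered):
        # Find the run of equal values beginning at `start`.
        end = start
        while end + 1 < len(ordered) and values[ordered[end + 1]] == values[ordered[start]]:
            end += 1
        # Row numbers are 1-indexed, so this run covers rows start+1 through end+1;
        # every member of the run gets the average of those row numbers.
        average_row = ((start + 1) + (end + 1)) / 2.0
        for i in range(start, end + 1):
            ranks[ordered[i]] = average_row
        start = end + 1
    return ranks

# The example above: predicted values 0.9, 0.1, 0.8, 0.1, 0.7
print(tied_ranks([0.9, 0.1, 0.8, 0.1, 0.7]))  # [5.0, 1.5, 4.0, 1.5, 3.0]
```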
Calculating AUC
The value of the AUC is:

AUC = (SumPositiveRanks - NthPartialSum) / (NumPositives * NumNegatives)

Each of these pieces is described below.
#Positive and #Negative Cases
The simplest components are the counts of positive and negative cases in the entire dataset. In the example above, there are three positive cases (i.e., where actual == 1) and two negative cases (i.e., where actual == 0).
Sum of Positive Ranks
This is the sum of the tied ranks across all positive cases. Keeping only the positive cases from this example, we see the following subset of data with tied ranks:
Actual | Predicted | Tied Rank |
1 | 0.7 | 3 |
1 | 0.8 | 4 |
1 | 0.9 | 5 |
The sum of these tied ranks is therefore 3 + 4 + 5 = 12.
Nth Partial Sum
The remaining component is the Nth partial sum of the infinite series 1 + 2 + 3 + ..., where N is the number of positive cases:

NthPartialSum = NumPositives * (NumPositives + 1) / 2

In this example, with NumPositives = 3, we calculate the Nth partial sum as 3 * (3 + 1) / 2 = 3 * 4 / 2 = 6.
Final Calculation
Substituting values, the final equation becomes:

AUC = (12 - 6) / (3 * 2) = 6 / 6 = 1.0

This AUC value of 1.0 is the maximum possible AUC, indicating a perfect ranking: every positive case is scored higher than every negative case. The AUC does not tell us what threshold we should use when trying to make a prediction from a score.
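Putting the pieces together, here's a minimal, self-contained Python sketch of the whole calculation. It is not the internal code the post refers to; in particular, it uses scipy.stats.rankdata with method="average" as a stand-in for the tied-rank step, and the function name auc is my own:

```python
from scipy.stats import rankdata

def auc(actuals, predictions):
    """Rank-based AUC: (SumPositiveRanks - NthPartialSum) / (NumPositives * NumNegatives)."""
    tied = rankdata(predictions, method="average")  # tied rank, as described above
    num_positives = sum(1 for a in actuals if a == 1)
    num_negatives = sum(1 for a in actuals if a == 0)
    sum_positive_ranks = sum(r for a, r in zip(actuals, tied) if a == 1)
    nth_partial_sum = num_positives * (num_positives + 1) / 2.0
    return (sum_positive_ranks - nth_partial_sum) / (num_positives * num_negatives)

# The worked example above: three positives, two tied negatives, AUC = 1.0
print(auc([1, 0, 1, 0, 1], [0.9, 0.1, 0.8, 0.1, 0.7]))  # 1.0
```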
Another Example
Using the input data example, we first calculate the tied rank:
Actual | Predicted | Tied Rank |
1 | 0.26 | 1 |
1 | 0.32 | 2 |
0 | 0.52 | 3 |
1 | 0.86 | 4 [1] |
[1] As there are no ties, the tied rank is equal to the rank for every row.
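Carrying the remaining steps through this example: the sum of the positive tied ranks is 1 + 2 + 4 = 7, the Nth partial sum is 3 * (3 + 1) / 2 = 6, and there are three positive cases and one negative case, so the AUC is (7 - 6) / (3 * 1) = 1/3, or roughly 0.33. The hypothetical auc sketch above gives the same answer:

```python
# Checking the second example against the auc() sketch above (an illustration, not the internal code).
print(auc([1, 0, 1, 1], [0.32, 0.52, 0.26, 0.86]))  # 0.333...
```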