Wednesday, March 30, 2016

The unfortunate design of not saving preferences

I recently reformatted my phone (a statement I wouldn't have expected to make back when I was regularly reformatting my Windows 98 computer), and when I added my second email account to the phone, Picasa Web Albums asked if I wanted to grant access to Gallery. Since I already connected to my personal Gmail account, I have no need to connect my secondary, I denied this request:


However, very shortly after this request, I was asked again the same question:


I got the same request over and over again until I finally allowed it. (In fact, even after I allowed it, I was asked two more times.)

I by no means consider myself a "designer," but having at least done some reading on the subject, I feel like I see terrible design everywhere now. If you're an engineer, you'd do yourself a favor to read Don't Make Me Think, and if you've got plenty of time, The Design of Everyday Things.

Thursday, March 17, 2016

Tiny living for a weekend

This last weekend, as my birthday gift from Andrea, she, the boys, and I went to Olympia to stay in the Bayside Bungalow, a tiny house on Airbnb.

We got to the Bungalow late-ish Friday night and settled in.

The tiny house in the daylight

Jake and Reed were giddy at the prospect of sleeping in the loft
 Saturday included a trip to Olympia for breakfast, some grocery shopping, and a trip to a park for some underdogs and checking out the jellyfish in the Sound.



After that, it was back to the house for some R&R. There's a path down to the Sound, so the four of us walked down there to hang out for a while.



It was a somewhat chilly, quite rainy Saturday, but that didn't stop the boys from trenching a more optimal path for the rainwater to get to the Sound.


Later in the day, we returned to Olympia, played laser tag for the first time (the boys loved it, wanted to do it again, asked for us to buy them laser tag gear so they can play at home), hid up the arcade, then returned to the Bungalow.


Friday, March 4, 2016

Calculating AUC

Note: This is a copy of a Microsoft-internal blog post I made stemming from some work on my team. It differs only in that I am not publishing the actual code we use internally for AUC calculations. This is a rather math-heavy, data science-related post. You have been warned.
AUC refers to the area under the ROC (receiver operating characteristic) curve. According to Wikipedia, the ROC:
​is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity … or recall in machine learning. The false-positive rate is also known as the fall-out… The ROC curve is thus the sensitivity as a function of fall-out. 
The AUC, therefore, is a metric of predictive classification performance without the use of a threshold. (Or, perhaps more accurately, across all possible thresholds.)
This document discusses how the AUC is calculated but not any of the derivation, proofs, or description of why. For that, see a data scientist.

Input Data​

The input to the AUC calculation is a rowset consisting of an actual value representing the ground truth conversion (0 = did not convert, 1 = converted) and a predicted value (a real value indicating our prediction of the probability that subscription would convert). For example:
ActualPredicted
10.32
00.52
10.26
10.86

Tied Rank​​

The tied rank provides an ordered ranking across all values. When a tie exists, all records in the group get the average of the row numbers in the group. Consider this input:
ActualPredicted
10.9
00.1
10.8
00.1
10.7
​ 
When ranking according to the predicted value, we:
  1. order by that column,
  2. assign row numbers (a simple 1-indexed indication of where each value falls in the dataset),
  3. assign a rank equal to the minimum row number of all common values, and
  4. assign a tied rank equal to the average of the row numbers with equal values

ActualPredictedRow NumberRankTied Rank
00.1111.5
00.121[AS1] 1.5[AS2] 
10.733[AS3] 3
10.8444
10.9555


 [AS1]Because there are two equal values of 0.1, they get the same rank.
 [AS2]The tied rank is the average of the row numbers in this group; ie, (1+2)/2 = 1.5
 [AS3]The rank of 0.7 doesn't change because there was a tie in smaller values. The dense rank for this row would be 2 instead of 3; however, we don't use dense rank.​

Calculati​​ng AUC

The value of the AUC is:

#Positive and #Negative Cases

The simplest values are simply the count of all positive and negative cases in the entire dataset. From the above example, there are three positive cases (ie, where actual==1) and two negative cases (ie, where actual==0).

Sum of Positive Ranks

This is a sum across all positive cases of the tied rank. From this example, keeping only positive cases, we see the following subset of data with tied ranks:
ActualPredictedTied Rank
10.73
10.84
10.95

The sum of these tied ranks is therefore 3 + 4 + 5 = 12.​

Nth Partial Sum

The remaining component is the Nth partial sum of the infinite series 1 + 2 + 3 + ... for all positive cases, or:

In this example, with NumPositives=3, we calculate the Nth partial sum as 3*(3+1)/2 = 3*4/2 = 6..

Final Calculation

Substituting values, the final equation becomes:

This AUC value of 1.0 is the maximum possible AUC, indicating a very robust prediction. The AUC does not tell us what threshold we should use when trying to make a prediction from a score.

Another Example

Using the input data example, we first calculate the tied rank:
ActualPredictedTied Rank
10.261
10.322
00.523
10.864[AS1] 

 [AS1]As there are no ties, the tied rank is equal to the rank.