Tuesday, April 5, 2016

In which I complain about being a mid-30s professional transplant

I've got a pretty damn awesome life. If I had been born in the Paleolithic era, on average, I'd be dead by now. (Average life expectancy: 33 years. Also, 100% of people from the Paleolithic are now dead.) Or in more recent times, only 150 years ago, if I were British, I'd probably be a lovable chimney sweep. Or if I had been born in, say, the DRC, I might have ended up a child soldier. Who knows?

Instead, I grew up in rural America, have a life expectancy of ~73 years (if I don't screw it up), took a bunch of math and science classes, and got myself a shiny bachelor's degree in computer science, which must put me right up there with four of the top ten wealthiest people in the world, right?

Actually, despite the grief my housemates in college gave me when I switched from Computer Engineering to Computer Science (personal hygiene and social ineptitude jokes abound), that was an awesome choice. Not only do I think I'm much happier swimming in code than I would have been in Fourier transforms (I still don't know what the hell they are), it turns out software engineering is a pretty damn lucrative industry.

This is how computer science and biology majors roll

The US Census Bureau's stats bode very well for me. And for that, I'm very thankful. Andrea and I are incredibly, ridiculously fortunate to have the financial security we do. (Despite taking a huge loss on the sale of our 2004-purchased home, we're very comfortable.)

On top of that, the Pacific Northwest is killer. We've had a hell of a ride in the last almost four years that we've lived here. As terrifying as it was to make the move here, it was one of the best decisions I think we've ever made. Who knew you and your dog could hike up a mountain post-Thanksgiving and have such a gorgeous view? (Also, we have an awesome dog.)

Woof.

And I get to do all sorts of cool stuff like summit Mount Rainier.

Colors!

And then a year later, Mount Shuksan.

Not as high as Rainier, but still pretty awesome.

But there's also a downside to all of it - and here's where I start bitching: it sucks trying to make friends again when you're older.

I'm inclined to think Malcolm Gladwell's infamous 10,000-hour rule applies to friendships, and all those people I used to be so close to are (or were) my friends due in no small part to how much time we spent together in elementary and high school, college classes, and in the dorms.

As it happens, when you're spending 9-10 hours every day at work (plus commute time), you don't have much time to hang out with people for fun. And those folks you used to hang out with all the time are now spread around the country, making it hard to keep up the inertia of existing relationships. The extremely unsatisfying and unsettling heat death of friendships fucking sucks. Pardon mon français tout le monde.

This is the point at which, as a stereotypical male engineer, I want to dissect the problem and control for various factors, but this post isn't problem-solving but rather complain-into-the-ether -- a problem I, admittedly, am very privileged to have.

Wednesday, March 30, 2016

The unfortunate design of not saving preferences

I recently reformatted my phone (a statement I wouldn't have expected to make back when I was regularly reformatting my Windows 98 computer), and when I added my second email account to the phone, Picasa Web Albums asked if I wanted to grant access to Gallery. Since I had already connected my personal Gmail account and had no need to connect my secondary, I denied this request:


However, very shortly after this request, I was asked the same question again:


I got the same request over and over again until I finally allowed it. (In fact, even after I allowed it, I was asked two more times.)

I by no means consider myself a "designer," but having at least done some reading on the subject, I feel like I see terrible design everywhere now. If you're an engineer, you'd do yourself a favor to read Don't Make Me Think, and if you've got plenty of time, The Design of Everyday Things.

Thursday, March 17, 2016

Tiny living for a weekend

This last weekend, as my birthday gift from Andrea, she, the boys, and I went to Olympia to stay in the Bayside Bungalow, a tiny house on Airbnb.

We got to the Bungalow late-ish Friday night and settled in.

The tiny house in the daylight

Jake and Reed were giddy at the prospect of sleeping in the loft

Saturday included a trip to Olympia for breakfast, some grocery shopping, and a trip to a park for some underdogs and checking out the jellyfish in the Sound.



After that, it was back to the house for some R&R. There's a path down to the Sound, so the four of us walked down there to hang out for a while.



It was a somewhat chilly, quite rainy Saturday, but that didn't stop the boys from trenching a more optimal path for the rainwater to get to the Sound.


Later in the day, we returned to Olympia, played laser tag for the first time (the boys loved it, wanted to do it again, asked for us to buy them laser tag gear so they could play at home), hit up the arcade, then returned to the Bungalow.


Friday, March 4, 2016

Calculating AUC

Note: This is a copy of a Microsoft-internal blog post I made stemming from some work on my team. It differs only in that I am not publishing the actual code we use internally for AUC calculations. This is a rather math-heavy, data science-related post. You have been warned.
AUC refers to the area under the ROC (receiver operating characteristic) curve. According to Wikipedia, the ROC:
is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity … or recall in machine learning. The false-positive rate is also known as the fall-out… The ROC curve is thus the sensitivity as a function of fall-out.
The AUC, therefore, is a metric of predictive classification performance without the use of a threshold. (Or, perhaps more accurately, across all possible thresholds.)
This document discusses how the AUC is calculated but not any of the derivation, proofs, or description of why. For that, see a data scientist.

Input Data

The input to the AUC calculation is a rowset consisting of an actual value representing the ground truth conversion (0 = did not convert, 1 = converted) and a predicted value (a real value indicating our prediction of the probability that a subscription would convert). For example:
Actual  Predicted
1       0.32
0       0.52
1       0.26
1       0.86

Tied Rank

The tied rank provides an ordered ranking across all values. When a tie exists, all records in the group get the average of the row numbers in the group. Consider this input:
Actual  Predicted
1       0.9
0       0.1
1       0.8
0       0.1
1       0.7

When ranking according to the predicted value, we:
  1. order by that column,
  2. assign row numbers (a simple 1-indexed indication of where each value falls in the dataset),
  3. assign a rank equal to the minimum row number of all common values, and
  4. assign a tied rank equal to the average of the row numbers with equal values

Actual  Predicted  Row Number  Rank     Tied Rank
0       0.1        1           1        1.5
0       0.1        2           1 [AS1]  1.5 [AS2]
1       0.7        3           3 [AS3]  3
1       0.8        4           4        4
1       0.9        5           5        5

[AS1] Because there are two equal values of 0.1, they get the same rank.
[AS2] The tied rank is the average of the row numbers in this group; ie, (1+2)/2 = 1.5
[AS3] The rank of 0.7 is unaffected by the tie among the smaller values. The dense rank for this row would be 2 instead of 3; however, we don't use dense rank.
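
The four ranking steps above can be sketched in a few lines of Python. This is only an illustration, not the internal code mentioned in the note at the top; the function name `rank_table` and the tuple layout are assumptions made up for this example.

```python
from collections import defaultdict


def rank_table(rows):
    """Rank (actual, predicted) rows by predicted value.

    Returns tuples of (actual, predicted, row_number, rank, tied_rank),
    ordered by predicted value. `rank_table` is a hypothetical helper.
    """
    # Step 1: order by the predicted column.
    ordered = sorted(rows, key=lambda row: row[1])

    # Step 2: assign 1-indexed row numbers, grouped by predicted value.
    row_numbers_by_value = defaultdict(list)
    for row_number, (_, predicted) in enumerate(ordered, start=1):
        row_numbers_by_value[predicted].append(row_number)

    table = []
    for row_number, (actual, predicted) in enumerate(ordered, start=1):
        group = row_numbers_by_value[predicted]
        rank = min(group)                    # Step 3: minimum row number in the group
        tied_rank = sum(group) / len(group)  # Step 4: average row number in the group
        table.append((actual, predicted, row_number, rank, tied_rank))
    return table


rows = [(1, 0.9), (0, 0.1), (1, 0.8), (0, 0.1), (1, 0.7)]
for line in rank_table(rows):
    print(line)
```

Run on the example input, this should reproduce the Row Number, Rank, and Tied Rank columns of the table above, with the two 0.1 rows sharing a rank of 1 and a tied rank of 1.5.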

Calculating AUC

The value of the AUC is:
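$$\text{AUC} = \frac{\text{SumOfPositiveRanks} - \frac{\text{NumPositives}\,(\text{NumPositives} + 1)}{2}}{\text{NumPositives} \times \text{NumNegatives}}$$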

#Positive and #Negative Cases

The simplest values are the counts of all positive and negative cases in the entire dataset. From the above example, there are three positive cases (ie, where actual==1) and two negative cases (ie, where actual==0).

Sum of Positive Ranks

This is a sum across all positive cases of the tied rank. From this example, keeping only positive cases, we see the following subset of data with tied ranks:
Actual  Predicted  Tied Rank
1       0.7        3
1       0.8        4
1       0.9        5

The sum of these tied ranks is therefore 3 + 4 + 5 = 12.

Nth Partial Sum

The remaining component is the Nth partial sum of the infinite series 1 + 2 + 3 + ..., where N is the number of positive cases:
$$\sum_{k=1}^{N} k = \frac{N(N+1)}{2}$$
In this example, with NumPositives=3, we calculate the Nth partial sum as 3*(3+1)/2 = 3*4/2 = 6.

Final Calculation

Substituting values, the final equation becomes:
$$\text{AUC} = \frac{12 - 6}{3 \times 2} = \frac{6}{6} = 1.0$$
This AUC value of 1.0 is the maximum possible AUC, indicating that every positive case was ranked above every negative case. The AUC does not tell us what threshold we should use when trying to make a prediction from a score.
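
Putting the components together in Python might look something like the following. Again, this is a sketch under my own assumptions and not the internal implementation: `tied_ranks` and `auc` are hypothetical names, and the input is assumed to be two parallel lists of 0/1 actual values and predicted scores.

```python
from collections import defaultdict


def tied_ranks(values):
    """Average (tied) rank of each value, 1-indexed, in the input order."""
    row_numbers = defaultdict(list)
    for row_number, value in enumerate(sorted(values), start=1):
        row_numbers[value].append(row_number)
    return [sum(row_numbers[v]) / len(row_numbers[v]) for v in values]


def auc(actual, predicted):
    """Rank-based AUC from 0/1 actual values and predicted scores."""
    ranks = tied_ranks(predicted)

    # #Positive and #Negative cases
    num_positives = sum(1 for a in actual if a == 1)
    num_negatives = len(actual) - num_positives
    if num_positives == 0 or num_negatives == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Sum of the tied ranks across positive cases only
    sum_positive_ranks = sum(r for a, r in zip(actual, ranks) if a == 1)

    # Nth partial sum of 1 + 2 + 3 + ..., with N = number of positive cases
    partial_sum = num_positives * (num_positives + 1) / 2

    return (sum_positive_ranks - partial_sum) / (num_positives * num_negatives)


actual = [1, 0, 1, 0, 1]
predicted = [0.9, 0.1, 0.8, 0.1, 0.7]
print(auc(actual, predicted))  # 1.0
```

The guard against a dataset with only positives or only negatives is my own addition; without it the final division would be by zero.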

Another Example

Using the input data example, we first calculate the tied rank:
Actual  Predicted  Tied Rank
1       0.26       1
1       0.32       2
0       0.52       3
1       0.86       4 [AS1]

[AS1] As there are no ties, the tied rank is equal to the rank.
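With three positive cases and one negative case:
$$\text{SumOfPositiveRanks} = 1 + 2 + 4 = 7$$
$$\frac{N(N+1)}{2} = \frac{3 \times 4}{2} = 6$$
$$\text{AUC} = \frac{7 - 6}{3 \times 1} = \frac{1}{3} \approx 0.33$$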
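
As a sanity check on this example, the AUC can also be read as the fraction of (positive, negative) pairs in which the positive case got the higher prediction, with ties counting as half. That pairwise view is a standard equivalent of the rank-based calculation rather than anything from the work described here, and `pairwise_auc` is a hypothetical name. It is quadratic in the number of rows, so treat it as a cross-check for small examples only.

```python
def pairwise_auc(actual, predicted):
    """Brute-force AUC: share of positive/negative pairs ordered correctly."""
    positives = [p for a, p in zip(actual, predicted) if a == 1]
    negatives = [p for a, p in zip(actual, predicted) if a == 0]

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # positive case scored above the negative case
            elif pos == neg:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


actual = [1, 0, 1, 1]
predicted = [0.32, 0.52, 0.26, 0.86]
print(pairwise_auc(actual, predicted))  # 0.3333333333333333
```

Only one of the three positive cases (0.86) outscores the lone negative case (0.52), which is where the one-third comes from.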

Monday, January 18, 2016

Take the train and see America (very slowly)!

I've been meaning to write up a little something about our Christmas travels this last year in which we took Amtrak home instead of our usual flight. Usually we spend something like $400/person for our airfare between SEA & MSP, and I think we've had to pay as much as $500. Turns out a lot of people travel around Christmas, and supply and demand means it costs more to travel at that time of year. Go figure.

On the other hand, I found that one-way airfare (SEA->MSP) was only about $125/person, and a one-way train ticket was $150/adult, $75/child. After taxes and fees, we spent $600 to fly to Minnesota and about $500 to take the train. For all four of us. Then again, a return flight would be something like 3.5hrs versus 34hrs for the train, so we're trading a day and a third for $1000ish savings. And the experience of seeing the country.

The St. Cloud Amtrak station is about as big as one airport terminal gate seating area but with no coffee or gift shops. Instead, there was a broken water fountain and a (working!) 7-Up machine from the Taft administration.



We got to the station around 11:30pm after a busy day at my parents' house. Both boys zonked out for a while, though Jacob had been so extremely tired that before falling asleep, he got a severe case of the sillies and was Andrea's and my comic relief for about twenty minutes.




When the train arrived around 12:40am, we carried the boys outside, but they woke up on their own, extremely excited to take the train. Jacob told me, "Dad, it's real! I pinched myself and it hurt!"


Once on the train, we found the seats to be pretty comfortable, the ride smooth, accommodations... acceptable. The boys actually enjoyed it surprisingly well.


Though the ride took a day and a half, we were at least able to get out at several of the stops (though I only did so on one).


Unfortunately, the better part of our daylight on New Year's Eve day was through North Dakota and Montana -- all very flat and boring. We did have a nice sunset, however.



By the time we got to Glacier, it was nighttime and there was nothing to see. We got to Spokane around midnight-ish, and we didn't get any light until around the Cascades. The snow-covered trees and mountains were beautiful, though unfortunately, we were so close that we didn't get any good shots of the scenery.

The western side of the Cascades was a nice welcome home.



The final stretch from Everett to Seattle was right along the Sound and was gorgeous, though I was too busy gawking at it to take any pictures. The arrival at King Street Station was a nice break to stretch and walk around before we hopped into a taxi to get home.

All in all, it was a good experience. I'm glad we did it -- not just because it barely offset the cost of boarding Shasta for our two-week trip, but also because it was a fun experience in and of itself. Hard to say, though, whether and when we'll do it again.

Friday, November 27, 2015

A return to Mailbox Peak

After entirely too long, I returned to Mailbox Peak today with my hiking buddy. When I grabbed the leash and other hiking gear, she was elated, happily jumping into the car. The long drive out to and past North Bend was tiring, though.


We took the old trail up, which is a pretty grueling 4000' elevation gain over a mere 2.6 miles. Shasta had no problem whatsoever with the hike.

I, however, got wrecked, having done only two other hikes since my June summit of Mount Shuksan, one of which was last month's trip up Mount Saint Helens. I bonked around 1.5-2.0mi in, allowing Shasta to drag my ass up the rest of the way to the beautiful scenery of the Cascades and Mount Rainier. She was quite happy up there.


But I could barely walk.

I thought the descent might be made better by using the new trail, a 4.7mi descent instead of 2.6. That was all well and good, but it was significantly more snowy and icy than the old trail and took so long that I was stumbling all the way back to the car.

Up until today, I had been very adamant about the rule that dogs must always be on leashes. In fact, when Shasta was walking under a felled log, I didn't even let go of the leash while transferring it. However, having spent a good half hour without seeing another person, I decided to try letting her off-leash to see how she would do.

After maybe five minutes of her sprinting up ahead 50 yards or so then doubling-back to me, lather, rinse, repeat, she finally got into a good routine of staying within maybe 10-20 feet of me. (For what it's worth, I put her back on-leash immediately when we did eventually run into hikers. I then took her back off, and she did phenomenally every time.)

We got home and Shasta climbed onto the couch to curl up and rest. After my supremely weak-sauce performance for the day, so did I. Roughly 5.5hrs for the 7.3mi round-trip plus travel time, it left much to be desired. But it was good to be back on the mountain again.

Thursday, October 29, 2015

Another year at Microsoft, another position change

A little over a year ago, I moved from Bing to DX, part of Microsoft's evangelism group, to work on a project that still has not been disclosed. Tomorrow will be my last day within DX as I move to the Windows and Devices Group (WDG) to work yet again on some truly big data + machine learning.

In Bing, I worked on a few projects utilizing Microsoft's Cosmos distributed and massively parallel processing system before moving over to do some tented work on Azure and websites and other secret things. Starting next week, I'll once again be doing severely large-scale work in analytics and prediction, with some natural language stuff thrown in for good measure. Or so I'm led to understand.

The position sort of fell into my lap, so it's with some trepidation that I'm leaving a project where I get to do some cool stuff and implement a lot of neat features -- which, of course, I can't yet discuss. Maybe once my soon-to-be-former team goes public, I'll do some blog posts about what I've been doing.