On Leaving IBM

Posted by Jack on 2017-01-26 at 13:34
Tagged: work

After working with IBM for over 8 years, this week marks my last with Big Blue. Next week, I'll start my new job with AMD.

Since I've been spending a lot of time over the last month or so reflecting on my time at IBM, I figured I could use this post to collect some thoughts on why I'm leaving.

Why I'm Leaving IBM

My biggest reason for leaving IBM is that I've grown weary of being isolated from the people that are important to my work. I've worked tangentially with people across the US, Germany, Brazil, Ireland and India, but IBM's Linux and free software expertise is focused in Australia, namely OzLabs.

Since its acquisition in 2001, OzLabs has been the center of PowerPC Linux. Now, with OpenPower (the leaner, meaner PowerPC), OzLabs controls the Linux port, most of the firmware, and the main bootloader for the platform. In other words, every project in IBM that I was interested in or worked on.

This, in and of itself, was no problem. OzLabs is transparent and fanatically open source. Even non-Linux development is done on public mailing lists with a bevy of git trees. It's easy to observe or even participate if you've got changes in mind.

My problem was that working with OzLabs was simultaneously unavoidable given my interests in IBM, and really, really difficult thanks to my being 8000 miles away and outnumbered 10 to 1. Over the course of my IBM career I attempted to bridge that distance, but in the end I found that I needed what OzLabs already enjoys and nowhere else in IBM could provide me: a critical mass of local developers to work with.

Let me provide a little context for how this happened, and how I reached that conclusion.

Early History

I spent the first half of my time in IBM (2008-2011) doing the best that I could supporting embedded PowerPC from here in Austin. I learned the architecture from the point of view of weird devices like Cell, Prism (WSP), Chroma (a PCIe card variant of Prism) and Espresso, but aside from a brief couple of weeks in 2010 doing Prism bring up with OzLabbers in Raleigh, none of this gave me any opportunity to actually learn to be a kernel hacker.

That's a pretty loaded statement, so let me clarify. I had plenty of opportunities to read and write Linux kernel code. What I didn't have was any experience actually finding work before coding, or getting work included in Linux after coding. Both of these tasks are critically important, but in those early days work was dropped into my lap and, when I thought I was done, it was handed off to someone else to upstream, or it wasn't upstreamed at all. For example, the small amount of Prism work I did ended up in Linux years later without my involvement and was subsequently stripped out of Linux without my involvement either.

When there's a lot of work to do and it's easy to come by, this arrangement isn't so bad. At the time my tasks almost universally came from people in Austin, were intended to support people in Austin, and in return I was supported by people in Austin. Even though OzLabs was still the center of PowerPC Linux and my early code was reviewed and submitted by OzLabs, when I needed help or new work I didn't call them, I talked to people that were a few doors down from my office in Austin.

Unfortunately, embedded PowerPC dried up after Espresso, and that all changed.

BML

Remembering my wonderful experience bringing up Prism and Chroma in 2010, I joined the Bare Metal Linux team with hopes that their reputation for hardcore bring up and CPU enablement work would directly translate into Linux commits.

Circa 2012, the BML team was small and mostly local to Austin, although still led by an OzLabber. POWER7 was still pretty new and POWER8 was waiting in the wings.

My first task was actually testing and writing proof of concept support for a new P8 feature called "accelerated switchboard" that included a new instruction PBT (Push Block to Thread). I felt like I was on the right track: I had a new chip feature to bang on and I was in the simulation and lab environment for the main POWER line of server processors instead of the weird embedded devices I cut my teeth on. I was even excited that my team was directly connected to OzLabs.

BML's reputation was an old one though, and it was earned when getting Linux to run on a new chip required a lot of hypervisor support that didn't exist when your chip was a simulation or a prototype fresh out of the fab. Linux without a hypervisor (and potentially running on glacially slow hardware simulators) required a lot of custom patches and custom stub firmware, on top of a set of lab tools to actually generate a device tree and load various artifacts into memory on a variety of platforms.

When I joined BML, the codebase was geared around this level of deep involvement. It had a directory full of bit-rotting Linux patches, a rickety build system based on snapshots instead of git trees, as well as a pile of awful Perl hacks used to interface with lab infrastructure. That may sound harsh, but I doubt any of my former teammates would disagree. It was clear that BML evolved from a minor miracle into a useful lab debug tool rather than being designed with that goal in mind.

Meanwhile, back in OzLabs, work was being done to effectively turn hypervisor-less Linux on Power into a fully supported platform (OpenPower). This wasn't a bad thing. In fact, from my perspective, OpenPower is not only a great step for PowerPC but also the ultimate validation of BML's purpose. However, as OpenPower started to gain traction, "bare metal Linux" went from a complex feat of hackery that could justify four or five hardcore engineers working in concert, to being a thin layer of scripts around a platform that was being professionally supported by Australia.

This shrinkage explains two things that I didn't understand until later.

First, why I never actually worked with my teammates on anything long term. Almost all of us were being loaned out to other projects because there wasn't enough work to do on BML itself to require the headcount. We were all working on line items that were, at best, tangentially related to BML. Writing drivers, working with research, debugging simulators, integrating the lab with system testing and so on. Having to stop and hack on the BML lab infrastructure we were technically attached to was often an annoying and inconvenient distraction from the other tasks on which we had been focused.

Second, how in that same time period I ended up with exactly zero Linux patches. BML itself didn't require kernel support anymore (beyond providing builds) thanks to OpenPower. Even the CPU enablement stuff I did get was either dropped from the final release (AS), taken out of my hands (64-bit decrementer, although that was because I was loaned out again), or both (load monitor).

BML was less a team and more a holding zone for kernel level engineers to be assigned wherever needed. I get how this had utility for IBM, and maybe even OzLabs sometimes, but it was a real shit situation to be in for someone like me. From my perspective, bouncing from project to project was just a way to never make a significant difference anywhere. To make matters worse, it was apparent that a lot of interesting work was being done during my time in BML, work that I desperately wanted to do, but that had been absorbed by OzLabs while I was unwittingly wasting my time doing shit like implementing merge sorts on FPGA devices or trying to debug ancient Perl scripts.

When I finally looked up long enough to get some context, I felt like I was busy scarfing down dog food while OzLabs was just finishing up the filet mignon. By the time I got a crack at their leftovers, the competition for that work was fierce and still dominated by other OzLabbers who could divvy up work and help each other while chatting over coffee instead of contending with a 17-hour time difference and tedious emails.

This is why I wish that I had learned to work more closely with OzLabs, or work more independently earlier in my career when it wasn't mandatory and the stakes were lower. Having done only inconsequential work, I was still an unknown quantity to OzLabs on top of operating with a severe handicap compared to other options. Certainly not the first person you'd think of when you needed work done with a minimum of hassle. Because of this and the fact that I didn't know how to find alternative work myself, I started to feel hopelessly mired in low priority, dead end tasks. Whether that was an accurate perception or not, I viewed my lack of kernel work and the stopgap nature of my BML work as evidence that it was true and lost faith that I would ever accomplish something meaningful.

I became despondent.

It didn't help that while I was on BML there were multiple rounds of layoffs, we were all forced to take a week of furlough (unpaid time off), and my friend and team/officemate Ryan Grimm left Austin so I was telecommuting all of the time. Morale was at rock bottom. I actively pursued other jobs, stopped taking my work seriously, and stopped tuning in to weekly Oz synchronization calls in favor of family dinner, the news, or bedtime stories. I spent weeks on my own rewriting BML's infrastructure from scratch in Python for vague reasons that amounted to keeping myself busy beyond whatever nebulous line items I was actually supposed to be pinning down on the budget.

SoftLayer

Underscoring the fact that BML members were almost always focused elsewhere, my entire 2015 was dominated by supporting SoftLayer, a cloud company IBM acquired in 2013 that still ran x86 chips almost exclusively. This had nothing to do with BML, and I was only tapped because I was in the right area (SoftLayer is based in nearby Dallas), had low level experience, and basically nothing else to do.

SoftLayer was a transformative experience for me. On one hand, even in a totally different role, I was still mopping up after OzLabbers that absorbed all of the main firmware/kernel work before I arrived. On the other hand, I was shoulder to shoulder with a small team for the first time since Prism five years earlier and it felt great.

It was our job to convince SoftLayer developers and execs who were openly hostile to our architecture that our systems could hang in their highly automated data centers. It was a tough sell. In fact, the first time I went to Dallas to work with SoftLayer, they were throwing a fit and wouldn't even communicate with IBM enough to sit down to lunch with me until the 4th day when the cavalry had arrived.

Thankfully, things went smoothly afterwards. In the following months I worked closely with my fellow IBMers as well as SoftLayer's team. I had multiple daily phone calls that weren't complete wastes of time, VP level visibility in both companies, and daily status notes with the Director of my organization. Most importantly, it was up to me to re-implement all the parts of SoftLayer's infrastructure that were Windows only, culminating in converting roughly 6,000 lines of C++ (with 15,000 more in templates and random dependencies grafted in) into a tight 250 lines of Python that I actually got to demo for execs. I'm not saying I'm a miracle worker, but it felt good to prove under pressure that I was a competent Linux developer who wasn't going to let some "copy and paste from Stack Overflow" Windows types scare me off with a Visual Studio project that looked like byzantine dogshit but was actually just implementing a simple (idiosyncratic, undocumented) JSON API in the wrong language for the job.

For so long I'd felt worthless, doomed to work as a member of a team that was mostly management fiction, cursed with dead end tasks that only landed on me after every OzLabber available had passed on them, or when there was nothing left but scutwork. I'd even become aware that fresh hires at OzLabs had become far more productive than I was with years of supposed experience and started to wonder if I just sucked and nobody had the balls to say it to my face.

After SoftLayer it was like waking up from a trance, or fitting the last piece into a puzzle. Something clicked. I no longer felt despondent, I felt confident. I realized that those fresh hires weren't prodigies or supermen, they were benefiting from the same sort of close quarters team collaboration and effortless communication that I hadn't felt in the years between Prism and SoftLayer.

It was at this point that I developed a creeping suspicion that my time at IBM was drawing to a close.

Pulling the Trigger

I returned to BML early in 2016 and attempted to keep working.

Sure enough, I was assigned some P9 kernel work, and sure enough, as soon as there was an issue with it, the patch was no longer mine because it was easier to fix ASAP than it was to spend another 24 hours sending messages back and forth. To add further frustration, that code also got reverted when support was dropped from the chip, so even if I hadn't screwed up I still wouldn't have had a patch in anyway.

I became confrontational. The next time there was a thread about the future of the team, I let everyone know what I thought about the current state of the team, how the project was being obsoleted by OpenPower, how its remaining functionality didn't take nearly as many developers, and how BML should either be refocused on my rewritten version or destroyed. I kept it professional, but I assume I still came off heated. Regardless, nobody gave a shit about my opinions. In retrospect I think I was expressing my frustration with the team more than I was expecting anything to change, but it would have been nice if someone had stepped up to defend BML from my criticisms even if my vision for the future of the team was weak.

I left BML shortly thereafter and was placed on the OpenPower team. Once again led by an OzLabber, isolated from everyone making decisions, bouncing between projects. Not having the BML infrastructure to worry about was a step in the right direction, and my team lead did the best he could to keep me busy, but at this point I was longing for a local team with a project where I could be self-driven instead of relying on others to mete out my weekly portion.

Ironically, this is where, after eight years, I finally got my one, single, solitary upstream Linux commit that was actually my own work, even if it was only a simple cleanup.

Anyway, 2016 was just a long series of confirmations that, barring a relocation to Canberra, IBM didn't have the capability to provide what I wanted in an area I was interested in.

Blame, or lack thereof

So who do I ultimately blame for leaving? Well... nobody.

Despite my feeling overshadowed, OzLabbers didn't do anything to wrong me. On the contrary, I learned practically everything I know about writing good C and assembly from reading theirs. They had never been anything but friendly to me and understanding when I fucked up. What should they have done differently to keep me at IBM? Not revitalized the platform so I wouldn't be envious of their good work? Should they have ignored local talent to give me an even playing field? Should they have given me bigger chunks of work when I never proved I could handle the smaller ones? No, of course not.

I do wish that I had the opportunity to visit OzLabs to learn their workflow in their natural habitat. I think I would have made a better impression trying to work with them, rather than only meeting in Austin when they were dashing between meetings instead of kernel hacking. Perhaps then I would have gained insight on how to better work with them from Austin, or work more effectively on my own, but I'm doubtful it would have mattered.

As for IBM overall, I can't say there was much they could do either, short of keeping embedded PowerPC going with an Austin team.

For myself, I know I could have done better. There are other foreign developers that work just fine kernel hacking PowerPC, and maybe with a bit more skill, patience, experience, or even just flexibility to work on something outside of the base chip I could've been one of them. I certainly could have been more aware of what was going on, and less prone to spells of depression. I could have been more communicative, or more proactive.

In my defense, most of the time I did the best I could with the information I had. Yes, I made dumb choices, I made naive mistakes, but a lot of this is only clear to me now with years of hindsight. It should be no surprise that 31-year-old-seasoned-programmer me would do 100x better than 22-year-old-college-grad me given the chance, so I try not to dwell on my failures after I've learned from them.

The bottom line is that as I matured as a programmer and employee, what I wanted from my team and employer changed drastically enough that I couldn't be accommodated. There is no fault, just greater understanding, so while I very well could be making a huge mistake, I'm reminded of this recent xkcd:

xkcd: Settling

Some shout outs

At this point, all I can say is that I firmly believe OzLabs is filled with the best, most professional and accomplished engineers I've ever worked with. OpenPower is a huge leap for PowerPC and a major achievement for IBM's server business. As it has been for more than 15 years, the architecture is in good Australian hands.

I also want to bid my North American BML peeps and SoftLayer invasion force brothers in arms farewell.

Oh and, hopefully for the last time, I just want to say "Fuck Lotus Notes."