Sabotage, or Merely the Worst Datacenter Incompetence in History? You Decide.

The T-Mobile Sidekick kick-in-the-teeth is worse than we thought


As the Microsoft/Danger/T-Mobile data disaster drags into another nightmare week for hapless owners of the Sidekick smartphone, more details have emerged. A remarkably well-informed if still speculative article by Daniel Eran Dilger blames the debacle on "dogfooding and sabotage."

"Dogfooding" is the policy of software companies to use their own products whenever possible — and sometimes even when it isn't:

Microsoft is well known for wanting to replace competitor’s technologies with its own. The company famously failed to do this after buying up HoTMaiL in 1996 and attempting to replace its Sun Solaris servers with PCs running NT; it similarly failed to smoothly transition WebTV from its original Sun-infrastructure to one based on Windows Server and WinCE clients in the late 90s. Microsoft also struggled to help Dell replace its WebObjects-based web store after Apple bought NeXT in 1997.

Striving to rid the company of foreign technology and “eat one’s own dog food” instead is so common that Microsoft’s employees are said to commonly use the word “dogfooding” as a verb to describe this.

That Microsoft, which seems to have a toe dipped into every information technology lake in the world, would both insist on dogfooding and fail spectacularly at it requires no great stretch of the imagination. The sabotage theory requires a greater suspension of disbelief, and a little background. Conveniently, the background includes an explanation of the otherwise bizarre and puzzling fact that "T-Mobile has been warning its Sidekick customers 'during this service disruption, please DO NOT remove your battery, reset your Sidekick, or allow it to lose power.'"

The reason for this relates to how the Sidekick interacts with the Danger cloud services Microsoft was running.

“On the iPhone, you sync your data with your PC/Mac via iTunes, and MobileMe in parallel syncs both the iPhone and the PC/Mac with ‘the cloud’ [at MobileMe]. If the cloud were to go down and everything lost (like I said, an almost completely inconceivable occurrence except by deliberate sabotage), your data would still be preserved on both your iPhone and your PC/Mac,” a source explained.

“Unfortunately, it doesn’t work that way on the Sidekick. The Sidekick was designed under the assumption that the cloud would always be available, and that your data would be safe there, so the device doesn’t try very hard to preserve your data if you were to yank the battery or in the rare event of a phone OS crash/reboot. Instead, under these circumstances the device starts from an empty database and then reloads all of your data from the service when it comes back up.

. . .

“What makes things even worse is that there’s no way to sync your personal data directly to your PC. T-Mobile provides for a small fee a third-party app download to sync your data with Outlook on a Windows PC (and there was a similar app for Mac at one point, but it was discontinued some time ago, pre-acquisition), but I don’t think it syncs email messages, I know it doesn’t sync SMS [messages], and what’s worse is that it syncs from the cloud to the PC, not from the device to the PC.

“Normally that’s an advantage because you don’t need any sort of sync cable, but in this case, with the service down and unlikely to come back up, there’s now no way to transfer any of your data, except by saving your contacts and SMS messages to the SIM card (which has a very limited number of slots available, compared to the device), or by manually writing everything you want to preserve down on paper.

“So this is a catastrophic failure of the worst possible kind. Like I said, I can’t think of any innocent explanation for all user data to have been lost permanently, and for the service to still be down.”
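The difference the source describes, between a device that keeps its own durable copy and one that treats the cloud as the only durable copy, can be illustrated with a toy simulation. This is only a sketch of the general design contrast, not a model of the actual Sidekick or iPhone software; every class and function name below (`CloudService`, `CloudOnlyDevice`, `LocallyBackedDevice`, `simulate_outage`) is hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CloudService:
    """Hypothetical stand-in for a remote sync service."""
    records: dict = field(default_factory=dict)
    available: bool = True

    def fetch_all(self) -> dict:
        if not self.available:
            raise ConnectionError("service unavailable")
        return dict(self.records)


class CloudOnlyDevice:
    """Assumed design: working data lives in memory, the cloud is the only durable copy."""

    def __init__(self, cloud: CloudService):
        self.cloud = cloud
        self.memory = cloud.fetch_all()

    def reboot(self) -> None:
        # After a power loss the device starts from an empty database...
        self.memory = {}
        try:
            # ...and tries to reload everything from the service.
            self.memory = self.cloud.fetch_all()
        except ConnectionError:
            pass  # service is down, so there is nothing to reload from


class LocallyBackedDevice:
    """Assumed design: the device also keeps a durable local copy."""

    def __init__(self, cloud: CloudService):
        self.cloud = cloud
        self.local_store = cloud.fetch_all()  # survives a reboot
        self.memory = dict(self.local_store)

    def reboot(self) -> None:
        # Restore from the local copy first, then refresh if the service answers.
        self.memory = dict(self.local_store)
        try:
            fresh = self.cloud.fetch_all()
            self.local_store = fresh
            self.memory = dict(fresh)
        except ConnectionError:
            pass  # keep the local copy


def simulate_outage():
    """Return record counts on both devices before and after a reboot during an outage."""
    cloud = CloudService(records={"contact:alice": "555-0100"})
    cloud_only = CloudOnlyDevice(cloud)
    backed = LocallyBackedDevice(cloud)

    # The service loses its data and goes offline.
    cloud.records.clear()
    cloud.available = False

    before = (len(cloud_only.memory), len(backed.memory))
    cloud_only.reboot()
    backed.reboot()
    after = (len(cloud_only.memory), len(backed.memory))
    return before, after
```

In this sketch, both devices still hold their data while they stay powered on during the outage; only the cloud-only device comes back empty after a reboot, which is consistent with the warning not to remove the battery or let the device lose power.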

It's the idea of a "catastrophic failure of the worst possible kind" that leads to the same sort of thinking that powers all conspiracy theories:

the fact that no data could be recovered after the problem erupted at the beginning of October suggests that the outage and the inability to recover any backups were the result of intentional sabotage by a disgruntled employee. In any other circumstance, Microsoft or T-Mobile would likely have come forward with an explanation of the mitigating circumstances, blaming bad hardware, a power failure, or some freak accident.

An act of sabotage “would explain why neither party is releasing any more details: for legal reasons dealing with the ongoing investigation to find the culprit(s),” one of the sources said. Due to the way Sidekick clients interact with the service, any normal failure should have resulted in only a brief outage until a replacement server could be brought up.

Paranoid, perhaps, but not entirely implausible. Occam's Razor, though, still slices toward plain old incompetence.

In any event, Dilger's tour-de-force does more than just explain how Microsoft's storage area network failed and how T-Mobile, and its million-plus Sidekick users, bear all the brunt and none of the blame. It also describes just how Microsoft's Pink development team — charged with the impossible task of coming up with an iPhone killer — did, and didn't, fit into the existing structure of a company that had separate Windows Mobile and Zune development groups as well. All in about 2500 words. Go read it: here's the link again.

